sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

HTML generated index: glossary categories unexpectedly appear within the letter-names list. #12707

Open jayaddison opened 4 months ago

jayaddison commented 4 months ago

Describe the bug

The Sphinx glossary directive has a feature that allows entries to have 'grouping keys': essentially a way to declare that a definition within the glossary belongs to a parent category.

I think this is how the glossary of fermions in our test-glossary testroot should be organised: multiple distinct types of fermion are defined there, so each of them could be described separately and also be placed into a common category, such as fermions.

Here is a minimal example derived from that:

test-glossary
=============

.. glossary::
   :sorted:

   boson
      Particle with integer spin.

   *fermion*
      Particle with half-integer spin.

   tauon : fermions
      A half-spin elementary particle with an electric charge of -1.

   myon : fermions
      A negatively-charged elementary particle with a 1/2-spin; heavier than
      an electron, but not as heavy as a tauon.

   electron : fermions
      A negatively-charged elementary particle, fundamentally important in
      the field of electricity.

   über
      Gewisse

   ähnlich
      Dinge
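
The `term : key` lines above pair a glossary term with its grouping key. A minimal Python sketch of splitting such a line, assuming (as in these examples) that the key follows a colon with spaces on both sides; `split_term_and_key` is a hypothetical helper for illustration, not Sphinx's own parser:

```python
from __future__ import annotations

import re

# Assumption: the grouping key is separated from the term by a colon
# with at least one space on each side, e.g. "tauon : fermions".
_KEY_SEPARATOR = re.compile(r" +: +")


def split_term_and_key(line: str) -> tuple[str, str | None]:
    """Split a glossary term line into (term, grouping key or None)."""
    parts = _KEY_SEPARATOR.split(line.strip(), maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return parts[0], None


for line in ["boson", "tauon : fermions", "electron : fermions"]:
    print(split_term_and_key(line))
# ('boson', None)
# ('tauon', 'fermions')
# ('electron', 'fermions')
```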

Building this glossary with the html builder triggers the bug: the generated genindex.html includes category names in full within the list of single-character letter headings, giving A | B | F | fermions | U instead of the expected A | B | F | U.

What I'd expect is for only the first letter of the category name (F for fermions in the example case above) to be displayed and used for grouping of the entries.
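
A simplified Python model of the difference between the observed and expected headings. It assumes that, currently, a grouping key is used verbatim as the heading whenever an entry has one; `heading_current` and `heading_expected` are hypothetical names, and stripping accents via Unicode NFD decomposition (so that `über` and `ähnlich` land under U and A) is likewise an assumption rather than a description of Sphinx internals:

```python
from __future__ import annotations

import unicodedata


def first_letter(text: str) -> str:
    """Upper-cased first code point of *text* after NFD decomposition."""
    return unicodedata.normalize("NFD", text)[0].upper()


def heading_current(term: str, key: str | None) -> str:
    # Observed behaviour (assumed): a grouping key becomes the heading as-is.
    return key if key is not None else first_letter(term)


def heading_expected(term: str, key: str | None) -> str:
    # Expected behaviour: only the first letter of the key (or term) is used.
    return first_letter(key if key is not None else term)


ENTRIES = [
    ("boson", None), ("fermion", None), ("tauon", "fermions"),
    ("myon", "fermions"), ("electron", "fermions"),
    ("über", None), ("ähnlich", None),
]

# Case-insensitive sort so that "fermions" sits between F and U.
print(" | ".join(sorted({heading_current(t, k) for t, k in ENTRIES}, key=str.lower)))
print(" | ".join(sorted({heading_expected(t, k) for t, k in ENTRIES}, key=str.lower)))
# A | B | F | fermions | U
# A | B | F | U
```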

How to Reproduce

conf.py

# empty

index.rst

test-glossary
=============

.. glossary::
   :sorted:

   boson
      Particle with integer spin.

   *fermion*
      Particle with half-integer spin.

   tauon : fermions
      A half-spin elementary particle with an electric charge of -1.

   myon : fermions
      A negatively-charged elementary particle with a 1/2-spin; heavier than
      an electron, but not as heavy as a tauon.

   electron : fermions
      A negatively-charged elementary particle, fundamentally important in
      the field of electricity.

   über
      Gewisse

   ähnlich
      Dinge

Environment Information

Platform:              linux; (Linux-6.9.10-arm64-aarch64-with-glibc2.39)
Python version:        3.12.4 (main, Jul 15 2024, 12:17:32) [GCC 13.3.0])
Python implementation: CPython
Sphinx version:        8.0.0+/b0485f932
Docutils version:      0.21.2
Jinja2 version:        3.1.4
Pygments version:      2.18.0

Sphinx extensions

N/A

Additional context

Discovered during development of #12699.

electric-coder commented 4 months ago

What I'd expect is for only the first letter of the category name (F for fermions in the example case above) to be displayed and used for grouping of the entries.

Conceivably you wouldn't want to limit the user to single-letter alphabetical groupings. I have always found the grouping feature limited and not sufficiently flexible, in both the glossary and index directives.

It would make sense to allow every entry to be grouped under a custom name, and even to dispense with the default single letters entirely (there's no configuration option for that). This raises another question: should the directive list entries under both F and fermions, or omit them from F if they are already included in fermions?

What's the use of fermions if everything gets grouped into F anyway?

jayaddison commented 4 months ago

What's the use of fermions if everything gets grouped into F anyway?

In HTML output, I'd expect a glossary entry with a classification/grouping (such as electron : fermions) to be placed into a container with other definitions that share the same classification:

Or, approximated as an ASCII-art diagram:

A
  ähnlich
B
  boson
F
  fermion
  fermions
    electron
    myon
    tauon
U
  über
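
A Python sketch that reproduces the layout above from the (term, key) pairs in the example: the letter heading comes from the first letter of the key when there is one (otherwise of the term), and keyed terms nest beneath their key. The function names are hypothetical, and NFD-based accent stripping is an assumption, not necessarily how Sphinx computes headings:

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict


def first_letter(text: str) -> str:
    """Upper-cased first code point of *text* after NFD decomposition."""
    return unicodedata.normalize("NFD", text)[0].upper()


def build_index(entries: list[tuple[str, str | None]]) -> dict[str, dict[str, list[str]]]:
    """Group (term, key) pairs as letter -> top-level name -> sorted children."""
    index: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for term, key in entries:
        if key is None:
            # Unkeyed term: a top-level entry under its own first letter.
            index[first_letter(term)].setdefault(term, [])
        else:
            # Keyed term: nested under the key, filed by the key's first letter.
            index[first_letter(key)].setdefault(key, []).append(term)
    return {
        letter: {name: sorted(children) for name, children in sorted(names.items())}
        for letter, names in sorted(index.items())
    }


def render(index: dict[str, dict[str, list[str]]]) -> str:
    """Render the grouped index as indented plain text."""
    lines = []
    for letter, names in index.items():
        lines.append(letter)
        for name, children in names.items():
            lines.append("  " + name)
            lines.extend("    " + child for child in children)
    return "\n".join(lines)


ENTRIES = [
    ("boson", None), ("fermion", None), ("tauon", "fermions"),
    ("myon", "fermions"), ("electron", "fermions"),
    ("über", None), ("ähnlich", None),
]
print(render(build_index(ENTRIES)))
```

Printing the result yields the same tree as the diagram, with `fermions` as a sibling of `fermion` under F.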

electric-coder commented 3 months ago

As shown, this has a few strange drawbacks:

  1. It uses the singular/plural distinction to separate the regular glossary entry from the grouping. There might not always be a convenient singular/plural pair that naturally puts the group in a lexicographically desirable place, since grammatical number is not always marked by a suffix (see ntša/dintša/mantša).

  2. I don't think it's possible to link directly to the grouping with a cross-reference; that undermines the usefulness of the grouping feature, since you can't refer to it from elsewhere.

  3. The glossary's functionality is limited because it defaults to letter links at the top instead of allowing thematic (hypernym-based) groupings.

But yes, if you want to limit the glossary to letter links only at the top (enforcing uniformity), then this change does make the directive more consistent, which some users might find desirable.

jayaddison commented 3 months ago

As shown, this has a few strange drawbacks:

  1. It uses the singular/plural distinction to separate the regular glossary entry from the grouping. There might not always be a convenient singular/plural pair that naturally puts the group in a lexicographically desirable place, since grammatical number is not always marked by a suffix (see ntša/dintša/mantša).

  2. I don't think it's possible to link directly to the grouping with a cross-reference; that undermines the usefulness of the grouping feature, since you can't refer to it from elsewhere.

Perhaps situations where a defined term is also used as a grouping (classification) term could be handled by consolidating them both into a single entry in the index, containing child entries for the members of the group?

index.rst

test-glossary
=============

.. glossary::
   :sorted:

   boson
      Particle with integer spin.

   *fermion*
      Particle with half-integer spin.

   tauon : fermion
      A half-spin elementary particle with an electric charge of -1.

   myon : fermion
      A negatively-charged elementary particle with a 1/2-spin; heavier than
      an electron, but not as heavy as a tauon.

   electron : fermion
      A negatively-charged elementary particle, fundamentally important in
      the field of electricity.

   über
      Gewisse

   ähnlich
      Dinge

A
  ähnlich
B
  boson
F
  fermion
    electron
    myon
    tauon
U
  über
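
A minimal Python sketch of this consolidation, assuming entries are identified by their display text, so that a grouping key equal to a defined term (here `fermion`) merges into that term's entry; `consolidate` is a hypothetical helper, not part of Sphinx:

```python
from __future__ import annotations


def consolidate(entries: list[tuple[str, str | None]]) -> dict[str, list[str]]:
    """Return top-level entry -> sorted child terms.

    A grouping key that matches a defined term is merged into that term's
    entry; any other key becomes a standalone group entry.
    """
    merged: dict[str, list[str]] = {}
    for term, key in entries:
        if key is None:
            merged.setdefault(term, [])
        else:
            merged.setdefault(key, []).append(term)
    return {name: sorted(children) for name, children in merged.items()}


entries = [
    ("boson", None), ("fermion", None), ("tauon", "fermion"),
    ("myon", "fermion"), ("electron", "fermion"),
    ("über", None), ("ähnlich", None),
]
print(consolidate(entries)["fermion"])  # ['electron', 'myon', 'tauon']
```

With the original `fermions` key, which matches no defined term, the same function keeps `fermion` and `fermions` as two separate top-level entries.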

electric-coder commented 3 months ago

I haven't forgotten about this post, but in the meantime I haven't found the time to test the glossary and think about what would be ideal here.

jayaddison commented 2 days ago

@electric-coder FWIW: I am worried about potential regressions for Japanese and other non-Latin character sets in the work-in-progress implementation in #12862, so I've asked for some confirmation/assistance there.

electric-coder commented 1 day ago

@jayaddison handling combining characters across UTF-8 and UTF-16 is a difficult problem, so it's better to be careful not to break things. In any case, the critique developed here leaves a breadcrumb for future work.
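
To illustrate why care is needed, here is a short Python check of a naive heading rule that takes the first code point after NFD decomposition. This is an illustration only and makes no claim about what #12862 actually does. The rule maps `über` to `U` as intended, but for some Japanese kana and for Korean Hangul syllables the first decomposed code point differs from the character the reader sees: `ガ` (U+30AC) yields `カ` (U+30AB), and `한` (U+D55C) yields the conjoining jamo U+1112. Whether that is acceptable depends on the indexing conventions for each script.

```python
import unicodedata


def naive_heading(text: str) -> str:
    # Decompose (NFD), keep only the first code point, then upper-case it.
    return unicodedata.normalize("NFD", text)[0].upper()


for word in ["über", "ガラス", "한국"]:
    first, heading = word[0], naive_heading(word)
    print(f"{word}: U+{ord(first):04X} -> U+{ord(heading):04X}")
```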