Deduplicate index entries

ietf-tools / xml2rfc

Generate RFCs and IETF drafts from document source in XML according to the IETF xml2rfc v2 and v3 vocabularies

https://ietf-tools.github.io/xml2rfc/

BSD 3-Clause "New" or "Revised" License

Deduplicate index entries #988

Closed martinthomson closed 10 months ago

martinthomson commented 1 year ago

Describe the issue

Items can be entered into the index multiple times from the same paragraph. xml2rfc happily creates multiple links to the same place.

It might be useful to remove such duplicates. This might be done along with a future iteration of #862, where entries might be arranged more compactly at the same time.
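To make the duplication concrete, here is a minimal sketch in Python. It is not xml2rfc's actual implementation. It assumes index entries can be modelled as (item, subitem, paragraph anchor) tuples and that the paragraph's `anchor` attribute is the link target; the function names `collect_index` and `dedupe` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: two <iref> elements for the same item in one paragraph.
SAMPLE = """
<section>
  <t anchor="p1">A <iref item="cache"/>cache stores responses; the
     <iref item="cache"/>cache may be shared.</t>
  <t anchor="p2">A shared <iref item="cache" subitem="shared"/>cache.</t>
</section>
"""

def collect_index(root):
    """Return one (item, subitem, target) tuple per <iref>, in document order.

    Assumption: the enclosing paragraph's anchor is used as the link target.
    """
    entries = []
    for para in root.iter("t"):
        target = para.get("anchor")
        for iref in para.iter("iref"):
            entries.append((iref.get("item"), iref.get("subitem", ""), target))
    return entries

def dedupe(entries):
    """Drop repeated tuples while keeping first-seen order."""
    return list(dict.fromkeys(entries))

entries = collect_index(ET.fromstring(SAMPLE))
unique = dedupe(entries)
```

In this sketch the two identical entries pointing at `p1` collapse into one, while the entry with a different subitem in `p2` is kept. Grouping the remaining entries more compactly would be a separate step.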

jennifer-richards commented 10 months ago

Note that #1050 is the low-hanging part. It does deduplication but does not address compact grouping of entries.

reschke commented 9 months ago

FWIW: xml2rfc apparently currently links to the paragraph anchor. That may be sub-optimal in long paragraphs, where it would make much more sense to link to the actual part of the text containing the <iref>. AFAIU, this is a result of index generation being implemented late, and it differs from how it is done in other implementations such as rfcxml.xslt, where each occurrence of <iref> actually generates an invisible (sic) link target in the text.

Maybe the overall way the index is generated should be reviewed first.
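As a rough illustration of the per-occurrence approach, the following Python sketch assigns every <iref> its own link target instead of reusing the paragraph anchor. The `iref-<slug>-<n>` naming scheme and the `occurrence_targets` function are assumptions made up for this example; they are not what rfcxml.xslt or xml2rfc actually emit.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input: three occurrences of the same index item.
SAMPLE = """
<section>
  <t anchor="p1">A <iref item="cache"/>cache stores responses; the
     <iref item="cache"/>cache may be shared.</t>
  <t anchor="p2">Each <iref item="cache"/>cache has a key.</t>
</section>
"""

def occurrence_targets(root):
    """Return (item, target_id) pairs, one unique target per <iref>."""
    counts = defaultdict(int)
    targets = []
    for iref in root.iter("iref"):
        item = iref.get("item", "")
        # Turn the item text into an id-safe slug (assumed naming scheme).
        slug = re.sub(r"[^a-z0-9]+", "-", item.lower()).strip("-")
        counts[slug] += 1
        targets.append((item, f"iref-{slug}-{counts[slug]}"))
    return targets

targets = occurrence_targets(ET.fromstring(SAMPLE))
```

With targets like these, two <iref> elements in the same paragraph are no longer duplicates, since each link lands on a different position in the text.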

martinthomson commented 9 months ago

On balance, I'm OK with the index referring to paragraphs. Though I can see how you might want to build links to the specific instance. (If you do that, then the target of the link needs something to aim at. Right now <iref> is often an empty node, but indexes typically refer to a span of text.)

reschke commented 9 months ago

Yes, spans would be good, but that would require new source markup.

jrlevine commented 9 months ago

One could do iref spans, but paper book indexes just point to the page, so I think it'd be overkill.

reschke commented 9 months ago

FWIW, with rfcxml.xslt and a proper HTML-to-PDF engine (such as PrinceXML), you'll get page numbers (even ranges) in the PDF.