pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification
https://pdf-issues.pdfa.org/

inconsistent descriptions about use of Alt and ActualText and other in the stream #483

Closed u-fischer closed 1 month ago

u-fischer commented 1 month ago

In ISO 32000-2:2020 the note below Table 363 — Property list entries for artifacts on page 748 says

Some properties defined elsewhere may also be used as entries in the property list of an artifact, including Alt (see 14.9.3, "Alternate descriptions"), ActualText (see 14.9.4, "Replacement text"), E (see 14.9.5, "Expansion of abbreviations and acronyms") or Lang (see 14.9.2, "Natural language specification").

but in (for example) 14.9.4 artifact is not mentioned, only the Span tag:

Replacement text may be specified for the following items:
• A structure element (see 14.7.2, "Structure hierarchy"), by means of the optional ActualText entry (PDF 1.4) of the structure element dictionary.
• (PDF 1.5) A marked-content sequence (see 14.6, "Marked content"), through an ActualText entry in a property list attached to the marked-content sequence with a Span tag.

petervwyatt commented 1 month ago

Your quote from below Table 363 also says "... or Lang" - I assume this does NOT mean that only one of those properties may be present! It would be better to rephrase with "and".

Technically speaking, property lists can contain anything (or nothing) as no entries are mandated for the /Artifact MC tag. The current text is really just suggesting (in a permissive style) some other properties that might make sense for certain MC artifacts. So I think your request reduces down to annotating those other sections with matching informative text along the lines of "property X may also be useful when used in an artifact property list (see 14.8.2.2.2)" so there is some cross-referencing of sections. But that doesn't mean some properties cannot be used...

u-fischer commented 1 month ago

> So I think your request reduces down to annotating those other sections with matching informative text along the lines of "property X may also be useful when used in an artifact property list (see 14.8.2.2.2)"

Yes, the request is about cross referencing. The list in 14.9.4 sounds quite final and I found it surprising that ActualText popped up in the note below table 363 too.

(Unrelated, but I have no clear idea why it would be useful to use ActualText in an artifact property list. When would you prefer /Artifact <</ActualText(xx)>> BDC over /Span <</ActualText(xx)>> BDC? Are any processors or examples known that show a sensible difference?)
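To check which marked-content tags carry ActualText in real content streams, something like the following rough Python sketch might help. It is not a PDF parser. It assumes an already-decoded content stream, inline property lists only (no named resources from the Properties dictionary), no nested dictionaries, and literal strings without escapes or nested parentheses. The function name `actualtext_by_tag` and the sample stream are made up for illustration.

```python
import re

# Matches "/Tag << ... >> BDC" where the property list is a simple inline dictionary.
# Assumption: no nested dictionaries and no "<" or ">" inside the property list.
BDC_INLINE = re.compile(
    rb"/(?P<tag>[^\s/<>\[\]()]+)\s*<<(?P<props>[^<>]*?)>>\s*BDC"
)

# Matches "/ActualText (text)" for a literal string without escapes or nested parentheses.
ACTUALTEXT = re.compile(rb"/ActualText\s*\((?P<text>[^()\\]*)\)")


def actualtext_by_tag(content: bytes):
    """Return (tag, ActualText) pairs for each BDC with an inline ActualText entry."""
    results = []
    for bdc in BDC_INLINE.finditer(content):
        entry = ACTUALTEXT.search(bdc.group("props"))
        if entry:
            tag = bdc.group("tag").decode("ascii")
            text = entry.group("text").decode("latin-1")
            results.append((tag, text))
    return results


# Hypothetical content stream fragment using both forms from the question above.
sample = b"""
/Span <</ActualText(xx)>> BDC
  (x) Tj
EMC
/Artifact <</ActualText(yy)>> BDC
  (y) Tj
EMC
"""

print(actualtext_by_tag(sample))
```

This only reports where ActualText appears. Whether a given processor treats the Artifact form differently from the Span form would still have to be tested against that processor.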

petervwyatt commented 1 month ago

> The list in 14.9.4 sounds quite final and I found it surprising that ActualText popped up in the note below table 363 too.

I'm guessing you mean that 14.9.4, 2nd bullet only mentions the Span tag and not Artifact or any other MC tag as a valid tag that can have ActualText?

I would agree that the current wording is overly final: "through an ActualText entry in a property list attached to the marked-content sequence with a Span tag".

This would be better stated as "through an ActualText entry in a property list attached to the marked-content sequence, such as with a Span or Artifact tag."

DuffJohnson commented 1 month ago

I concur with your suggestion, @petervwyatt.

petervwyatt commented 1 month ago

New proposal from PDF TWG: add a new 3rd bullet in 14.9.4:

petervwyatt commented 1 month ago

Confirmed that this was documented in Adobe PDF 1.5