w3c / publ-a11y

Accessibility related discussions of the Publishing@W3C Groups

Character Limit in accessibilitySummary #92

Closed: GeorgeKerscher closed this issue 2 years ago.

GeorgeKerscher commented 2 years ago

To develop guidelines for accessibilitySummary content, we need to know its character limit. We believe that Schema.org imposes no limit, but ONIX and MARC may. We need to confirm this with the relevant authorities. If you can verify the limit for any of these standards, please add a comment to this issue.

There are two other considerations related to the character limit. First, when the accessibilitySummary is written in CJK, we believe the byte count will be larger for the same number of characters, because in UTF-8 a CJK character typically takes three bytes, whereas most Latin-script characters take one or two. We therefore need to know both the character-count limit and the byte-length limit for each standard.
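To make the character-versus-byte distinction concrete, here is a minimal Python sketch that compares code-point counts with UTF-8 byte counts using only the built-in `str.encode`. It assumes the summary is stored as UTF-8; other encodings (UTF-16, MARC-8) would give different byte counts, and the sample strings are illustrative only.

```python
def utf8_profile(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


samples = {
    "ASCII": "Alt text",                  # 1 byte per character in UTF-8
    "Accented Latin": "r\u00e9sum\u00e9",  # each precomposed é takes 2 bytes
    "Japanese": "代替テキスト",             # these kanji and katakana take 3 bytes each
}

for label, text in samples.items():
    chars, nbytes = utf8_profile(text)
    print(f"{label}: {chars} code points, {nbytes} bytes")
```

This prints 8 code points and 8 bytes for the ASCII sample, 6 and 8 for the accented one, and 6 and 18 for the Japanese one, so a byte-based limit would admit roughly a third as many CJK characters as ASCII characters.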

Finally, how much a person is likely to read will affect our guidance. Many accessibilitySummary examples may be short, and readers may come to expect that. Some implementations may also display only a fixed number of characters of the accessibilitySummary.

The above comment is 1,06 characters, including spaces.

murata2makoto commented 2 years ago

I emailed Graham Bell of EDItEUR (Keio University is a member of EDItEUR). It appears that in ONIX 3, "character" means a grapheme cluster as defined in Unicode® Standard Annex #29, available at https://unicode.org/reports/tr29/.
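Counting grapheme clusters can differ from counting code points. Python's standard library has no UAX #29 segmenter, so the sketch below is only a rough approximation: it attaches every mark (Unicode general category M*) to the preceding character and counts everything else as a new cluster. It does not handle Hangul jamo sequences, emoji ZWJ sequences, regional-indicator flags, or the other UAX #29 rules; for those, the third-party `regex` package's `\X` pattern is one option. The function name is hypothetical.

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Rough grapheme-cluster count; NOT a full UAX #29 implementation.

    Each mark (general category Mn, Mc, or Me) is treated as extending the
    previous character instead of being counted on its own.
    """
    count = 0
    for i, ch in enumerate(text):
        is_mark = unicodedata.category(ch).startswith("M")
        # A mark at the very start has nothing to attach to,
        # so it is counted as its own cluster.
        if not is_mark or i == 0:
            count += 1
    return count


decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
print(len(decomposed), approx_grapheme_count(decomposed))  # 2 1
```

Under the ONIX reading above, the decomposed "é" would count as one character even though it is two code points.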

hongcui-lac commented 2 years ago

In a MARC21 bibliographic record, the record length is given in Leader positions 00-04 (5 characters), so a record can be at most 99999 octets. The length of each field is given in positions 03-06 of its Directory entry (4 characters), so a field can contain at most 9999 octets. In general, this should be enough to record the data.
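As a rough check of whether a summary fits in one MARC21 field, the sketch below counts the octets of a variable data field. It assumes UTF-8 encoding and a hypothetical layout with two indicators, a single subfield $a, and the field terminator, all of which count toward the 4-digit field length. Real records may use more subfields, and the whole record is still capped at 99999 octets. The function names are hypothetical.

```python
MAX_FIELD_OCTETS = 9999  # 4-digit field length in each Directory entry

SUBFIELD_DELIMITER = "\x1f"  # MARC subfield delimiter (1F hex)
FIELD_TERMINATOR = "\x1e"    # MARC field terminator (1E hex)


def marc_field_octets(text: str, ind1: str = " ", ind2: str = " ",
                      code: str = "a") -> int:
    """Octets used by a single-subfield variable data field, assuming UTF-8."""
    field = ind1 + ind2 + SUBFIELD_DELIMITER + code + text + FIELD_TERMINATOR
    return len(field.encode("utf-8"))


def fits_in_marc_field(text: str) -> bool:
    """True if the text fits within the 9999-octet field length limit."""
    return marc_field_octets(text) <= MAX_FIELD_OCTETS


print(fits_in_marc_field("x" * 9994))  # True: 9994 + 5 overhead = 9999
print(fits_in_marc_field("字" * 3332))  # False: 3 * 3332 + 5 = 10001
```

With this assumed 5-octet overhead, about 9994 octets are left for the summary itself, which is roughly 9994 ASCII characters but only about 3331 CJK characters.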

MARC21 bibliographic records can be encoded in either the MARC-8 character repertoire or Unicode. LC maintains the mapping between MARC-8 and Unicode. More libraries have adopted Unicode, and many library systems can accept and output records in MARC-8, Unicode, or both. For languages with diacritics, systems may differ in whether they accept precomposed characters as input and whether they convert them to decomposed form (e.g., converting é, U+00E9, to "e" followed by a combining acute accent, U+0065 U+0301).
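Because decomposition changes both the code-point count and the byte count, it can matter when a summary is close to an octet limit. A minimal sketch with Python's `unicodedata.normalize`, assuming UTF-8 storage (whether a particular library system actually normalizes to NFD is system-specific):

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)

print([f"U+{ord(c):04X}" for c in precomposed])  # ['U+00E9']
print([f"U+{ord(c):04X}" for c in decomposed])   # ['U+0065', 'U+0301']

# In UTF-8 the decomposed form takes one more octet.
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3

# NFC recomposes the sequence, so the round trip restores the original.
assert unicodedata.normalize("NFC", decomposed) == precomposed
```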

For languages written in multiple scripts, MARC21 offers two options:

Model A: Vernacular and transliteration. Transcribe the script as found on the publication, and provide the corresponding transliteration data.
Model B: Simple multiscript records. Either transcribe the script only, or provide the transliteration data only.

Again, library systems may prefer either Model A or Model B.