whatwg / encoding

Encoding Standard
https://encoding.spec.whatwg.org/

Adding BRF as "legacy" single-byte encoding for braille #40

Closed sthibaul closed 8 years ago

sthibaul commented 8 years ago

Hello,

BRF is a charset for encoding braille. http://brl.thefreecat.org/test.php is an example of a text file encoded in the BRF charset, holding 3 braille patterns. BRF is the standard way of providing documents ready for braille embossing; all official documents available on the web for embossing use it (books, courses, tax forms, income declarations, etc.), as required by Section 508 in the US, for instance. There is currently no other widely used standard way of shipping them in UTF-8 (the PEF format is still at a very early stage), and you will never see BRF documents encoded in UTF-8.

For now, browsers ignore the "charset=brf" content-type parameter and show the file as if it were ASCII, i.e. they display "A B C". They should recognize the BRF charset like other charsets (for instance, by using iconv to convert it to Unicode and then displaying it just like any Unicode text file), and thus display "⠁⠀⠃⠀⠉" instead of "A B C".

The BRF format defines bytes 0x00-0x1F as one-to-one equivalents of ASCII 0x00-0x1F, and bytes 0x20-0x5F as equivalents of the 6-dot braille patterns U+2800-U+283F.
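As a rough illustration of the ranges just described, here is a minimal Python sketch. The helper name `classify_brf_byte` and the range constants are hypothetical, introduced only for this example. The description gives ranges rather than a per-byte order, so the sketch only classifies bytes and checks that the 64 bytes in 0x20-0x5F match the 64 six-dot patterns in number; it does not attempt the actual lookup.

```python
# Byte ranges as described in the post above.
CONTROL_RANGE = range(0x00, 0x20)   # same meaning as ASCII control codes
BRAILLE_RANGE = range(0x20, 0x60)   # each byte stands for one 6-dot cell

# Unicode code points for the 6-dot patterns: U+2800 through U+283F.
SIX_DOT_PATTERNS = range(0x2800, 0x2840)


def classify_brf_byte(byte: int) -> str:
    """Hypothetical helper: report which described range a byte falls in."""
    if byte in CONTROL_RANGE:
        return "control"
    if byte in BRAILLE_RANGE:
        return "braille"
    return "unspecified"  # not covered by the description above


# 64 bytes for 64 patterns: six dots give 2**6 combinations.
print(len(BRAILLE_RANGE), len(SIX_DOT_PATTERNS), 2 ** 6)
```

The counts agree, which is why a single byte per cell is enough for 6-dot braille.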

BRF was added to IANA's list of charsets in 2006; see http://www.iana.org/assignments/charset-reg/BRF

BRF was added to glibc's iconv around the same time; see https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=localedata/charmaps/BRF;hb=HEAD

Regards, Samuel

annevk commented 8 years ago

So if it maps to Unicode, why would you not use Unicode? Adding new ASCII-incompatible encodings is a huge security risk, so I don't think we should do that.

vyv03354 commented 8 years ago

A late April Fools' joke, like #39?

sthibaul commented 8 years ago

It's no April Fools' joke. BRF files are shipped every day by government agencies etc. in BRF encoding, and never in a Unicode encoding.

annevk commented 8 years ago

@sthibaul how are these resources published to the web? Surely they are converted at that point?

Note that we intentionally do not want to rely on IANA. We don't want to support all encodings. A lot of them have security vulnerabilities and it's much better overall for folks to converge on utf-8.

If a legacy encoding is not supported by any browser, there's no reason to add one now, especially since there is a Unicode mapping available.

There are many formats not directly compatible with the web, but the web has sufficient tooling these days to build interpreters for them. It doesn't need native support for all these legacy formats.

Closing this based on the above. I hope you can appreciate why we don't want to do this.

sthibaul commented 8 years ago

They are not converted; the files are published as such, in BRF encoding. Showing them as ASCII letters in the browser is simply incorrect. They should really show up as braille patterns.

annevk commented 8 years ago

How is that useful?

z80pio commented 5 years ago

Does this encoding have a charset alias (for Java)? Or does anyone have a sample email in this BRF encoding?

sthibaul commented 5 years ago

I don't think there is any charset alias; it'd just be called BRF. Emails aren't sent in that encoding (braille-only emails are really not useful). Text documents are.

bootmii commented 8 months ago

Braille ASCII (also known as BRF or SimBraille) also maps 0x60-0x7E to the following code points (shown without the U+28 prefix and without leading zeros): 8, 1, 3, 9, 19, 11, B, 1B, 13, A, 1A, 5, 7, D, 1D, 15, F, 1F, 17, E, 1E, 25, 27, 3A, 2D, 3D, 35, 2A, 33, 3B, and 18, the same as 0x40-0x5E.
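For anyone who wants to try this out, here is a minimal Python sketch of a decoder built only from the values listed in this comment. The names `OFFSETS_40_5E`, `BRF_TABLE` and `decode_brf` are hypothetical. Mapping the space byte to the blank cell U+2800 is an assumption taken from the "A B C" example earlier in the thread. Bytes in 0x21-0x3F and 0x5F are deliberately left out because their values are not listed here.

```python
# Low bytes of U+28xx for bytes 0x40..0x5E, copied from the list above.
OFFSETS_40_5E = [
    0x08, 0x01, 0x03, 0x09, 0x19, 0x11, 0x0B, 0x1B,
    0x13, 0x0A, 0x1A, 0x05, 0x07, 0x0D, 0x1D, 0x15,
    0x0F, 0x1F, 0x17, 0x0E, 0x1E, 0x25, 0x27, 0x3A,
    0x2D, 0x3D, 0x35, 0x2A, 0x33, 0x3B, 0x18,
]

BRF_TABLE = {}
for i, low in enumerate(OFFSETS_40_5E):
    cell = chr(0x2800 + low)
    BRF_TABLE[0x40 + i] = cell  # '@', 'A'..'Z', '[', '\', ']', '^'
    BRF_TABLE[0x60 + i] = cell  # 0x60..0x7E map to the same cells

# Assumption: space is the blank cell, as in the "A B C" example.
BRF_TABLE[0x20] = "\u2800"


def decode_brf(data: bytes) -> str:
    """Decode the subset of BRF covered by the table above."""
    out = []
    for b in data:
        if b < 0x20:
            out.append(chr(b))  # control codes pass through unchanged
        elif b in BRF_TABLE:
            out.append(BRF_TABLE[b])
        else:
            raise ValueError(f"byte 0x{b:02X} is not in this partial table")
    return "".join(out)


print(decode_brf(b"A B C"))
```

Because 0x60-0x7E reuse the cells of 0x40-0x5E, `decode_brf(b"abc")` and `decode_brf(b"ABC")` give the same result in this sketch.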