johnoneil / arib

Japan Association of Radio Industries and Businesses (ARIB) MPEG2 Transport Stream Closed Caption Decoding Tools
Apache License 2.0

Missing Gaiji #6

Closed: johnoneil closed this issue 10 years ago

johnoneil commented 10 years ago

Gaiji (characters outside the normal Shift-JIS character set) are not yet implemented. As a result, symbols like ♫ and ➔ are replaced with inappropriate kanji. For example, the following .ass dialogue line:

Dialogue: 0,0:01:7.43,0:01:11.60,normal,,0000,0000,0000,,{\rnormal}{\c&Hffffff&}{\pos(170,509)}頰〜\N

should read:

Dialogue: 0,0:01:7.43,0:01:11.60,normal,,0000,0000,0000,,{\rnormal}{\c&Hffffff&}{\pos(170,509)}♫〜\N

Also, the following line:

Dialogue: 0,0:05:14.33,0:05:18.95,normal,,0000,0000,0000,,{\rnormal}{\c&Hffffff&}{\pos(170,509)}上あくれよ{\rmedium}{\c&Hffffff&}っ{\rnormal}{\c&Hffffff&}て意味で言{\rmedium}{\c&Hffffff&}っ{\rnormal}{\c&Hffffff&}くゃだよ{\rmedium}{\c&Hffffff&}り\N

should not end with り. It should probably be:

Dialogue: 0,0:05:14.33,0:05:18.95,normal,,0000,0000,0000,,{\rnormal}{\c&Hffffff&}{\pos(170,509)}上あくれよ{\rmedium}{\c&Hffffff&}っ{\rnormal}{\c&Hffffff&}て意味で言{\rmedium}{\c&Hffffff&}っ{\rnormal}{\c&Hffffff&}くゃだよ{\rmedium}{\c&Hffffff&}➔\N

johnoneil commented 10 years ago

The musical note character is encoded as 0xFD 0xFA, which comes out as 頰 when read as two-byte Shift-JIS. In binary this is 0b11111101 0b11111010. However, the 8th bit is set, which puts it outside the normal kanji coding table. If we mask off that bit, it becomes 0b01111101 0b01111010 (0x7D 0x7A), which we can look up in the ARIB STD-B24 standard as "♫", the correct character.
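A minimal sketch of the masking step described above, assuming the gaiji lookup is keyed by the 7-bit two-byte code. `GAIJI_TABLE`, `clear_high_bits` and `decode_gaiji` are hypothetical names for illustration, not the arib package's actual API. The table holds only the single mapping stated in this issue (0x7D 0x7A → ♫); a real implementation would need the full additional-symbol range from ARIB STD-B24.

```python
from typing import Dict, Optional, Tuple

# Hypothetical gaiji table keyed by the high-bit-cleared byte pair.
# Only the entry discussed in this issue is included.
GAIJI_TABLE: Dict[Tuple[int, int], str] = {
    (0x7D, 0x7A): "\u266B",  # ♫ (BEAMED EIGHTH NOTES)
}


def clear_high_bits(b1: int, b2: int) -> Tuple[int, int]:
    """Drop the 8th bit of each byte, e.g. 0xFD -> 0x7D, 0xFA -> 0x7A."""
    return b1 & 0x7F, b2 & 0x7F


def decode_gaiji(b1: int, b2: int) -> Optional[str]:
    """Return the gaiji for a raw two-byte code, or None if not in the table."""
    return GAIJI_TABLE.get(clear_high_bits(b1, b2))


if __name__ == "__main__":
    raw = (0xFD, 0xFA)
    print([f"{b:#010b}" for b in raw])                   # bits as received
    print([f"{b:#010b}" for b in clear_high_bits(*raw)])  # bits after masking
    print(decode_gaiji(*raw))
```

Returning `None` for unknown codes (instead of falling back to a kanji decode) keeps a missing table entry visible, which avoids the silent 頰-for-♫ substitution reported here.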