Open matthew-e-brown opened 4 years ago
Was watching some Breath of the Wild speedruns and remembered randomly that apparently Nintendo has used .msbt
files for their dialogue for a long time. So, with a little bit more specific Googling than I had been doing before...
I found this page! It mentions "Text Commands" with some details about a few of them. This should give me enough starter information to start using footage of Blathers talking to piece together what each 00 0E command does.
Thanks, smallant1!
Lots of development today. All that's left to do now is figure out what each one of these commands does. These discoveries will be documented in commits to Notes.md.
Some more things I've found out while researching Pokémon HOME's MSBT files (they might differ from AC's, but I guess most of it is related).
The main structure is 4 shorts: 0E 00, command type, command variant, number of subsequent bytes.
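As a rough illustration of that layout, here is a minimal Python sketch that reads the four little-endian shorts and then the payload they announce. The name `parse_tag` and the sample bytes are invented for the example; only the field order comes from the description above.

```python
import struct

TAG_MARKER = 0x000E  # stored in the file as the bytes 0E 00 (little-endian)


def parse_tag(data: bytes, offset: int = 0):
    """Parse one command starting at `offset`.

    Returns (command_type, command_variant, payload, next_offset).
    """
    marker, command_type, variant, size = struct.unpack_from("<4H", data, offset)
    if marker != TAG_MARKER:
        raise ValueError(f"no 0E 00 marker at offset {offset}")
    start = offset + 8  # the four shorts take 8 bytes
    payload = data[start:start + size]
    if len(payload) != size:
        raise ValueError("payload is shorter than the announced byte count")
    return command_type, variant, payload, start + size


# Hypothetical sample: type 00 00, variant 03 00, 4 payload bytes.
sample = bytes.fromhex("0e00 0000 0300 0400 ff0000ff")
command_type, variant, payload, end = parse_tag(sample)
print(command_type, variant, payload.hex(), end)  # 0 3 ff0000ff 12
```

Returning the next offset makes it easy to keep walking through a message that contains several commands in a row.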
As you wrote, command type 00 00 is for text modifiers. Command variant 03 00 is a color change, followed by 04 00 and 4 bytes (RGBA). Command variant 02 00 seems to be a font change. I have one occurrence of this in a Chinese text just before the word "Nintendo" (in Latin characters).
Command types 01 00 and 02 00 are variables. I guess the variant tells which one. It's followed by 02 00 and 2 bytes. In Pokémon HOME, for command type 01 00, the first byte is 00 or 01 and the second one is CD. For command type 02 00, the first byte is one of 00, 01, 02, 03, 04, while the second one is one of 00, 01, 02, 03, 05, CD.
Command types 13 00 to 19 00 seem to be language dependent: 13 00 for English, 14 00 for French, 15 00 for Italian, 16 00 for German, 17 00 for Spanish and 19 00 for Korean. When the command variant is 01 00, it is a singular/plural switch. The variant code is followed by the usual subsequent bytes count, then 00 CD, then 2 UTF-16 strings, each of them starting with a byte count.
Command types 32 00 and 33 00 are special characters, with the command variant being some index, followed by 00 00 as the subsequent bytes count.
In the ATR1 table, I could find these character codes:
0E00 3200 0200 0000 [Character1:male ]
0E00 3200 0300 0000 [Character1:female ]
0E00 3300 0200 0000 [Character2:L_DoubleQuot. ]
0E00 3300 0300 0000 [Character2:R_DoubleQuot. ]
0E00 3300 0600 0000 [Character2:StraightSingleQuot. ]
0E00 3300 0700 0000 [Character2:StraightDoubleQuot. ]
0E00 3300 0800 0000 [Character2:HalfSpace ]
0E00 3300 0900 0000 [Character2:QuarterSpace ]
0E00 3300 1200 0000 [Character2:null ]
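For anyone who wants to use those codes programmatically, here is a small Python sketch that turns the list above into a lookup table keyed by (command type, command variant). The dictionary and the `describe_special` helper are hypothetical names for this example, and the table only contains the entries listed above.

```python
import struct

# (command type, command variant) -> name taken from the ATR1 listing above
SPECIAL_CHARACTERS = {
    (0x32, 0x02): "Character1:male",
    (0x32, 0x03): "Character1:female",
    (0x33, 0x02): "Character2:L_DoubleQuot.",
    (0x33, 0x03): "Character2:R_DoubleQuot.",
    (0x33, 0x06): "Character2:StraightSingleQuot.",
    (0x33, 0x07): "Character2:StraightDoubleQuot.",
    (0x33, 0x08): "Character2:HalfSpace",
    (0x33, 0x09): "Character2:QuarterSpace",
    (0x33, 0x12): "Character2:null",
}


def describe_special(tag: bytes) -> str:
    """Name an 8-byte special-character command, or report it as unknown."""
    marker, command_type, variant, size = struct.unpack("<4H", tag[:8])
    if marker != 0x000E:
        raise ValueError("command does not start with 0E 00")
    name = SPECIAL_CHARACTERS.get((command_type, variant))
    if name is None:
        return f"unknown {command_type:02X}:{variant:02X}"
    return name


print(describe_special(bytes.fromhex("0E00 3300 0800 0000")))  # Character2:HalfSpace
```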
I think ms_tags.h (included in some titles, e.g. 3DS system ones) tells you what the bytes mean, although they can vary:
#define MSTAGGROUP_System 0x0
#define MSTAGGROUP_CTR_built_in 0x1
// tags in group "System"
#define MSTAG_System_Ruby 0x0
#define MSTAG_System_Font 0x1
#define MSTAG_System_Size 0x2
#define MSTAG_System_Color 0x3
#define MSTAG_System_PageBreak 0x4
The first byte after 0x0e is the tag group byte, e.g. 0x00 which is System. The next byte after that is the command itself. Font color (0x03) and font size (0x02) match up with what @edcrfv0 said above.
I'm fairly certain Ruby is referring to this, but I've yet to see it being used in the wild so I'm not sure what bytes it expects. Best bet is to source a ton of Japanese Nintendo games using MSBT and hope one of them uses it. I've yet to see PageBreak be used.
Thanks @edcrfv0 and @shanepm. When I eventually get around to working on AC stuff again I'll be sure to take a deeper look at this and add it to Notes.md. What you've found looks very promising and interesting.
I'm very happy to see a repository of mine actually see some use! ...I've just been a bit busy working on other projects and school. It doesn't go unnoticed, though. 😁
Some more notes:
It definitely seems like 0x00 means ruby, and 0x04 looks like page break (0e00000004000000).
A few examples of 0x00 below (I added in the hyphens/2d00 to make it easier to separate them)
Note: These contain more bytes than the commands, I haven't trimmed the extra.
0e0000000000080002000400613044300f5c553044306a304c308930003068306b304b304f30
2d002d002d002d002d002d00
0e00000000000600020002005b30cc8073308c3001ff
2d002d002d002d002d002d00
0e00000000000a0004000600823088304630216ad8696e30
2d002d002d002d002d002d00
0e00000000000a00020006004b3089306030534f01ff
2d002d002d002d002d002d00
0e000000000008000200040044308d3072826e30d230ec3001ff
2d002d002d002d002d002d00
0e00000000000a00040006004b306e304630ef53fd806a30
2d002d002d002d002d002d00
0e00000000000800020004004b304e3050968a30
2d002d002d002d002d002d00
0e00000000000a000400060057305c309330ea8136716a30
2d002d002d002d002d002d00
0e00000000001600080012005b3044305f3044304b3093304d30873046301f754b61b07483586e30823068306730
2d002d002d002d002d002d00
0e00000000000a000200060059304c305f30ff599230
2d002d002d002d002d002d00
0e00000000000c00040008004b304f306b309330ba788d8a57305f3044306e30673059304c30
2d002d002d002d002d002d00
0e00000000000a0002000600573085309330ec6563306630
2d002d002d002d002d002d00
0e00000000000a00040006007e30753086301f77ac516a3093306730593088306d30
2d002d002d002d002d002d00
0e000000000008000200040055308030d25b44306e306f306130873063306830
Converted from hex to text, that's:
......ちい小さいながら とにかく------......せ背びれ!------...
..もよう模様の------...
..からだ体!------......いろ色のヒレ!------...
..かのう可能な------......かぎ限り------...
..しぜん自然な------......せいたいかんきょう生態環境のもとで------...
..すがた姿を------......かくにん確認したいのですが------...
..しゅん旬って------...
..まふゆ真冬なんですよね------......さむ寒いのはちょっと
Paste that into Furigana Maker and you get this - note how the kana on the left match the reading shown above the kanji.
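The hex-to-text step can be reproduced with a few lines of Python. This is only a sketch of one way to do it: it decodes the bytes as UTF-16-LE and prints a dot for every non-printable character, which is where the runs of dots in the text above come from. The helper name `hex_to_text` is made up for the example.

```python
def hex_to_text(hex_string: str) -> str:
    """Decode a hex dump as UTF-16-LE, showing control characters as dots."""
    text = bytes.fromhex(hex_string).decode("utf-16-le")
    return "".join(ch if ch.isprintable() else "." for ch in text)


# One of the ruby examples from the list above.
print(hex_to_text("0e00000000000800020004004b304e3050968a30"))  # ......かぎ限り
```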
Looking at those bytes I see this command syntax:
kanji_len_u8
kana_len_u8
Looks like you're right. Great find! Although, it also looks like that Furigana Converter is outputting some funky things. I think that's just an error with the site, though—it's repeating things.
If you notice, ちい小さいながら should be 小さいながら — いろ色のひれ! should be 色のひれ! — さむ寒いのはちょっと should be 寒いのはちょっと — etc... Handy I've been learning Japanese...
That's because I sent ちい小さいながら to the site - the raw text from the command plus some extra, so the site was right. The very first command is this:
0e00 - Marker
0000 - System
0000 - Ruby
0800 - 8 bytes of payload, from 0200 to 4430 below
0200 - kanji is 2 bytes (UTF-16)
0400 - kana is 4 bytes (UTF-16)
61304430 - ちい - kana
0f5c - 小 - kanji
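That breakdown can be turned into a short parser. The Python sketch below follows the field order listed above and assumes, based only on the examples in this thread, that the kana sit inside the command's payload and the kanji follow immediately after it. The name `parse_ruby` is hypothetical.

```python
import struct


def parse_ruby(data: bytes, offset: int = 0):
    """Parse a System/Ruby command and the kanji that follow it.

    Returns (kana, kanji, next_offset).
    """
    marker, group, tag, size = struct.unpack_from("<4H", data, offset)
    if (marker, group, tag) != (0x000E, 0x0000, 0x0000):
        raise ValueError("not a System/Ruby command")
    # The payload starts with two byte counts: kanji length, then kana length.
    kanji_len, kana_len = struct.unpack_from("<2H", data, offset + 8)
    kana_start = offset + 12
    kana = data[kana_start:kana_start + kana_len].decode("utf-16-le")
    # The kanji are ordinary text right after the command's payload.
    kanji_start = offset + 8 + size
    kanji = data[kanji_start:kanji_start + kanji_len].decode("utf-16-le")
    return kana, kanji, kanji_start + kanji_len


first = bytes.fromhex(
    "0e0000000000080002000400613044300f5c553044306a304c308930003068306b304b304f30"
)
print(parse_ruby(first))  # ('ちい', '小', 18)
```

Everything from `next_offset` onward is the rest of the sentence, displayed as written.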
Oh, that's very interesting... You'd think they'd just have to store ちい and 小 together, since the rest is just displayed as written...
You said you'd yet to see one of these appear "in the wild" in your original comment: is this example pulled from ACNH? If not, I'll see if I can find one in its Japanese MSBTs.
When I originally commented I'd only looked at 3DS system MSBTs. Those later examples are from ACNH 🙂
Hi, it's me again (alt account). The wiki here was updated some more.
It turns out the control sequences all follow the same format, so you can convert them to a readable format. Examples: SP_owl_Comment_Insect.po, SP_ItemName_30_Insect.po, SYS_Get_Fish.po
Normally Nintendo formats them like [System::Color name="red" ] or in bytes e.g. [00:03 bytes="0100" ] or similar, so maybe reading those files will help understand what the sequences should be named / what they do?
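To show what the byte-style notation might look like in practice, here is a minimal Python sketch that renders one raw control sequence in that bracketed form. It is a guess at the convention based only on the example above, not a reimplementation of any Nintendo tool; `tag_to_readable` and the sample bytes are made up for the illustration.

```python
import struct


def tag_to_readable(tag: bytes) -> str:
    """Render a raw control sequence as [GG:TT bytes="..." ]."""
    marker, group, kind, size = struct.unpack_from("<4H", tag, 0)
    if marker != 0x000E:
        raise ValueError("control sequence does not start with 0E 00")
    payload = tag[8:8 + size]
    return f'[{group:02X}:{kind:02X} bytes="{payload.hex()}" ]'


# Hypothetical raw form of the example above: group 00, tag 03, payload 01 00.
print(tag_to_readable(bytes.fromhex("0e00 0000 0300 0200 0100")))
# [00:03 bytes="0100" ]
```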
Thanks, your issue helps me a lot!
Most, if not all, of the messages have what I believe are many escape sequences in them. Most of them start with 00 0E bytes; \u000e when decoded to UTF-16-LE. While many of them seem to be of a common length, not all of them share similarities. For example, here is how Blathers's comments on the Goldfish are stored, once exported to JSON:
These escape sequences are likely triggers for
\u0004Ā촃 (bytes 00 04 01 00 CD 03) before Goldfish? That's probably to highlight "Goldfish" with green or blue.
I am considering trying to decompile the game's binaries to find where the game reads these files. Perhaps that will give some insight into what each of these sequences is doing? Once I figure out how to parse these escape sequences, it will become possible to automatically reformat the text in all languages, instead of manually going through the languages I know and fixing them.
It isn't as simple as just cutting out all the characters that need to be escaped in JSON, since a lot of the escape sequences have regular characters like ( and Ā촃 in them.
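One way around that problem is to skip whole sequences by length instead of filtering individual characters. The Python sketch below assumes the four-short layout (0E 00, type, variant, byte count) reported in the comments on this issue; it does not handle any other kind of control code, and `strip_tags` is a hypothetical helper name.

```python
import struct


def strip_tags(raw: bytes) -> str:
    """Remove 0E 00 control sequences from a UTF-16-LE message."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(raw):
        (unit,) = struct.unpack_from("<H", raw, pos)
        if unit == 0x000E and pos + 8 <= len(raw):
            # The fourth short says how many payload bytes to skip.
            (size,) = struct.unpack_from("<H", raw, pos + 6)
            pos += 8 + size
        else:
            out += raw[pos:pos + 2]
            pos += 2
    return out.decode("utf-16-le")


# A ruby example from this thread: the kana reading lives in the payload,
# so only the displayed text remains.
print(strip_tags(bytes.fromhex("0e00000000000800020004004b304e3050968a30")))  # 限り
```

Because the payload is skipped as raw bytes, it does not matter whether those bytes happen to decode to ordinary-looking characters.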