Several Japanese Characters Missing

Roadcrosser commented 2 years ago

Some Katakana characters are missing from the texture, despite the Hiragana counterparts being present.

Notably, these characters: ブプバパベペボポダデズガゼヂドジギグゲヅザゾビピゴァィゥェォャュョッ

Additionally, it appears that the small つ (っ) is also missing.

I'm wondering if it is a limitation of the 16x16 character grid that these are missing.

Lhun commented 2 years ago

All common kanji are missing too, which is a pretty big problem. :) Recently, for our event with Virtual Market, I created a full Japanese character map for TMP that includes the most common kanji. Unity_TMP_Japanese-main.zip

Here's pretty much every character you'll need for en->jp

Unfortunately it's really annoying to extract a TMP atlas, which is extraordinarily frustrating because the image file is literally right there, but I digress.

I hope this helps!

killfrenzy96 commented 2 years ago

I will need to create a new template of Japanese text at some point. At this time, it doesn't properly support Japanese because I can't actually read Japanese. There is still some additional space that can be used on the font atlas for extra characters, so I might need to see how much more text I can fit.

Roadcrosser commented 2 years ago

The missing っ is currently the biggest issue.

Though the missing katakana characters would be nice to have as well.

After that it is mostly the kanji, which might be hard to implement given how many there are.

The commenter above did make a TMP asset containing commonly used kanji, so that might work out.

Also slightly relevant, but it would be convenient if we could use a config on the Desktop application to change the character mapping, especially if one were to create custom fonts (like I am doing) or mappings, though this might warrant its own issue/PR in that repo.

s-ilent commented 2 years ago

Kanji is a problem because it will take a lot of extra space on the avatar to fit each character. The current system can fit a font with sharp high-res characters into under a megabyte, but to fit kanji would require something like 60 times that. The archive Lhun posted is 25 MB, with the actual font asset inside being 131 MB. This is terrible for something that needs to be included with every avatar. There are people who set their download size limit to less than 30 MB! And it takes time to download and load bigger avatars. Unless the font is squished really tight, it seems like it would be a really heavy thing to add.

Another problem is accessing it. While the shader can take whatever number as a parameter, the animator parameters are limited to the space already available; it seems like a second parameter would need to be added for each character, halving the transfer speed. At least having double the parameters makes that less of a problem.

I think kanji should probably be treated as a separate issue...

However, the limitation of the current addressing method is a problem even for the missing katakana characters. I don't think there's enough space in the character set to fit them. There are currently 24 open spaces, but the number of characters needed is more like 34. The small characters can be fit in, but the rest have no room.

It's possible to have the OSC input tool split characters like ズ into ス〝. However, that's a pretty hacky solution and I don't know if that's very readable. Even retro games conceded defeat and used 2 bytes to store most of their text, so it might be worth thinking about, especially as you can fit extended Latin characters into the extra space as well.
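A minimal sketch of that splitting step, assuming the input tool is free to preprocess text before sending it and that the atlas has cells for the standalone marks ゛ (U+309B) and ゜ (U+309C). The function name `split_voiced` is hypothetical, not part of the existing tool. It relies on Unicode canonical decomposition, which breaks a voiced kana into its base kana plus a combining mark.

```python
import unicodedata

# Combining marks produced by canonical decomposition, mapped to the
# standalone (spacing) marks assumed to have their own atlas cells.
COMBINING_TO_SPACING = {
    "\u3099": "\u309B",  # combining dakuten -> standalone dakuten
    "\u309A": "\u309C",  # combining handakuten -> standalone handakuten
}


def split_voiced(text: str) -> str:
    """Replace each voiced or semi-voiced kana with base kana + standalone mark."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in COMBINING_TO_SPACING:
            # Base kana followed by the spacing version of the mark.
            out.append(decomposed[0] + COMBINING_TO_SPACING[decomposed[1]])
        else:
            # Small kana, plain kana, Latin text, etc. pass through untouched.
            out.append(ch)
    return "".join(out)
```

Each split character costs two display cells instead of one, so this trades readability and line length for staying within a single one-byte character set.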

Roadcrosser commented 2 years ago

I count 32 blank spaces, and if we use the last two ゛゜ we have 34 slots, which would fit all the characters save one.
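A quick way to double-check that count, assuming the katakana list from the first comment is complete, that the small っ is the only other missing character, and that the 32 blank cells plus the two ゛゜ cells are all usable. The variable names are only illustrative.

```python
import unicodedata

# Katakana reported missing in the first comment, plus the small hiragana っ.
missing_katakana = "ブプバパベペボポダデズガゼヂドジギグゲヅザゾビピゴァィゥェォャュョッ"
missing = unicodedata.normalize("NFC", missing_katakana + "っ")

blank_slots = 32      # blank cells counted in this comment
reclaimed_slots = 2   # the ゛ and ゜ cells, if they are given up
available = blank_slots + reclaimed_slots

needed = len(set(missing))      # distinct characters that need a cell
shortfall = needed - available  # positive means some characters do not fit

print(f"needed={needed} available={available} shortfall={shortfall}")
```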

The last slot could be taken from one of the lesser used symbols, although at this point it may be a good idea to add customizable (or at least different) character sets.

Am I to assume there are only 256 different characters available without affecting the sync params?

s-ilent commented 2 years ago

Yeah, that's right. The sync parameters can only hold values with that much precision (256 entries), so you can't store any more values in one.
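As a rough illustration of why 256 values and a 16x16 grid go together, here is a sketch that maps a one-byte character value to a cell in the atlas. It assumes row-major ordering with cell 0 in the first row and column, which is a guess and may not match how the shader actually lays out the texture. `cell_for_index` is a hypothetical helper.

```python
from typing import Tuple

GRID_SIZE = 16  # the 16x16 character grid mentioned at the top of the thread


def cell_for_index(index: int) -> Tuple[int, int]:
    """Return (row, column) for a character value, assuming row-major order."""
    if not 0 <= index < GRID_SIZE * GRID_SIZE:
        raise ValueError(f"index {index} does not fit in a {GRID_SIZE}x{GRID_SIZE} grid")
    return divmod(index, GRID_SIZE)
```

Any value outside 0..255 has no cell to point at, which is the limit being discussed.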

Lhun commented 2 years ago

I mean, it can be done. Here's the character map from a bunch of sprite-limited games like Fire Emblem and Castlevania.

Finally, I've generated a file using Noto Sans that includes 1453 of the most common characters. The text file attached is the vast majority of Japanese used today (the core 2700 or so). Ponponsan shared the text file with me earlier.

message (2).txt I suspect some other method will be needed, like using cardinal directions from OSC to select navigation on a grid of characters and looping around, or loading multiple files on demand.

I'm very slowly processing a file that I'll add to this post once it's compressed. In truth, the CJK Unicode range is over 20000 characters. But only a subset of around 2000 characters is necessary to display Japanese text properly. These characters are spread around in ranges from U+4E00 to U+9FA5.

This tool can do the job: https://www.angelcode.com/products/bmfont/ You might need to pack it into RGBA (which bmpfontgen can do). I've attached an export of that called "chonky_0.png". It's at around 24pt.
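For picking out only the kanji a given text actually uses, a small sketch along these lines could help. It assumes the U+4E00 to U+9FA5 range quoted above is the only one of interest, and `cjk_chars` is a hypothetical helper name.

```python
from typing import List

CJK_START = 0x4E00
CJK_END = 0x9FA5

# Size of the whole range, inclusive of both ends.
range_size = CJK_END - CJK_START + 1


def cjk_chars(text: str) -> List[str]:
    """Return the distinct characters of text inside the range, in code point order."""
    return sorted({ch for ch in text if CJK_START <= ord(ch) <= CJK_END})
```

Running this over a corpus such as the attached text file gives the character list to feed into a bitmap font generator, rather than rendering the full range.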

If you want an already made solution, I've also attached LANAPIXEL, which is a super tight pixel style font that can be found here: https://opengameart.org/content/lanapixel-localization-friendly-pixel-font

The PNG has just about every character but it's probably better to generate from the nokorean one. That being said, we have a sizable south korean population in game now with the popularity of a certain girl band.

lanapixel_png.zip lanapixel_nokorean.zip

chonky_0

killfrenzy96 commented 2 years ago

At this time, each character can only handle 255 different values. To support more than this, it will require double the parameter space. I think I can make it work without adding too much animator complexity. It might be best to do this in a separate branch or fork of this project.
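A minimal sketch of what doubling the parameter space could look like on the sending side, assuming each parameter holds one of 255 distinct values as stated above. The high/low split and the function names are hypothetical and not taken from the project.

```python
from typing import Tuple

VALUES_PER_PARAM = 255  # per-parameter capacity quoted in this comment
MAX_CHARS = VALUES_PER_PARAM ** 2  # capacity when two parameters are combined


def encode(index: int) -> Tuple[int, int]:
    """Split a character index into (high, low) parameter values."""
    if not 0 <= index < MAX_CHARS:
        raise ValueError(f"index {index} does not fit in two parameters")
    return divmod(index, VALUES_PER_PARAM)


def decode(high: int, low: int) -> int:
    """Rebuild the character index from the two parameter values."""
    return high * VALUES_PER_PARAM + low
```

Two parameters per character would address 65025 cells under this assumption, comfortably more than the roughly 2000 kanji discussed earlier, at the cost of sending twice as many values.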

Roadcrosser commented 2 years ago

I did just get an idea for how you would add more characters with minimal increase in parameter space. Unfortunately I don't know enough about OSC/shaders to know how feasible this actually is:

- A spritesheet is split up into sections, where each section has 255 characters. (This could also be implemented with multiple spritesheets.)
- When writing characters, a single parameter will indicate which section to grab a character from for the pointer. If the current range uses characters from different sections, the writer would need to do multiple passes to write characters from different sections, adding some delay. (Though this wouldn't be too much of an issue, as this would likely be a rare occurrence save for heavy kanji usage.)
  - This would require dedicating a specific value of the char parameter to "do not change" so it isn't overwritten on the next pass, which may reduce characters per section by one.
  - The shader property count will definitely double, though, to indicate which section a specific character is from, in addition to the character position.
  - Given that OSC value updates aren't atomic, the animator may end up assigning characters to the wrong section before we manage to change it. Maybe a lock parameter would need to be used. This issue seems like it would already be present with the current pointer system, so maybe it's already been addressed in a different way, though.

Again this is just how I think it can be done, and I definitely don't know enough about this to know if it's actually feasible.
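To make the multi-pass idea a bit more concrete, here is a rough sketch of the writer side only. It assumes value 0 is the reserved "do not change" value, leaving 254 usable characters per section, and it ignores the atomicity and locking question entirely. `plan_passes` and both constants are hypothetical.

```python
from typing import Dict, List, Tuple

NO_CHANGE = 0            # reserved char value meaning "leave this slot alone"
CHARS_PER_SECTION = 254  # 255 values minus the reserved one


def plan_passes(indices: List[int]) -> List[Tuple[int, List[int]]]:
    """Group the character indices of one write range into per-section passes.

    Each pass is (section, values) where values has one entry per slot:
    the in-section character value, or NO_CHANGE for slots that belong
    to a different section.
    """
    passes: Dict[int, List[int]] = {}
    for slot, index in enumerate(indices):
        section, offset = divmod(index, CHARS_PER_SECTION)
        values = passes.setdefault(section, [NO_CHANGE] * len(indices))
        values[slot] = offset + 1  # shift by one so 0 stays reserved
    return sorted(passes.items())
```

For example, `plan_passes([0, 254, 1])` yields two passes: section 0 writes slots 0 and 2 and skips slot 1, then section 1 writes only slot 1. A range whose characters all sit in one section still needs just a single pass.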

killfrenzy96 / KillFrenzyAvatarText

Several Japanese Characters Missing #6