Game metadata should be in the original language.

trap15 commented 8 years ago

Using romanizations for game metadata is fairly lossy and at the very least it's poor documentation. I propose there should be a secondary title string for the original language, keeping the romanized version.

Ideally, the developer could also use UTF-8 to write it in, instead of using escaped unicode. For this, srcclean needs to stop horribly destroying unicode for no reason.

felipesanches commented 8 years ago

Some proposals:

This works, but it is terribly unreadable.

GAME( 1992, wwmarine, 0, segac2, wwmarine, segac2_state, bloxeedc, ROT0, "Sega", "Waku Waku Marine \u308f\u304f\u308f\u304f\u30de\u30ea\u30f3", 0 )
GAME( 1991, soniccar, 0, segac2, soniccar, segac2_state, bloxeedc, ROT0, "Sega", "Waku Waku Sonic Patrol Car \u308f\u304f\u308f\u304f\u30bd\u30cb\u30c3\u30af\u30d1\u30c8\u30ab\u30fc", 0 )

This seems much better!

GAME( 1992, wwmarine, 0, segac2, wwmarine, segac2_state, bloxeedc, ROT0, "Sega", "Waku Waku Marine わくわくマリン", 0 )
GAME( 1991, soniccar, 0, segac2, soniccar, segac2_state, bloxeedc, ROT0, "Sega", "Waku Waku Sonic Patrol Car わくわくソニックパトカー", 0 )
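For reference, the two spellings should compile to the same bytes. The sketch below is only an illustration: it assumes the source file is saved as UTF-8 and that the compiler's narrow execution character set is UTF-8 (the GCC and Clang default; other toolchains may need a flag). The helper `count_code_points` is a made-up name, not something from MAME.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (those of the form 10xxxxxx).
constexpr std::size_t count_code_points(std::string_view utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

// The same title, once with \u escapes and once as literal UTF-8 text.
constexpr std::string_view escaped = "\u308f\u304f\u308f\u304f\u30de\u30ea\u30f3";
constexpr std::string_view literal = "わくわくマリン";

// Fails to compile if the two spellings do not produce identical bytes
// (assumption: UTF-8 source and execution character sets).
static_assert(escaped == literal, "escaped and literal titles differ");
```

If that assertion holds on the supported compilers, the choice between the two forms is purely about readability of the driver source.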

trap15 commented 8 years ago

I'd like to put it in two different fields, so something like:

GAME( 1992, wwmarine, 0, segac2, wwmarine, segac2_state, bloxeedc, ROT0, "Sega", "Waku Waku Marine", "わくわくマリン", 0 )

Or for fields where they're the same, maybe something like

GAME( 1992, wwmarine, 0, segac2, wwmarine, segac2_state, bloxeedc, ROT0, "Sega", "Waku Waku Marine", "", 0 )
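A rough sketch of how the empty-string convention could be consumed, assuming a two-field record. The struct `title_info` and the function `native_title` are hypothetical names for illustration and do not exist in MAME.

```cpp
#include <string_view>

// Hypothetical two-field title record (not a MAME type).
struct title_info
{
    std::string_view romanized; // e.g. "Waku Waku Marine"
    std::string_view native;    // original-language title, "" if identical
};

// Returns the original-language title, falling back to the romanized one
// when the native field was left empty.
constexpr std::string_view native_title(const title_info &t)
{
    return t.native.empty() ? t.romanized : t.native;
}
```

With this, a front-end would never see the empty string: `native_title` on an entry declared with `""` simply yields the romanized title.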

felipesanches commented 8 years ago

What about in the case of accented latin text?

Could it be like this?

COMP( 1972, patinho,  0,        0,      patinho_feio,  patinho_feio, patinho_feio_state, patinho_feio, "Escola Politécnica - Universidade de São Paulo", "", "Patinho Feio", "", MACHINE_NO_SOUND_HW | MACHINE_NOT_WORKING)

Bear in mind that both the "company" and the "full name" fields may have unicode characters... We may have to use some macros to make it nicer and cleaner. I can understand the idea of using null strings but I'd prefer to not have them visible in the source at all.
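One possible shape for such macros, as a sketch only. `machine_text`, `MACHINE_TEXT` and `MACHINE_TEXT_L` are invented for this example; the real GAME/COMP macros take many more arguments and are not reproduced here.

```cpp
#include <string_view>

// Hypothetical record holding the two text fields and their
// original-language counterparts (not a MAME type).
struct machine_text
{
    std::string_view company;
    std::string_view company_native;  // "" when same as company
    std::string_view fullname;
    std::string_view fullname_native; // "" when same as fullname
};

// Plain form: the empty strings are supplied by the macro, so they
// never appear in the driver source.
#define MACHINE_TEXT(company, fullname) \
    machine_text{ company, "", fullname, "" }

// Localized form: both spellings are given explicitly.
#define MACHINE_TEXT_L(company, company_native, fullname, fullname_native) \
    machine_text{ company, company_native, fullname, fullname_native }

// Accented Latin text needs no second spelling.
constexpr machine_text patinho =
    MACHINE_TEXT("Escola Politécnica - Universidade de São Paulo", "Patinho Feio");

// A title with a separate original-language spelling.
constexpr machine_text wwmarine =
    MACHINE_TEXT_L("Sega", "", "Waku Waku Marine", "わくわくマリン");
```

The cost is one more macro variant per entry type, which is the kind of macro growth that would need to be weighed against the cleaner source.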

angelosa commented 8 years ago

As things stand, I'm inclined to think that the current MAME categorization system isn't in any way adequate for 2016 standards. Random examples:

There's no way to tell that sf2j is a Japanese version and sf2u is US other than the description tag. Might be a decent option for front-ends: "show European released games only" for example.
The parenthesized field can be automated, instead of having "Set 4", "Hong Kong bootleg" or "931101".
How many macros do we have for building systems anyway? Add yet another optional field just for the localized name and you'll get an even fancier macro-hell scheme, which makes managing even harder.

Bottom line: I don't like the software list / XML system either, as a matter of personal taste (namely, it is unreadable by the human eye in raw format), but it certainly handles these "optional" things like non-romaji alphabets just fine.

felipesanches commented 8 years ago

@angelosa Could you please open a new issue on GitHub specifically about the broader topic of improving the way MAME stores metadata in general? I think you've got valid points and I would add some more comments on that, but I'd prefer to keep this issue focused on the unicode strings and have all other discussion going on in a separate issue, for the sake of clarity and better organization of the current issues at hand.

trap15 commented 8 years ago

Could use compound literals?

GAME_ADD((GameInfo){
  .name="わくわくマリン",
  .name_romanized="Waku Waku Marine",
  .year=1992,
  [etc...]
})

In which case defaults end up being 0/NULL. More readable than XML, and keeps it in the driver.
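Worth noting: compound literals of the form `(GameInfo){...}` are a C99 feature that C++ compilers accept only as an extension. A similar effect is available in standard C++20 through designated initializers, with the restriction that designators must follow the declaration order of the members. The sketch below assumes a C++20 compiler; the struct layout is hypothetical and only mirrors the fields named above.

```cpp
#include <string_view>

// Hypothetical record mirroring the fields from the comment above
// (not a MAME type).
struct GameInfo
{
    std::string_view name;           // original-language title
    std::string_view name_romanized; // romanized title
    int year = 0;
    std::string_view manufacturer;   // stays empty when omitted
};

// C++20 designated initializers: designators must appear in declaration
// order, and omitted members take their default (empty or zero) values.
constexpr GameInfo wwmarine{
    .name = "わくわくマリン",
    .name_romanized = "Waku Waku Marine",
    .year = 1992,
    // .manufacturer intentionally omitted
};
```

Omitted fields defaulting to empty is what makes the optional original-language title cheap to leave out.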

felipesanches commented 8 years ago

The file src/mame/drivers/cps2.cpp has almost 300 GAME entries... It would be good to keep all metadata on a single line if possible. But sometimes the lines do indeed get really long!

felipesanches commented 8 years ago

A tool called pyftsubset in the fonttools project (https://github.com/behdad/fonttools/) may be able to generate the needed Noto font subsetting that I suggested on IRC earlier today for unicode metadata strings in MAME.

Noto is a libre font family being developed by Google to have a very wide glyph coverage. So it is essentially a font designed to fulfill needs of ambitious multi-language projects like this.

But the problem is that such a font family has very large file sizes. So the idea is that we should generate a minimal font subset that contains only the glyphs needed. Before packaging a new MAME release, we would run an automatic subsetting script that lists all unicode codepoints of glyphs used in metadata strings declared in MAME's codebase. The generated minimal font file would then be added as a program resource and loaded by default in the MAME UI.

This would guarantee that all metadata would be properly rendered in our user interface.
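The codepoint-listing step could look roughly like the sketch below. It is not an existing MAME tool: the function names are made up, the input is assumed to be well-formed UTF-8 (malformed input is not handled, and a truncated sequence trips the assert), and how the strings get extracted from the drivers is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <set>
#include <string>
#include <string_view>

// Hypothetical helper: decodes well-formed UTF-8 and adds every code
// point found in `utf8` to `out`.
inline void collect_code_points(std::string_view utf8, std::set<char32_t> &out)
{
    for (std::size_t i = 0; i < utf8.size(); )
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        // The lead byte determines the sequence length (1 to 4 bytes).
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        assert(i + len <= utf8.size()); // precondition: no truncated sequence
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        out.insert(cp);
        i += len;
    }
}

// Hypothetical helper: formats the set as a comma-separated list of
// U+XXXX values, in ascending code point order.
inline std::string format_code_points(const std::set<char32_t> &cps)
{
    std::string result;
    char buf[16];
    for (char32_t cp : cps)
    {
        std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(cp));
        if (!result.empty())
            result += ',';
        result += buf;
    }
    return result;
}
```

A list in this form is, as far as I can tell, what pyftsubset's `--unicodes` option expects, but the exact invocation should be checked against the fonttools documentation.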

felipesanches commented 8 years ago

Oh! And by the way... here's the Noto libre font project website: https://www.google.com/get/noto/

mamedev / mame

Game metadata should be in the original language. #586