veekun / pokedex

more than you ever wanted to know about Pokémon
MIT License

Which text fields are from official sources? #260

Open cmmartti opened 5 years ago

cmmartti commented 5 years ago

Some data types have a number of different text fields. For example, abilities have effect, short_effect, name, and flavor_text, and some other types have a description field. Which of these are ripped from the games, and which are written by fans? I know that flavor_text comes from the games, as do many of the names, but I'm unsure about the rest.

magical commented 5 years ago

Great question!

In general

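For names, you can glean some information from the table name: official names usually live in tables ending in _names, while unofficial text usually lives in tables ending in _prose.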

A good example of this is location names vs location area names: location names are official, so the translation table is named location_names; location area names are unofficial so the translation table is named location_area_prose. This is just a convention and probably isn't 100% accurate though.
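As a rough illustration, the naming convention can be turned into a first-pass guess. This is a sketch only: guess_provenance is a made-up helper, and because it looks at nothing but the suffix, it inherits every inaccuracy of the convention.

```python
def guess_provenance(table_name: str) -> str:
    """Guess where a table's text came from, based only on its name suffix.

    Hypothetical helper: the _names/_prose split is a convention, not a rule.
    """
    if table_name.endswith("_names"):
        return "likely official"    # e.g. location_names
    if table_name.endswith("_prose"):
        return "likely unofficial"  # e.g. location_area_prose
    return "unknown"                # the convention says nothing; check other evidence


for name in ["location_names", "location_area_prose", "pokemon_species_flavor_text"]:
    print(name, "->", guess_provenance(name))
```

Anything that comes back "unknown" (flavor text tables, for instance) needs one of the other checks below.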

Another thing you can do is look at how many translations a text field has. If it has translations for every language, it was definitely ripped from the games. If there is only English, or only a couple of languages, then it is probably fan-written.
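Counting translations only takes a few lines with the csv module. A minimal sketch, assuming the translation CSVs store the language in a local_language_id column (the column name is a parameter in case a table differs); languages_in and looks_ripped are hypothetical names, and total_languages would have to come from languages.csv.

```python
import csv
import io


def languages_in(csv_file, language_column="local_language_id"):
    """Return the set of language IDs that appear in a translation CSV."""
    return {row[language_column] for row in csv.DictReader(csv_file)}


def looks_ripped(csv_file, total_languages, language_column="local_language_id"):
    """Heuristic: text translated into every language was probably ripped."""
    return len(languages_in(csv_file, language_column)) >= total_languages


# Made-up rows in the shape of a *_names file, just to show the call.
sample = io.StringIO(
    "location_id,local_language_id,name\n"
    "1,9,Example Town\n"
    "1,5,Ville Exemple\n"
)
print(sorted(languages_in(sample)))  # ['5', '9']
```

For a real file, open it with open(path, newline="", encoding="utf-8") and pass the file object instead of the StringIO.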

A while back, @encukou added some annotations to tables.py to help with this. Official text columns are supposed to have an info dict with official=True (like so). Some of them also have format='gametext', which means the text is ripped. I don't think anyone has done a careful audit of these, though, so they may be inaccurate. For example, generation names are marked as official, but I don't think that's true.
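Those annotations can be listed programmatically. This is a sketch: official_columns is a made-up helper that works on anything shaped like SQLAlchemy Table objects (each with .name and .columns, each column with .name and an .info dict). With pokedex installed, you could presumably pass it pokedex.db.tables.metadata.sorted_tables, but that assumes tables.py exposes its SQLAlchemy MetaData under the name metadata. Since the annotations haven't been audited, treat the output as a starting point.

```python
from types import SimpleNamespace


def official_columns(tables):
    """Return (table, column, is_gametext) for every column marked official=True."""
    found = []
    for table in tables:
        for column in table.columns:
            info = column.info or {}  # SQLAlchemy's Column.info is a plain dict
            if info.get("official"):
                found.append((table.name, column.name, info.get("format") == "gametext"))
    return found


# Made-up stand-ins (not copied from tables.py) to show the output shape.
def fake_column(name, **info):
    return SimpleNamespace(name=name, info=info)


demo_tables = [
    SimpleNamespace(name="demo_names", columns=[
        fake_column("demo_id"),
        fake_column("name", official=True, format="plaintext"),
    ]),
    SimpleNamespace(name="demo_flavor_text", columns=[
        fake_column("flavor_text", official=True, format="gametext"),
    ]),
]

for row in official_columns(demo_tables):
    print(row)
```

Columns flagged official but not gametext are the ones most worth double-checking by hand.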

A good way to be sure whether a piece of data was ripped is to check the commit log for its CSV file. When we add ripped data, the commit message usually says so. For example, https://github.com/veekun/pokedex/commit/c222dc807c9896dc260c2163b11a7acf9025c885.
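Checking the log can be scripted too. A rough sketch, assuming you run it inside a clone of the repo with git on your PATH and that the CSVs live under pokedex/data/csv/. Both helper names and the "rip" keyword pattern are hypothetical: commit messages are free-form and only usually mention ripping, so a miss here proves nothing.

```python
import re
import subprocess

# Hypothetical keyword pattern; \b keeps words like "description" from matching.
RIP_PATTERN = re.compile(r"\brip(?:s|ped|ping)?\b", re.IGNORECASE)


def git_log_for(path, repo="."):
    """Return one-line summaries of the commits that touched `path`."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--oneline", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def mentions_ripping(log_lines):
    """Keep only the commit summaries that mention ripping."""
    return [line for line in log_lines if RIP_PATTERN.search(line)]
```

For example, mentions_ripping(git_log_for("pokedex/data/csv/pokemon_species_flavor_text.csv")) would list candidate commits, which you can then open on GitHub to confirm.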

Hope that helps~

magical commented 5 years ago

P.S. If you have questions about a specific column, feel free to ask on IRC.

sdcinglis commented 5 years ago

I'm curious about pokemon_species_flavor_text: it seems to be official, but lots of translations are missing. Versions 1-16 (Red through SoulSilver) only have English text. Any idea why?
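The gap is easy to see per version with a quick script. A sketch, assuming pokemon_species_flavor_text.csv has version_id and language_id columns and that English is language ID 9 in languages.csv; languages_by_version and english_only_versions are made-up names.

```python
import csv
import io
from collections import defaultdict


def languages_by_version(csv_file, version_column="version_id", language_column="language_id"):
    """Map each version ID to the set of language IDs that have text for it."""
    langs = defaultdict(set)
    for row in csv.DictReader(csv_file):
        langs[row[version_column]].add(row[language_column])
    return dict(langs)


def english_only_versions(csv_file, english_id="9"):
    """Return version IDs (sorted numerically) whose only language is English."""
    by_version = languages_by_version(csv_file)
    return sorted((v for v, langs in by_version.items() if langs == {english_id}), key=int)


# Made-up rows in the same shape as the real file, just to show the call.
demo = io.StringIO(
    "species_id,version_id,language_id,flavor_text\n"
    "1,1,9,placeholder\n"
    "1,17,9,placeholder\n"
    "1,17,5,placeholder\n"
)
print(english_only_versions(demo))  # ['1']
```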

encukou commented 5 years ago

To extract the data, someone who knows how to do the extracting must own a game that has the data.

(That's the general idea. Of course, in reality it's more complicated.)

magical commented 5 years ago

To expand on encukou's answer, Gen V was the first generation to include all the languages in a single ROM (except Korean, I think?), which made it very easy to rip the text in every language at once. Before that, if you wanted foreign-language text, you had to track down the ROM for that language. That's possible, but I guess we haven't done it.