cmmartti opened this issue 5 years ago.
Great question!
In general
For names, you can glean some information from the table name:
A good example of this is location names vs. location area names: location names are official, so the translation table is named `location_names`; location area names are unofficial, so the translation table is named `location_area_prose`. This is just a convention and probably isn't 100% accurate, though.
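To make that convention concrete, here is a minimal sketch of a hypothetical helper, `guess_table_provenance`, that applies the suffix rule (`_names` suggests official text, `_prose` suggests unofficial text). It only encodes the naming convention, so it inherits the same caveat: the convention is not guaranteed to hold for every table.

```python
# Hypothetical helper: guess whether a translation table holds official text,
# based only on the naming convention (*_names = official, *_prose = unofficial).
# This is a heuristic, not a guarantee.


def guess_table_provenance(table_name: str) -> str:
    """Return 'official', 'unofficial', or 'unknown' for a translation table name."""
    if table_name.endswith("_names"):
        return "official"
    if table_name.endswith("_prose"):
        return "unofficial"
    return "unknown"  # the convention says nothing about other suffixes


for name in ["location_names", "location_area_prose", "pokemon_species_flavor_text"]:
    print(name, "->", guess_table_provenance(name))
```

Tables that follow neither suffix (flavor text tables, for example) come back as `unknown`, which is the honest answer for this rule alone.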
Another thing you can do is look at how many translations a text field has. If it has translations for every language, it was definitely ripped from the game. If there is only English, or only a couple languages, then it is probably fan-generated.
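The translation-coverage heuristic can be sketched the same way. `guess_from_translations` is hypothetical, and both the language identifiers in `ALL_LANGUAGES` and the cutoff of two languages for "only a couple" are assumptions; the real language list comes from the database.

```python
from collections.abc import Iterable

# Assumed language identifiers for illustration; the real set comes from the
# database's languages table and may differ.
ALL_LANGUAGES = frozenset({"ja", "en", "fr", "de", "it", "es", "ko"})


def guess_from_translations(
    field_languages: Iterable[str],
    all_languages: frozenset[str] = ALL_LANGUAGES,
    few: int = 2,  # assumed cutoff for "only a couple of languages"
) -> str:
    """Heuristic: full coverage suggests ripped text; sparse coverage suggests fan text."""
    present = set(field_languages)
    if all_languages <= present:
        return "likely ripped"
    if len(present) <= few:  # only English, or only a couple of languages
        return "probably fan-written"
    return "unclear"


print(guess_from_translations(["en"]))
print(guess_from_translations(ALL_LANGUAGES))
print(guess_from_translations(["en", "ja", "fr", "de"]))
```

Treat the result as a hint only: as the flavor-text question further down this thread shows, official text can also have sparse coverage simply because nobody has ripped the other languages yet.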
A while back @encukou added some annotations to `tables.py` to try to help with this. Official text columns are supposed to have an `info` dict with `official=True`. Some of them also have `format='gametext'`, which means the text is ripped. I don't think anyone has done a careful audit of these, though, so they may be inaccurate. For example, generation names are marked as official, but I don't think that's true.
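If you want to survey those annotations, a minimal sketch follows. It assumes each column's annotations are available as a plain dict with optional `official` and `format` keys (SQLAlchemy exposes per-column metadata through `Column.info`, which is a dict). The column names and values below are illustrative, not copied from `tables.py`, and the output is only as reliable as the unaudited annotations.

```python
from collections.abc import Mapping


def classify_column(info: Mapping[str, object]) -> str:
    """Classify a text column from its annotation dict.

    format='gametext' -> text is ripped from the games
    official=True     -> official text (not necessarily ripped verbatim)
    otherwise         -> no official annotation; possibly fan-written
    """
    if info.get("format") == "gametext":
        return "ripped"
    if info.get("official"):
        return "official"
    return "unannotated"


# Illustrative annotation dicts (hypothetical values, not taken from tables.py).
columns = {
    "ability_flavor_text.flavor_text": {"format": "gametext", "official": True},
    "ability_names.name": {"official": True},
    "ability_prose.effect": {"format": "markdown"},
}

for column, info in columns.items():
    print(f"{column}: {classify_column(info)}")
```

`format='gametext'` is checked first because it is the stronger claim: ripped text is official, but official text is not always ripped.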
A good way to be sure if a piece of data was ripped is to check the commit log for the csv file. When we add ripped data the commit message usually says so. For example, https://github.com/veekun/pokedex/commit/c222dc807c9896dc260c2163b11a7acf9025c885.
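As a rough sketch of that check, assuming a local clone of the repository and `git` on the PATH: the hypothetical helpers below list the commits that touched one CSV file and keep those whose subject mentions ripping or extraction. The keyword pattern is a guess, since commit messages are free-form, so a miss does not prove the data is fan-written.

```python
import re
import subprocess

# Assumed keywords; word boundaries avoid false hits such as "description".
RIP_PATTERN = re.compile(r"\b(?:rip|ripped|ripping|extract(?:ed|ion)?)\b", re.IGNORECASE)


def commit_subjects(csv_path: str, repo: str = ".") -> list[str]:
    """Return 'hash subject' lines for every commit touching csv_path (follows renames)."""
    result = subprocess.run(
        ["git", "log", "--follow", "--format=%h %s", "--", csv_path],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def mentions_rip(subjects: list[str]) -> list[str]:
    """Keep only commit lines whose message mentions ripping or extraction."""
    return [s for s in subjects if RIP_PATTERN.search(s)]
```

From the root of a clone, something like `mentions_rip(commit_subjects("pokedex/data/csv/location_names.csv"))` would list candidate commits (the CSV path here is an assumed example). Reading the full commit, as with the one linked above, is still the reliable step.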
Hope that helps~
P.S. If you have questions about a specific column, feel free to ask on IRC.
I'm curious about `pokemon_species_flavor_text`. It seems to be official, but lots of translations are missing: versions 1-16 (Red through SoulSilver) only have English text. Any idea why?
To extract the data, someone who knows how to do the extracting must own a game that has the data.
(That's the general idea. Of course, in reality it's more complicated.)
To expand on encukou's answer: Gen V was the first time all the languages were included in a single ROM (except... Korean? I think?), which made it very easy to rip the text in every language at once. Before that, if you wanted foreign-language text, you had to track down the ROM for that language. Possible, but I guess we haven't done so.
Some data types have a number of different text fields. For example, abilities have effect, short_effect, name, and flavor_text, and other types have fields such as description. Which of these are ripped from the games, and which are written by fans? I know that flavor_text is from the games, and so are many of the names, but I'm unsure about the rest.