veekun / pokedex

more than you ever wanted to know about Pokémon
MIT License
1.44k stars 637 forks source link

Dataloss in tracking evolution forms #243

Open jrubinator opened 5 years ago

jrubinator commented 5 years ago

The current data schema is unable to express that evolving Cubone at night results in Alolan Marowak. This is because evolution is tracked on a species-level granularity (in both pokemon_evolution.csv and pokemon_species.csv).

Thus in pokemon_evolution.csv, we have two entries distinguishable from each other only in that one happens by night:

...
62,105,1,,28,,,,,,,,,,,,,,0,0
...
367,105,1,,28,,,,night,,,,,,,,,,0,0
jrubinator commented 5 years ago

Lycanroc has similar issues (form is not reflected and game restrictions are not reflected either), and there may be others as well:

sqlite> select s.id, s.identifier from pokemon_evolution e left join pokemon_species s on e.evolved_species_id=s.id group by evolved_species_id having count(*) > 1;
id|identifier
105|marowak
350|milotic
462|magnezone
470|leafeon
471|glaceon
476|probopass
678|meowstic
745|lycanroc

(offhand I know that leafeon/glaceon are location/game-based differences, and are legit. I haven't looked at the others)

magical commented 5 years ago

we're also unable to express that, e.g. Plant Cloak Burmy evolves into Plant Cloak Wormadam. iirc there are a couple other cases as well.

the way we deal with these special cases right now is just by explaining them in the evolution prose, which works well enough for veekun. do you have another use case in mind that requires more precise information?

magical commented 5 years ago

id|identifier 105|marowak 350|milotic 462|magnezone 470|leafeon 471|glaceon 476|probopass 678|meowstic 745|lycanroc

Milotic legitimately has two different (equivalent) evolution methods.

Magnezone, Leafeon, Glaceon, and Probopass are all location-based, and so have row in the evolution table for each location they can evolve in in each game.

Meowstic's evolution is based on a gender difference that we can't encode.

Marowak and Lycanroc you already mentioned.

jrubinator commented 5 years ago

do you have another use case in mind that requires more precise information?

Yes, but it's not one I can argue strongly for :) I have a script that tries to gather randomized facts about the pokemon in the game I'm playing, and spit them out for me. It was tripped up recently by Marowak.

It seems like it wouldn't be too difficult to add optional game_id and pokemon_form_id columns to the pokemon_evolution table. Not sure if it's the correct solution/worth it - doing so could invite lots of deduplication of data in future games.

If fixing this isn't something we'd consider right now, I have no objections to closing this as "Won't Do". I figured I'd check though, since I didn't see any reported issues for it. Thanks for the quick response!

PS It looks like there is some funky data for Milotic (two different happiness entries):


id|evolved_species_id|evolution_trigger_id|trigger_item_id|minimum_level|gender_id|location_id|held_item_id|time_of_day|known_move_id|known_move_type_id|minimum_happiness|minimum_beauty|minimum_affection|relative_physical_stats|party_species_id|party_type_id|trade_species_id|needs_overworld_rain|turn_upside_down
179|350|1||||||||||171||||||0|0
321|350|2|||||580|||||||||||0|0
371|350|1||||||||||170||||||0|0```
herbertmilhomme commented 4 years ago

image

I agree with everything discussed prior, as they were the only duplicates i could locate in the CSV files. However, some of the duplicates i found were bad entries... like the eevee dupes and magnezone dupes are all location based, but they have entries in the table where all evolve values are null... so essentially it shows up as a guarantee evolve instead of "only when criteria is met". Marowak and Meowstic are similar cases... Meow is gender specific, but still has an entry where gender equals null... so it's a also a situation where it will evolve if gender is overlooked.

Lycanroc has 3 evolves, only 2 are mentioned.. time of day criteria in all of database only account for "day" and "night", there should be an in-between entry, maybe something called "mid", "midday", or "dusk" (since it's part of day/night cycle as well as catch/spawn rates for some pokemons, it would be nice to incorporate into evolution chart).

@magical i suggest removing multiple entries that were doubled with null values. These are easily fixable solutions that only require minor tweaking/fine-tuning. I can modify on my end, but i figured it would be helpful to share, as others have pointed out, can lead to some disappointing results.

image

I've highlighted bad entries i believe would be best if removed and will not only improve quality of database but not go against veekuns pokedex policy/rules.

herbertmilhomme commented 4 years ago

Meowstic's evolution is based on a gender difference that we can't encode.

@magical i believe if you use pokemon_forms, instead of pokemon_species as primary key (for both evolution and moveset tables), you can achieve more results closer to original gamefreak pokemon. As the distinctions are better expressed when following individual pokemon strands, rather than grouping them into bulk and applying a blanket criteria over them.

magical commented 4 years ago

@herbertmilhomme

Magnezone, Probopass, Glaceon, and Leafeon need to have appropriate locations set for Gen VII; that's #248.

We could add a dusk entry to time_of_day. I thought Dusk Lycanroc was event-only though?

(Also, could you provide more details about how dusk affects "catch/spawn rates for some pokemons"? To my knowledge, the encounter tables in S/M only have entries for night and day. If you have more info i'd be curious to hear it.)