vijaybarve / taxotools

Tools to Handle Taxonomic Lists

merge_lists removes rows #117

Open Jegelewicz opened 3 years ago

Jegelewicz commented 3 years ago

When I merge lists with merge_lists, some rows are removed and I don't know why. Because the function also changes the ids, there is no easy way to work out which rows were lost. For example:

In the original files to be merged I have

| id | accid | source | order | family | genus | species | subspecies | author | taxonlevel | canonical |
|---|---|---|---|---|---|---|---|---|---|---|
| 72 | 0 | Lewis | Siphonaptera | Ceratophyllidae | Amalaraeus | andersoni | andersoni | (Rothschild, 1908) | subspecies | Amalaraeus andersoni andersoni |

| id | accid | source | order | family | genus | species | subspecies | author | taxonlevel | canonical |
|---|---|---|---|---|---|---|---|---|---|---|
| 390 | 0 | CoL | Siphonaptera | Ceratophyllidae | Amalaraeus | andersoni | andersoni | NA | subspecies | Amalaraeus andersoni andersoni |

but in the merged file, they appear as

| id | accid | source | order | family | genus | species | subspecies | author | taxonlevel | canonical | merge_tag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1994 | 0 | Lewis | Siphonaptera | Ceratophyllidae | Amalaraeus | andersoni | andersoni | (Rothschild, 1908) | subspecies | Amalaraeus andersoni andersoni | orig |
| 3355 | 1994 | CoL | Siphonaptera | Ceratophyllidae | Amalaraeus | andersoni | andersoni | NA | subspecies | Amalaraeus andersoni andersoni | add |

This leaves me with no reference back to the original files. I suggest that the merge add its "new" ids alongside the original ids rather than overwriting them.
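
To make the suggestion concrete, here is a rough sketch of the idea in Python rather than R. It is not taxotools code: the function name `renumber_keep_original` and the columns `orig_id` and `orig_accid` are made up for illustration, and it assumes that an accid of 0 marks an accepted name while any other accid points to an id in the same list.

```python
def renumber_keep_original(rows, start=1):
    """Give each row a new sequential id while keeping the old id and accid.

    Hypothetical helper, not part of taxotools. Each row is a dict with at
    least "id" and "accid" keys.
    """
    # Map every original id to its new id.
    id_map = {row["id"]: start + i for i, row in enumerate(rows)}
    out = []
    for row in rows:
        new = dict(row)
        new["orig_id"] = row["id"]        # reference back to the source file
        new["orig_accid"] = row["accid"]
        new["id"] = id_map[row["id"]]
        # Assumption: accid 0 means "accepted name"; otherwise it refers to
        # another row's id in the same list (KeyError if that id is absent).
        new["accid"] = 0 if row["accid"] == 0 else id_map[row["accid"]]
        out.append(new)
    return out


# The two rows from the example above, reduced to a few columns.
lewis = [{"id": 72, "accid": 0, "source": "Lewis",
          "canonical": "Amalaraeus andersoni andersoni"}]
col = [{"id": 390, "accid": 0, "source": "CoL",
        "canonical": "Amalaraeus andersoni andersoni"}]

merged = (renumber_keep_original(lewis, start=1)
          + renumber_keep_original(col, start=1 + len(lewis)))
```

With `orig_id` kept next to `source`, every merged row can be traced back to its row in the original file, whatever the merge later does to `id` and `accid`.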

Jegelewicz commented 3 years ago

Also, attached are the two original files and the merged file, which has 34 fewer rows than the two original files combined: taxo.zip
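
Until the ids are preserved, one way to find the missing rows might be to compare the files on columns the merge does not change. The Python sketch below illustrates the logic only; it is not taxotools code. The helper names and the choice of key columns are assumptions, and the third row in the example is made-up data, not taken from the attached files.

```python
from collections import Counter

# Assumption: these columns survive the merge unchanged and identify a row.
KEY_COLUMNS = ("source", "genus", "species", "subspecies", "author")


def row_key(row):
    """Build a comparison key from the columns that the merge leaves alone."""
    return tuple(row.get(col) for col in KEY_COLUMNS)


def missing_rows(originals, merged):
    """Return the original rows that have no counterpart left in merged."""
    remaining = Counter(row_key(r) for r in merged)
    lost = []
    for row in originals:
        key = row_key(row)
        if remaining[key] > 0:
            remaining[key] -= 1   # matched one merged row
        else:
            lost.append(row)      # nothing left to match: the row was dropped
    return lost


originals = [
    {"id": 72, "source": "Lewis", "genus": "Amalaraeus", "species": "andersoni",
     "subspecies": "andersoni", "author": "(Rothschild, 1908)"},
    {"id": 390, "source": "CoL", "genus": "Amalaraeus", "species": "andersoni",
     "subspecies": "andersoni", "author": None},
    # Made-up row standing in for one that the merge dropped.
    {"id": 391, "source": "CoL", "genus": "Examplegenus", "species": "exampleus",
     "subspecies": None, "author": None},
]
merged = [
    {"id": 1994, "source": "Lewis", "genus": "Amalaraeus", "species": "andersoni",
     "subspecies": "andersoni", "author": "(Rothschild, 1908)"},
    {"id": 3355, "source": "CoL", "genus": "Amalaraeus", "species": "andersoni",
     "subspecies": "andersoni", "author": None},
]

lost = missing_rows(originals, merged)
```

Counting keys rather than using a plain set means that exact duplicates within one original file also show up as lost if the merge collapses them.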

vijaybarve commented 3 years ago

To do: preserve the original ids in the compact_ids function and carry them through merge_lists.