Open wcornwell opened 6 years ago
I'll take a look. I'm a bit confused by the extra stuff here. How are you indicating the canonical names? It shouldn't lose any names on that list, and if it does, something is wrong.
OK, I see what you are showing now. It was easier to look at the raw expanded and merged lists. Yes, the expanded names function will not return expanded synonyms that are already in the canonical list. I can look in more detail at why. In general, this is what I would want but I will take a look at synonymize.py.
I am working from imperfect memory, but I believe that behavior may be there to avoid endless loops in the sister synonym search. Maybe you can convince me why this is the wrong behavior and how to fix it in synonymize.py.
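For illustration, here is a minimal sketch of how loop avoidance could be kept separate from what gets returned. It assumes a hypothetical `SYNONYMS` dictionary and a hypothetical `expand_names` function; it is not the code in synonymize.py. A `visited` set stops the search from cycling between sister synonyms, while names already in the input are still included in the output.

```python
from collections import deque

# Hypothetical synonym graph: each name maps to the names it is linked to.
# The two entries point at each other, which is what could cause a loop.
SYNONYMS = {
    "Pouteria australis": ["Planchonella australis"],
    "Planchonella australis": ["Pouteria australis"],
}


def expand_names(names, synonyms=SYNONYMS):
    """Return every name reachable from `names`, including the inputs."""
    visited = set(names)   # names already seen; guarantees termination
    queue = deque(names)   # names whose links still need to be followed
    while queue:
        current = queue.popleft()
        for linked in synonyms.get(current, []):
            if linked not in visited:
                visited.add(linked)
                queue.append(linked)
    return sorted(visited)
```

With this approach the result is the same whether the input holds one of the two names or both, because the `visited` set only controls the traversal and nothing is filtered out afterwards.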
Thanks for having a look.
I can see why it would be done the way you've done it, but from a user perspective with long lists it's hard to know this is happening.
If it's easier, one alternative to changing the script is to just change the readme instructions so that it's:
That would also solve it from a user perspective.
I won't have time to look at this more until after spring break. But perhaps what this needs is a rethink of data in/out and how this is used. Perhaps the full lookup table could be produced as a single step. The current weird design is a historical artefact of my initial uses.
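As a rough sketch of what a single-step lookup table might look like, the snippet below returns one row per input name, so nothing is dropped even when a synonym and its accepted name both appear in the input. The `ACCEPTED` dictionary and the `build_lookup` function are hypothetical and only illustrate the idea; they are not part of synonymize.py.

```python
# Hypothetical mapping from a synonym to its accepted name.
ACCEPTED = {"Pouteria australis": "Planchonella australis"}


def build_lookup(original_names, accepted=ACCEPTED):
    """Return (original, accepted) pairs, one row per input name.

    Names without a known accepted name map to themselves.
    """
    return [(name, accepted.get(name, name)) for name in original_names]


rows = build_lookup(["Pouteria australis", "Planchonella australis"])
for original, current in rows:
    print(f"{original} -> {current}")
```

Because every input name gets its own row, a user with a long list can join the table back onto the original data and see exactly which names changed.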
One strange behavior that I've just found: I wrote a little R script that is a wrapper for your python script:
It seems to work fine. For example:
which is correct: Planchonella australis is the new name for Pouteria australis.
The exception is when the original names contain both the synonym and the correct name:
In that case it loses the Pouteria australis -> Planchonella australis information.
Any idea what's going on?