wcornwell closed 2 years ago
let me think about this now. My immediate thought was yes, that would be the best, but I need to figure out the most efficient way to do it
cool. we just need it to be in code, so that we can scale up to Australia without too much trouble
btw, for something this complicated, it's typical to write pseudo-code first
ok I have no clue if what I'm proposing here is actually possible in R, but this is what I think would be the most efficient method
Does this all make sense, and is what I'm describing actually doable?
looks good and very possible in R!
Next step is to do a few (~5) by hand--so we can test if the (yet to be written) code is working. Then we turn each step into a line of code.
Bummer about the next bioblitz, should we re-institute the friday meeting?
yeah I think reinstitute for this week at least
I've now pushed an updated version of the 'synonym cleaning' script that has code for steps 1-5 above, plus the random sample code. Just need code for steps 6 and 7
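For the hand-check step, a minimal sketch of drawing a reproducible random sample of about 5 names might look like the following. The vector `inat_names` and its contents are placeholders invented for illustration, not the pushed script's actual objects.

```r
# Placeholder names for illustration only; the real script would use
# the project's own vector or column of iNat names.
inat_names <- c("Planta alba", "Planta rubra", "Planta nigra",
                "Herba alba", "Herba rubra", "Ignota viridis",
                "Ignota flava")

set.seed(42)  # fixed seed so the same names are drawn on every run

# Draw 5 distinct names (sample() samples without replacement by default)
hand_check <- sample(inat_names, size = 5)
hand_check
```

Fixing the seed means the hand-checked names stay the same when the script is re-run, so the manual answers can be compared against the code's output later.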
I think 6 and 7 might be a situation for https://www.statology.org/dplyr-case_when/
looking at a lot of tutorials, and almost all seem to be for numerical values; can this be used for text strings?
yes, just need a few more "vocab words"
i'd use method 2 here: https://www.geeksforgeeks.org/how-to-test-if-a-vector-contains-the-given-element-in-r/
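As a rough illustration that `case_when()` works on text as well as numbers, here is a small sketch. The names are invented placeholders, and the conditions show three common string "vocab words": `==` for an exact match, `%in%` for membership in a vector, and `grepl()` for a pattern.

```r
library(dplyr)

# Invented placeholder names, including a missing value
x <- c("Planta alba", "Planta rubra", "Ignota viridis", "Herba alba", NA)

label <- case_when(
  is.na(x)                                 ~ "missing",       # handle NA first
  x == "Planta alba"                       ~ "exact",         # exact string match
  x %in% c("Herba alba", "Herba rubra")    ~ "in list",       # membership test
  grepl("^Planta ", x)                     ~ "genus Planta",  # pattern match
  TRUE                                     ~ "other"          # fallback
)

label
# "exact" "genus Planta" "other" "in list" "missing"
```

Conditions are checked top to bottom and the first one that is true wins, so the order of the lines matters.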
ok I can now get it to simultaneously check all accepted names and synonyms and match them to the iNat names, giving a 'yes' if matched to accepted, a 'yes2' if matched to a synonym, and a 'no' if matched to nothing using this:
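A minimal sketch of that kind of check (not the pushed script itself) could look like this. The vectors `accepted` and `synonyms`, the data frame `inat`, and all the names in them are made-up placeholders.

```r
library(dplyr)

# Made-up placeholder data standing in for the APC and iNat name lists
accepted <- c("Planta alba", "Planta rubra", "Planta nigra")
synonyms <- c("Herba alba", "Herba rubra")
inat <- data.frame(
  inat_name = c("Planta alba", "Herba rubra", "Ignota viridis"),
  stringsAsFactors = FALSE
)

inat <- inat %>%
  mutate(match = case_when(
    inat_name %in% accepted ~ "yes",   # iNat name is an accepted name
    inat_name %in% synonyms ~ "yes2",  # iNat name is a listed synonym
    TRUE                    ~ "no"     # no match at all
  ))

inat$match
# "yes" "yes2" "no"
```

Because accepted names are tested first, a name that appears in both lists would be labelled 'yes' rather than 'yes2'.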
what I haven't figured out is how to, instead of putting a 'yes2', to insert the accepted name correlating with that synonym
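One possible way to do this, sketched under the assumption that there is a lookup table with one row per synonym and its accepted name, is to use `match()` to find the row for each synonym and pull the accepted name from it. The data frame `apc`, its column names, and all the names below are hypothetical.

```r
library(dplyr)

# Hypothetical lookup table: each synonym paired with its accepted name
apc <- data.frame(
  synonym       = c("Herba alba", "Herba rubra"),
  accepted_name = c("Planta alba", "Planta rubra"),
  stringsAsFactors = FALSE
)
accepted <- c("Planta alba", "Planta rubra", "Planta nigra")

inat <- data.frame(
  inat_name = c("Planta alba", "Herba rubra", "Ignota viridis"),
  stringsAsFactors = FALSE
)

inat <- inat %>%
  mutate(apc_name = case_when(
    # already an accepted name: keep it as is
    inat_name %in% accepted    ~ inat_name,
    # a synonym: look up the row it sits in and take the accepted name
    inat_name %in% apc$synonym ~ apc$accepted_name[match(inat_name, apc$synonym)],
    # no match: leave as missing so it can be checked by hand
    TRUE                       ~ NA_character_
  ))

inat$apc_name
# "Planta alba" "Planta rubra" NA
```

Note that `match()` returns only the first matching row, so a synonym listed against more than one accepted name would silently get the first one; those cases would still need a manual check.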
running this code for the entire Tassie dataset, we get:
1307 names perfectly matched
51 names for which the iNat name is treated as a synonym by APC
34 names for which there was no match with the APC
Looks like there are a fair few recent garden escapes in Tassie of plants that are native to other parts of Aus! Interesting that iNat is picking them up.
Let's discuss next steps tomorrow!
We are just about all systems go for the whole dataset. Check out my latest push (synonym-cleaning v2), the code in which now produces a dataframe that looks like this:
I also pushed an Excel file ('Tassie checks') that contains all the species in the above dataframe that I had to manually check, and my recommendations for action. A few of these I'd like to check with you guys before we upscale to all of Australia
Are we correcting the iNat names to APC before the match? What's the best way to do this?