invertsau / inverthotspot

Repository containing workflow to determine short range endemics of Australian Invertebrates
https://invertsau.github.io/inverthotspot/
Apache License 2.0

Updated data cleaning workflow, removing redundant code pieces #13

Closed fontikar closed 2 months ago

fontikar commented 2 months ago

Hey Ashley,

Just quickly reviewed your code. Nice work. I liked your count_words function!

I didn't change much in the workflow. I removed some objects that were being generated over and over again, which was redundant, and finalised the export part of the workflow so that it writes Parquet files rather than .csv and incorporates the download name into the name of the new cleaned file.
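To illustrate the naming idea, here is a minimal sketch of deriving the cleaned file name from the download name. It is written in Python purely for illustration and is not the workflow's actual code; the function name `cleaned_parquet_name`, the `_cleaned` suffix, and the example path are all assumptions.

```python
from pathlib import Path


def cleaned_parquet_name(download_path: str, suffix: str = "_cleaned") -> str:
    """Build the cleaned output file name from the raw download's name.

    Hypothetical helper: the suffix and naming scheme are assumptions.
    """
    # Drop the directory and the original extension (e.g. ".csv").
    stem = Path(download_path).stem
    # Reuse the download name so the cleaned file can be traced back to it.
    return f"{stem}{suffix}.parquet"


# Example with a made-up download path.
print(cleaned_parquet_name("data/ala_download_2023.csv"))
```

Keeping the download name in the output makes it obvious which download a cleaned file came from when several downloads sit side by side.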

I did have some minor questions about the data cleaning workflow, though.

In the synonyms section, I wasn't able to identify any synonyms when checking the ALA names against the AFD. Sure, the download may have changed and that may be why, but I just wanted to flag it!
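One quick way to sanity-check this is to count how many ALA names actually appear among the AFD synonyms after light normalisation. The sketch below is in Python for illustration only and assumes the names are available as plain lists of strings; `normalise`, `names_matching_synonyms`, and the placeholder names are hypothetical, not part of the workflow.

```python
def normalise(name: str) -> str:
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def names_matching_synonyms(ala_names, afd_synonyms):
    """Return the ALA names that also occur in the AFD synonym list."""
    synonyms = {normalise(s) for s in afd_synonyms}
    # Use a set to drop repeated ALA names, then sort for stable output.
    return sorted({n for n in ala_names if normalise(n) in synonyms})


# Placeholder names only, not real taxa.
matches = names_matching_synonyms(["Aus bus", "Cus  dus"], ["aus bus"])
print(len(matches), matches)
```

If this returns an empty list on the real data, either the download genuinely contains no synonyms or the two name columns are formatted differently (authorship, subgenus, and so on).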

fontikar commented 2 months ago

@JoshNitschke just tagging you so you can see what a quick code review looks like! :)

fontikar commented 2 months ago

@Ashbyi have a look at my edits, then click merge pull request. Happy to answer any questions!

I think it still needs some editing and cleaning up, but my focus is to quickly get a refresher of the workflow before using the cleaned data. I think the documentation is a bit light; it's not entirely clear to the reader how many duplicates are removed, for example!
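As a rough idea of what I mean, a deduplication step can return the number of rows it dropped so that the count can be written into the documentation. This is a minimal Python sketch for illustration, not the workflow's code; `drop_duplicates_with_report` and the field names are hypothetical.

```python
def drop_duplicates_with_report(records, key_fields):
    """Keep the first record for each key and report how many were dropped."""
    seen = set()
    kept = []
    for rec in records:
        # Records are treated as duplicates when all key fields match.
        key = tuple(rec[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    removed = len(records) - len(kept)
    return kept, removed


# Made-up records for illustration.
rows = [
    {"name": "Aus bus", "lat": -33.0, "lon": 151.0},
    {"name": "Aus bus", "lat": -33.0, "lon": 151.0},
    {"name": "Cus dus", "lat": -35.0, "lon": 149.0},
]
kept, removed = drop_duplicates_with_report(rows, ["name", "lat", "lon"])
print(f"Removed {removed} duplicate record(s); {len(kept)} remain.")
```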

Well done

fontikar commented 2 months ago

I am merging these now, so I can keep the ball rolling. Would still be good for both @Ashbyi and @JoshNitschke to review this at a later date