ropensci / rppo

R package for accessing PPO data store
https://docs.ropensci.org/rppo
BSD 3-Clause "New" or "Revised" License

Issue 10 #11

Closed salix-d closed 2 years ago

salix-d commented 2 years ago

Fixed the two main functions (ppo_data() and ppo_terms()), which had stopped working because the URL changed.

While I hadn't yet guessed the right URL for ppo_terms, I found a way to get all the available filters from the PPO page. Then, once I guessed the right URL, I found that some terms returned no results when queried, so I added the function ppo_get_terms with similar arguments and added an explanation to the documentation of both functions.

I also added the ppo_filters() data set so people can see the number of observations in PPO for the available genera (which I find most useful), as well as for the main source, sub-sources, status, and mapped traits (which include terms). For the mapped traits, I merged it with the output of ppo_terms(TRUE, TRUE) to get the definitions as well. I also added a column for status (present/absent) and 3 'objectPropriety' columns with the terms included in the definition, to make filtering the terms easier.

For the function ppo_data(), I added some arguments:

Playing with the query URL in my browser, I found you could pass multiple scientificName values if they are separated by '|'. I tried this with other parameters but it didn't work. Since the way the query is built looks like an SQL statement, I tried putting the values in quotes and in a parenthesized list (i.e. +genus:("genus1", "genus2")), and it seems to work well with the parameters 'genus', 'specificEpithet' (watch out though: sometimes an epithet will match more than one genus; if you don't want that, use scientificName instead), 'termID', 'subSource', 'status', 'mapped_traits' and 'eventRemarks'.
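
A minimal sketch of the quoted, parenthesized list described above. The helper name `build_term()` is hypothetical and not part of rppo; it only illustrates how one parameter with several values could be turned into a single query term, and it leaves URL encoding and the joining of terms to the caller.

```r
# Hypothetical helper (not part of rppo): build one query term from a
# parameter name and one or more values.
build_term <- function(param, values) {
  values <- unique(as.character(values))
  if (length(values) == 1) {
    # A single value is passed through unquoted
    return(paste0("+", param, ":", values))
  }
  # Several values: quote each one and wrap them in a parenthesized list
  quoted <- paste0('"', values, '"', collapse = ", ")
  paste0("+", param, ":(", quoted, ")")
}

# Example genera chosen only for illustration
cat(build_term("genus", c("Quercus", "Acer")), "\n")
```

Before sending, the resulting string would still need to be URL-encoded, for example with `utils::URLencode()`.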

Since the parameter 'year' could be just one year, maybe, instead of having 'fromYear' and 'toYear', the argument could just be an integer/numeric range? If it has length one, add +year:[year] to the query; otherwise add +year:>=[min(year)]++AND+++year:<=[max(year)]. This is just an idea; I didn't make that change so as not to break existing workflows.
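
The idea could look roughly like this. It is only a sketch: `year_terms()` is a hypothetical name, it assumes the upper bound uses `<=`, and it returns the individual terms without joining them, since the exact separator depends on how ppo_data() assembles the query.

```r
# Hypothetical helper (not part of rppo): turn a numeric `year` argument
# into one or two query terms.
year_terms <- function(year) {
  stopifnot(is.numeric(year), length(year) >= 1, !anyNA(year))
  if (length(year) == 1) {
    # Single year: exact match
    return(paste0("+year:", year))
  }
  # Several years: treat them as a range from the smallest to the largest
  c(paste0("+year:>=", min(year)), paste0("+year:<=", max(year)))
}

year_terms(2015)
year_terms(2010:2012)
```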

For my use case, I also needed the actual traits data, so I made a new function 'ppo_traits()' which takes each eventID in the data frame returned by 'ppo_data()' and uses the API to get the trait values for that event. The traits weren't sorted whatsoever, so I made an attempt at categorizing them. I might have misclassified some, so please have a look. Users can choose whether they want the data sorted or not, whether they want the category 'traits' melted into a data frame for better readability, or whether they want the whole thing as a data.frame (this merges the meta and taxonomy data with the melted traits data.frame).
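
To illustrate what "melted" means here, a small base R sketch follows. The input structure, the trait names, and the helper `melt_traits()` are all invented for illustration; the real output of 'ppo_traits()' may be shaped differently.

```r
# Toy input: structure and values are invented for illustration only.
events <- list(
  list(eventID = "event-1", traits = list(trait_a = TRUE, trait_b = FALSE)),
  list(eventID = "event-2", traits = list(trait_a = TRUE))
)

# Hypothetical helper: one row per (event, trait) pair, in long format
melt_traits <- function(events) {
  rows <- lapply(events, function(ev) {
    data.frame(
      eventID = ev$eventID,
      trait = names(ev$traits),
      value = unlist(ev$traits, use.names = FALSE),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

long <- melt_traits(events)
print(long)
```

The long format keeps one trait per row, which makes it straightforward to merge with other per-event columns by eventID.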

There are also 2 other functions ('ppo_traits_sort()' and 'ppo_traits_flatten()') to sort and flatten the output of 'ppo_traits()' in case the user sets those arguments to FALSE and later changes their mind. This avoids having to go through the API again.

All new functions have been documented. I didn't add new tests, but I did make sure the current ones pass and that the examples in the documentation of the new functions work.

jdeck88 commented 2 years ago

Thanks for the PR... this looks great. I ran the checks and all passed. It looks like everything is working as it should, so I'll merge into master.