inbo / iassetR

Interact with the iAsset api using R
https://inbo.github.io/iassetR/
Other
0 stars 1 forks source link

31 Adding metadata retrieval to `get_record()` #37

Closed Turtle6665 closed 5 months ago

Turtle6665 commented 6 months ago

Fixes #31.

List of changes

A possible improvement I currently see

PietrH commented 6 months ago
* Currently, when using `get_record(get_metadata=TRUE)`, it's difficult to tell apart metadata and data columns. Saving metadata in a totally different way could be the way to go, but I don't know how it could be clear and efficient at the same time.

I tend to agree. We have to keep the user of the package in mind. In this case, you'd be the user so with the optional flag, I think we'd be fine leaving it in the data. However, a solution I see is to return an object that has the metadata as a child or attribute. rgbif works this way for some function returns. If we go this route, I'd love to use this as an excuse to give S7 a go: https://rconsortium.github.io/S7/articles/S7.html

Also, it's a better design pattern to use an enum even if there only two choices: https://design.tidyverse.org/boolean-strategies.html, are you open to giving this pattern a try?

PietrH commented 6 months ago

Just realised why we don't have any tests, because of the risk of exposing API keys. The solution to this is vcr as is used in rgbif.

https://books.ropensci.org/http-testing/ for more information.

With this in mind it might be better to just keep all the testing in a separate PR that we can work together on but I'll be responsible for.

Turtle6665 commented 6 months ago

get_metadata is a verb, I don't like using verbs for function arguments but like to reserve them for functions. In this case, I think it's justified, but maybe we should go for: include_metadata or show_metadata instead?

I will then go for include_metadata, to me, it's more aligned with what this argument does (it includes the metadata in the returned data). Except if we use enum (see below).

  • Before submitting for review, please run R CMD CHECK locally (check under the build menu in RStudio or via devtools). The CI will catch it, but it's neater if your PR doesn't immediately fail. I've generated the documentation and now it's passing.
  • There are some stylistic things, are you using a linter? Consider lintr! Have a look at style.tidyverse.org, no biggie. Usually in review I'll just fix style issues for the contributor, but in this case I'll leave the choice up to you. Let me know if you'd like me to have a go or would like to have a look at it yourself.

I had never heard of these, but they are definitely worth the use. Thanks for the tips ! I will use lintr from now on and for this PR.

Also, it's a better design pattern to use an enum even if there only two choices: https://design.tidyverse.org/boolean-strategies.html, are you open to giving this pattern a try?

Yes, I can try it, it should not take too long. The enum to replace the get_metadata could be metadata = c("discard","include"). You may have more favorable options ?

PietrH commented 6 months ago

What do you think about none, all, and id ? What should be the default? None? Id? I believe including an id by default is a great idea

Turtle6665 commented 6 months ago

@PietrH, this is indeed a good idea. For the included id, which of the following id would you consider the best choice to return alongside the data:

PietrH commented 6 months ago

@Turtle6665 I'm having some trouble visualizing the first option. Could you give a (mock) example? Just a few rows of fake data is fine!

Turtle6665 commented 6 months ago

@PietrH, This would be the first option: image The second: image and the third: image and the third bis (to be more consistent with the GUI export but strange considering the id columns would be different from the one when using all): image

PietrH commented 6 months ago

My vote goes for the third option

Turtle6665 commented 6 months ago

@PietrH, The get_metadata is now an enum with the three options. By testing a bit, I found out that the due to this line, the number of observations differ in the case with get_metadata="none" (currently 10463) and the other two (currently 10488). As some observations with different insp_orders have the same data (they are duplicates in IAsset), filtering out (with dplyr::distinct()) only based on the data and not with the metadata means we remove more observations than with the other two options.

Removing this row is not the way to go as some observations are in still in double (same data and same metadata). All those insp_order have at least one double entry in the returndata : "59923" "59942" "59638" "59695" "59752" "59771" "59790" "59809" "59828" "59847" "59866" "59885" "59904"

PietrH commented 6 months ago

How about only removing the columns after you've done the distinct?

Are you sure this is an issue? You are more suitable to answer this than me because I don't actually use the iAsset data. But to me it doesn't seem important to get the same number of rows with all metadata options. If you are interested in finding duplicates, you'd want to set metadata appropriately wouldn't you?

On second thought, why are we even offering metadata none, why would you want to get the data without the id? You could still get rid of it yourself if you wanted to anyway.

I propose we get rid of the option none and solve this issue that way.

PS: When you are done working on this, please request my review again and I'll have a more in depth look

PietrH commented 6 months ago

I haven't found the time to look at this yet, but it's on my radar.

SanderDevisscher commented 5 months ago

@PietrH can this PR be approved ?

PietrH commented 5 months ago

@SanderDevisscher Can you have a look at the changes starting with 32b7dd0368288a63c34cde0181a4f8f5e7c33edc, I trust your judgement. Make sure the CI tests pass.

If nothing else we should setup some unit tests for this package. Without them reviewing is becoming more and more difficult.

PietrH commented 5 months ago

Thanks for the review Sander, ok to merge for me. I'll leave the honors to Turtle6665

SanderDevisscher commented 5 months ago

@Turtle6665 is no longer active at our team so I did the merge