msberends / AMR

Functions to simplify and standardise antimicrobial resistance (AMR) data analysis and to work with microbial and antimicrobial properties by using evidence-based methods, as described in https://doi.org/10.18637/jss.v104.i03.
https://msberends.github.io/AMR/
Other
83 stars 12 forks source link

ab_ddd() returns NA for several ATCs #46

Closed remcv closed 3 years ago

remcv commented 3 years ago

Thank you for this great package! I use it quite heavily in my antimicrobial stewardship projects.

Issue details

ab_ddd() & atc_online_property() both return NA for:

Session info

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] AMR_1.7.1        lubridate_1.7.10 ggplot2_3.3.3    readr_1.4.0
[5] dplyr_1.0.6

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6       rstudioapi_0.13  magrittr_2.0.1   hms_1.1.0       
 [5] tidyselect_1.1.1 munsell_0.5.0    colorspace_2.0-1 R6_2.5.0
 [9] rlang_0.4.11     fansi_0.5.0      tools_4.1.0      grid_4.1.0
[13] gtable_0.3.0     utf8_1.2.1       cli_3.0.1        DBI_1.1.1
[17] withr_2.4.2      ellipsis_0.3.2   assertthat_0.2.1 tibble_3.1.2    
[21] lifecycle_1.0.0  crayon_1.4.1     purrr_0.3.4      ps_1.6.0
[25] vctrs_0.3.8      glue_1.4.2       compiler_4.1.0   pillar_1.6.1
[29] generics_0.1.0   scales_1.1.1     jsonlite_1.7.2   pkgconfig_2.0.3
msberends commented 3 years ago

Many thanks for this input!

Didn’t know of that list, that could be a great addition. Though it won’t be easy to incorporate all the differences and duplicates in it, e.g. J01CR50 is 8 times in that list…

I’ll update the dataset where I can and also add a test that units should never be missing.

msberends commented 3 years ago

While working on this issue, one thing needs clarification:

ab_ddd() returns the DDD, but not the unit

This is intended. From the documentation:

# defined daily doses (DDD)
ab_ddd("AMX", "oral")               #  1
ab_ddd("AMX", "oral", units = TRUE) # "g"
ab_ddd("AMX", "iv")                 #  1
ab_ddd("AMX", "iv", units = TRUE)   # "g"

We chose to return the value (numeric) so it can be used for other data processing easier. With units = TRUE, you can retrieve the units.

Do you think this needs more convenience? If we give it an additional class in R, we could have the units printed, while the actual returned values are numeric. Didn’t seem a great idea in 2018 when we came up with these ab_*() functions, but we’re always open for debate.

msberends commented 3 years ago

Thank you for this great package! I use it quite heavily in my antimicrobial stewardship projects.

By the way, that is AWESOME!!! 🤩

remcv commented 3 years ago

Sorry for not being clear in my issue description. I've read the documentation (which btw is really well made) regarding the return type of ab_ddd() function. The problem I face is the following:

r$> ab_ddd("J01DI54", administration = "iv", units = TRUE)
[1] NA

The return is NA, but it should have been g https://www.whocc.no/atc_ddd_index/?code=J01DI54. The same inputs used with atc_online_property() return the corrrect result.

r$> atc_online_property("J01DI54", property = "U", administration = "P")
[1] "g"

Do you think this needs more convenience? If we give it an additional class in R, we could have the units printed, while the actual returned values are numeric. Didn’t seem a great idea in 2018 when we came up with these ab_*() functions, but we’re always open for debate.

I personally do not mind this. I've written my R scripts using ab_ddd() with the units argument TRUE and FALSE, depending on my needs. In other programming languages (Java, C++) it it not possible to have different return types for a function/method, but I do not know if in R this is a bad practice (although it is confusing for people that code in a different language). Another approach is to return an R list with both the number and the unit, if you want to avoid creating a new class or leave it as it is.

remcv commented 3 years ago

The hospital I am doing the analysis for also uses some antimicrobials that are not in the ATC J category, but in the P category, like oral metronidazole https://www.whocc.no/atc_ddd_index/?code=P01AB01. When running ab_ddd() on it, the return value is NA:

r$> ab_ddd("P01AB01", administration = "oral")
[1] NA
Warning message:
These ATC codes are not (yet) in the antibiotics data set: "P01AB01". 

Please let me know if I should open a separate issue for this.

msberends commented 3 years ago

At this moment, ATC codes are unique identifiers for antibiotics in our data set. This should not be the case, so this requires a big change in our antibiotics data set. Will be a great addition though.

Please let me know if I should open a separate issue for this.

Nah, not needed.

msberends commented 3 years ago

In other programming languages (Java, C++) it it not possible to have different return types for a function/method, but I do not know if in R this is a bad practice (although it is confusing for people that code in a different language).

Totally with you there, it shouldn't be confusing. So implemented this for the next release:

ab_ddd("AMX", units = TRUE)
#> [1] "g"
#> Warning message:
#> Using `ab_ddd(... , units = TRUE)` is deprecated, use `ab_ddd_units()` instead. This warning will be shown once per session.

I'm also reviewing other AMR functions regarding this, although I'm pretty certain this is one of the very few that can have different data types as output.

remcv commented 3 years ago

Sounds good :satisfied:. This is a cleaner approach.

I'm pretty certain this is one of the very few that can have different data types as output.

The atc_online_property() function can also output different data types.

msberends commented 3 years ago

atc_online_property() retrieves results from an online WHOCC table. I agree it's not the cleanest option that it can return different output values, but I think it would break a lot of code of many people if we change that behaviour. ab_ddd() is already much more specific, so it's a less impact to deprecate the use of the units argument there. I have added the atc_online_ddd_units() function that specifically retrieves units based on the administration (oral / parenteral / ...).

Also found out that our reproduction script to create the antibiotics data set was lacking the retrieval of units. That explains this:

ab_ddd("J01DI54", administration = "iv", units = TRUE)
#> [1] NA

Updating the data set now.

msberends commented 3 years ago

The development version nows contains the fix to this issue. You can try the latest version yourself using:

install.packages("remotes") # if you haven't already
remotes::install_github("msberends/AMR")
remcv commented 3 years ago

Hi Matthijs, Thank you for your involvement and for the discussion! Will update my repos to the new version 👍