Closed (vrognas closed 1 year ago)
Hey, thanks for using the package!
You are absolutely right! I’ll look into this issue. It should definitely return Pseudomonas.
The algorithm behind as.mo() does some pre-matching first, so that a matching score does not have to be calculated for all 70,000 microorganisms for each input value. I'll think of a solution to this problem.
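To illustrate why a pre-matching step can hide the best-scoring candidate, here is a minimal base R sketch. It is not the package's algorithm: the three-name candidate list, the prefix-based pre-filter, and the use of edit distance as the score are all assumptions made up for this example.

```r
# Toy candidate list (hypothetical; the real reference set has ~70,000 names)
candidates <- c("Pseudomonas aeruginosa", "Pasteurella aerogenes", "Proteus mirabilis")
input <- "P. aeroginosa"

# Compare species epithets only: drop the first word (genus or its abbreviation)
input_species <- sub("^\\S+\\s+", "", input)
cand_species <- sub("^\\S+\\s+", "", candidates)

# Assumed score: edit distance between epithets (lower means a better match)
score_all <- adist(input_species, cand_species)[1, ]
names(score_all) <- candidates
best_overall <- names(which.min(score_all))

# Hypothetical pre-filter: keep only candidates whose epithet starts with
# the first five letters of the input epithet
prefix <- substr(input_species, 1, 5)
kept <- candidates[startsWith(cand_species, prefix)]
best_after_prefilter <- names(which.min(score_all[kept]))

print(score_all)
print(best_overall)
print(best_after_prefilter)
```

In this toy setup the misspelling changes the fifth letter of the epithet, so the pre-filter drops the candidate that would have scored best over the full list.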
For now, here's a workaround. You can use the reference_df argument in as.mo() and any mo_*() function by passing on a data set with your 'errors':
# this is wrong indeed:
mo_name(c("P. aeruginosa", "P. aeroginosa"))
#> i Function `as.mo()` is uncertain about "P. aeroginosa" (assuming Pasteurella aerogenes). Run `mo_uncertainties()` to review this.
#> [1] "Pseudomonas aeruginosa" "Pasteurella aerogenes"
# with the 'reference_df' argument, we can fix this for now - let's look up the right ID of this Pseudomonas:
as.mo("Pseudomonas aeruginosa")
#> Class <mo>
#> [1] B_PSDMN_AERG
# use this as info for 'reference_df' (which accepts a data frame):
mo_name(c("P. aeruginosa", "P. aeroginosa"),
        reference_df = data.frame(old = "P. aeroginosa",
                                  mo = "B_PSDMN_AERG"))
#> [1] "Pseudomonas aeruginosa" "Pseudomonas aeruginosa"
# yeej!
# even easier: use as.mo() in reference_df itself, if you have a 100% certain name:
mo_name(c("P. aeruginosa", "P. aeroginosa"),
        reference_df = data.frame(old = "P. aeroginosa",
                                  mo = as.mo("Pseudomonas aeruginosa")))
#> [1] "Pseudomonas aeruginosa" "Pseudomonas aeruginosa"
This process can be automated by using an mo source for the package. In the online manual, you can find that reference_df by default runs get_mo_source(). Using this method, you only need to define the errors once in a text or Excel file, and the mo functions of the package will pick them up! So for now I would suggest reading about the mo source functions and trying that out.
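As a rough sketch of what such an mo source file could look like, the snippet below writes a small CSV of corrections and registers it with set_mo_source(). The file name, the column name our_code, and the temporary location are hypothetical; the assumption here is that the file needs a column named mo holding valid codes plus a column with your own spellings, so please check the manual for the exact requirements. The registration step is only run interactively and when the AMR package is installed, since it stores a file for later use and may ask for confirmation.

```r
# Hypothetical corrections table: your own (mis)spellings next to valid mo codes
corrections <- data.frame(
  our_code = "P. aeroginosa",
  mo = "B_PSDMN_AERG"
)

# Write it to a CSV file (a temporary location is used here for illustration)
path <- file.path(tempdir(), "mo_corrections.csv")
write.csv(corrections, path, row.names = FALSE)

# Register the file as the mo source; guarded so the sketch also runs
# without the AMR package and does not write outside tempdir() in scripts
if (interactive() && requireNamespace("AMR", quietly = TRUE)) {
  AMR::set_mo_source(path)
  print(AMR::get_mo_source())
  print(AMR::mo_name(c("P. aeruginosa", "P. aeroginosa")))
}
```

Once the source is registered, the corrections no longer need to be passed through reference_df in every call.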
Thank you for the quick response and elegant workaround! 👍🏼
Fixed in #71, which implements a completely new MO interpretation algorithm. You can test it with the following command, but please be aware that it's a beta version:
install.packages("remotes") # if you haven't already
remotes::install_github("msberends/AMR")
If you want to revert to the latest release (1.8.2), you can just do:
install.packages("AMR")
Hi,
I have just discovered this package and started to play with it; it looks very promising and I would like to thank you for this contribution.
I have a dataset with bacteria coded as strings, and one string is obviously misspelled: "P. aeroginosa" instead of "P. aeruginosa". The spelling mistake is made in 2/7 (~29%) of cases.
When I use as.mo(), "P. aeroginosa" is assumed to be Pasteurella aerogenes, accompanied by a helpful message about the uncertainty. For "P. aeruginosa", as.mo() correctly assumes Pseudomonas aeruginosa:
However, it is my understanding that the organism with the highest matching score should be returned. When I check, both the misspelled and the correctly spelled strings return Pseudomonas aeruginosa as the highest matching score:
This means that I would expect as.mo() (and mo_fullname()) to return Pseudomonas aeruginosa in both cases. However, they do not – how come?
Thanks.