ropensci / software-review

rOpenSci Software Peer Review.

Pre-submission query: R Interface to the SNOMED CT terminology service #548

Closed: peterdutey closed this issue 1 year ago.

peterdutey commented 1 year ago

Submitting Author Name: Peter Dutey-Magni
Submitting Author Github Handle: @peterdutey
Other Package Authors Github handles: @AnikaC-git
Repository: https://github.com/ramses-antibiotics/snomedizer/
Submission type: Pre-submission
Language: en


Package: snomedizer
Type: Package
Title: R Interface to the SNOMED CT Terminology Server REST API
Version: 0.3.0
Date: 2022-07-08
Authors@R: c(
    person(given = "Peter",
           family = "Dutey-Magni",
           role = c("aut", "cre", "res"),
           email = "p.dutey-magni@ucl.ac.uk",
           comment = c(ORCID = "0000-0002-8942-9836")),
    person(given = "Anika",
           family = "Cawthorn",
           role = c("rev", "res"),
           email = "a.cawthorn@ucl.ac.uk",  
           comment = c(ORCID = "0000-0002-2438-7495")),
    person("University College London", role = c("cph")))
Description: Interrogate the SNOMED CT clinical ontology using the 
   SNOMED International Terminology Server REST API <https://github.com/IHTSDO/snowstorm>.
URL: https://github.com/ramses-antibiotics/snomedizer
BugReports: https://github.com/ramses-antibiotics/snomedizer/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends:
    R (>= 3.0.0)
Imports: 
    jsonlite,
    purrr,
    dplyr,
    httr,
    methods,
    progress,
    Rdpack (>= 0.7)
Suggests: 
    testthat,
    tidyr,
    knitr,
    magrittr,
    rmarkdown,
    covr,
    pkgdown (>= 2.0.0),
    R.utils,
    oysteR
RoxygenNote: 7.1.1
VignetteBuilder: knitr
RdMacros: Rdpack
Config/testthat/edition: 2

Scope

This package is designed to democratise access to the global standard healthcare terminology resource, SNOMED CT. It provides an R interface to the Snowstorm terminology server, a Java/Elasticsearch open-source application that is maintained and supported by SNOMED International on the basis of well-established technical standards.

Not applicable.

This package is aimed at healthcare analysts and health service researchers who primarily use R and dplyr in their work. The package will have important uses in biomedical research by giving users easy access to healthcare terminology and ontological reasoning. It does not assume prior knowledge of ontological reasoning or full-text search engines.

Draft manuscript introducing the package

No.

Yes. The documentation informs users that sensitive personal information should not be processed with this package unless the Snowstorm server being queried sits behind a firewall.

Unit testing and vignette building rely on a public remote API (https://snowstorm.ihtsdotools.org/snowstorm/snomed-ct/swagger-ui.html or https://browser.ihtsdotools.org/snowstorm/snomed-ct/swagger-ui.html).
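As a rough illustration of what a single call to this API looks like, the Python sketch below builds a concept-search URL. It assumes the public base URL above (without the Swagger page), the `MAIN` branch, and a `/{branch}/concepts` endpoint accepting `term`, `ecl`, `activeFilter`, `limit` and `offset` query parameters; these should be checked against the Swagger UI. `concept_search_url` is a hypothetical helper, not part of snomedizer (which is written in R).

```python
from urllib.parse import quote, urlencode

# Assumption: public Snowstorm instance cited above, without "/swagger-ui.html".
SNOWSTORM_BASE = "https://snowstorm.ihtsdotools.org/snowstorm/snomed-ct"


def concept_search_url(term=None, ecl=None, branch="MAIN", active_only=True,
                       limit=50, offset=0, base=SNOWSTORM_BASE):
    """Build a GET URL for a Snowstorm concept search (hypothetical helper).

    Assumes an endpoint of the form {base}/{branch}/concepts that accepts
    activeFilter, limit, offset, term and ecl query parameters.
    """
    params = {"activeFilter": str(active_only).lower(),
              "limit": limit,
              "offset": offset}
    if term is not None:
        params["term"] = term  # free-text search on descriptions
    if ecl is not None:
        params["ecl"] = ecl    # Expression Constraint Language filter
    # Keep "/" literal so nested branch paths stay readable in the URL path.
    path = quote(branch, safe="/")
    # urlencode percent-encodes ECL symbols such as "<" and "|".
    return f"{base}/{path}/concepts?{urlencode(params)}"
```

For example, `concept_search_url(term="asthma", ecl="<< 404684003")` would restrict a free-text search for "asthma" to 404684003 |Clinical finding| and its descendants.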

SNOMED International (IHTSDO) and the Snowstorm developers have been informed of this project.

Many thanks in advance for your feedback! Peter Dutey

emilyriederer commented 1 year ago

Hi @peterdutey ! Thank you for submitting your package to rOpenSci. As we consider fit, I have a few follow-up questions for you.

  1. Could you please elaborate on how you see this package fitting the "scientific software wrappers" and "text analysis" categories? My current understanding is that the functionality focuses on retrieving data from the SNOMED API and (optionally) structuring it into a data frame, which I would judge to be aligned with the "data retrieval" category. Further descriptions of the categories and some linked examples may be found here.

  2. We have recently been discussing internally when and how API wrapper packages add the most incremental value to the research community, and have just released a new blog post on this issue. I see that your manuscript also addresses this, but I'd appreciate any additional thoughts you can share on how your wrapper eases the user experience. Do you see the benefits more on the technical side (e.g. formulating the request, pagination) or in encoding domain-specific context (e.g. making endpoints more discoverable and documented)?
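The two technical benefits raised here, pagination and structuring responses into a data frame, can be sketched in a few lines. This Python is illustrative only: it assumes pages shaped like `{"items": [...], "total": N}` with offset/limit paging, and concept items carrying `conceptId`, `active`, `fsn.term` and `pt.term`, which is our understanding of Snowstorm's JSON and should be verified. `fetch_page`, `iter_items` and `concepts_to_rows` are hypothetical names, so the logic can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def iter_items(fetch_page: Callable[[int, int], dict],
               page_size: int = 100) -> Iterator[dict]:
    """Yield every item across offset/limit pages.

    Assumes each page looks like {"items": [...], "total": N}; fetch_page is a
    hypothetical callable taking (offset, limit) and returning one decoded page.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items = page.get("items", [])
        yield from items
        offset += len(items)
        # Stop on an empty page or once every reported result has been seen.
        if not items or offset >= page.get("total", 0):
            break


def concepts_to_rows(items: Iterable[dict]) -> List[Dict[str, object]]:
    """Flatten nested concept JSON into one flat record per concept,
    i.e. the rows of a data frame."""
    rows = []
    for item in items:
        rows.append({
            "conceptId": item.get("conceptId"),
            "active": item.get("active"),
            # fsn/pt are assumed to be objects with a "term" field.
            "fsn": (item.get("fsn") or {}).get("term"),
            "pt": (item.get("pt") or {}).get("term"),
        })
    return rows
```

Passing the fetch function in, rather than making HTTP calls directly, keeps the paging logic separate from transport. Deep paging against the real server may also require a `searchAfter` token, which this sketch ignores.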

Thanks!

peterdutey commented 1 year ago

Hi @emilyriederer,

Thank you for the prompt response and please find answers below.

  1. If we must select a single category, then it would arguably be data retrieval. The other two may, however, also apply. Text analysis: the package can be used for some very basic named entity recognition, particularly as and when the Snowstorm team release a new feature we requested, which would allow users to run multi-term fuzzy Elasticsearch queries via the REST API. Scientific software wrapper: the package meets this by offering an R interface to Snowstorm that is fit for purpose for clinical researchers, most of whom are familiar with R.

  2. The motivation behind this package is to remove some very stubborn obstacles to the use of SNOMED CT, which are both human and technical.

To reference your blog post: we are not in a situation where a like-for-like replica of the Snowstorm API in R would have addressed the community problem. The API needed simplifying and designing with a specific community in mind, namely clinical researchers and the professionals who enable their research (healthcare analysts). We need this package to create new opportunities to teach and self-learn SNOMED CT, the Expression Constraint Language, and the basics of reasoning in ontologies.
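The Expression Constraint Language selects sets of concepts by walking the is-a hierarchy: `<< X` means X and all of its descendants, and `< X` means its descendants only. The Python sketch below mimics those two operators over a small in-memory hierarchy to show the transitive-closure reasoning that Snowstorm performs server-side. The concept names in `TOY_HIERARCHY` are made-up placeholders, not real SNOMED CT content, and both functions are hypothetical teaching aids rather than part of snomedizer.

```python
from collections import deque
from typing import Dict, Iterable, Set


def descendants_or_self(focus: str, children: Dict[str, Iterable[str]]) -> Set[str]:
    """Mimic ECL '<< focus': the focus concept plus all transitive is-a children."""
    seen = {focus}
    queue = deque([focus])
    while queue:  # breadth-first walk down the hierarchy
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:  # polyhierarchy: a concept can have several parents
                seen.add(child)
                queue.append(child)
    return seen


def descendants(focus: str, children: Dict[str, Iterable[str]]) -> Set[str]:
    """Mimic ECL '< focus': descendants only, excluding the focus itself."""
    return descendants_or_self(focus, children) - {focus}


# Toy parent -> children map with placeholder names (not real SNOMED CT content).
TOY_HIERARCHY = {
    "disease": ["infection", "lung disease"],
    "infection": ["pneumonia"],
    "lung disease": ["pneumonia", "asthma"],
}
```

Here "pneumonia" is reachable through two parents. SNOMED CT is a polyhierarchy, so the traversal keeps a `seen` set and visits each concept only once.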

There are also future plans for embedding additional knowledge within snomedizer. The first is to incorporate datasets providing lookups to other vocabularies used for medical products in large research databases such as the UK Biobank and the Clinical Practice Research Datalink. This will come with a dedicated vignette/tutorial on medicines.

I hope the above makes sense, but please do not hesitate to seek clarification and ask more questions. Many thanks in advance.
Peter, on behalf of the snomedizer team

emilyriederer commented 1 year ago

Thanks @peterdutey ! I really appreciate the detailed and thoughtful reply. All of the additional context is very informative and, as someone outside of this domain, really helps me understand the value of this package. I think this is definitely in scope.

I did check in with the team. Since all of the features that you mention for "text analysis" and "scientific software wrappers" come from the API, we consider them part of the "data retrieval" category; the other two categories relate more to text analysis functionality built within the package itself and/or to wrappers of non-API tools (e.g. a command-line tool).

I'll close this issue for now, but please proceed to a full submission at your convenience!