ropensci / coder

Classification of Cases into Deterministic Categories
https://docs.ropensci.org/coder/

Consider blog post #138

Closed eribul closed 3 years ago

eribul commented 3 years ago

See: https://blogguide.ropensci.org/

eribul commented 3 years ago

Related, re Med-BERT: https://arxiv.org/abs/2005.12833 - mention that people may not settle for ready-made codes in the future, but that for now this is what applies in practice.

eribul commented 3 years ago

From editor:

Congratulations on coder passing peer review! Would love to have a post. We have openings for publication in late January. Please suggest a date for submitting a draft via pull request and I'll provide a publication date. https://blogguide.ropensci.org/ gives content and technical guidelines.

In your post, you should talk about where these codes/categorizations come from, what they mean and how they are used, for a general open science audience that may not be familiar with medical/clinical data, and how this package solves a problem researchers have with using them.

eribul commented 3 years ago

Deadline for my part: 2021-01-25.

eribul commented 3 years ago

Once upon a time, in countries not too far from ours, there were doctors and nurses making up funny names for any diseases they encountered. What Dr. A called X would be recognized only as Y by his dear colleague Dr. B. X, however, was also a name used by Dr. C, but then for a completely different condition. It was a mess!

Those times are long gone, thanks to the fascinating story of medical coding. It started in the first half of the 20th century and has been well described in a free online book by XXX and YYY (URL). Today, when visiting the hospital, our (somatic) diseases and medical conditions are most likely recorded by some adaptation of the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10). It is a global standard administered by the World Health Organization. Although a new version, ICD-11, is available online, the old version will still be used for research for a long time. This is true also for its predecessors ICD-7/8/9 (and possibly, although less likely, for earlier versions). Psychiatric conditions might also be recorded according to the Diagnostic and Statistical Manual of Mental Disorders (DSM). Prescribed medical substances are recorded according to the Anatomical Therapeutic Chemical (ATC) classification system.

All in all, the mess has significantly decreased! But the world of medical coding is still rather messy. Not only are there different versions of ICD; there are also yearly revisions of it, as well as a clinical version and several national implementations that vary in different ways. All this might be motivated from a clinical and/or administrative viewpoint, but it also makes things a little harder for researchers trying to pool data from different data sources, countries and periods. There are often far too many codes to be used individually. A lot of people have realized this.
Charlson, Elixhauser, Sloan and others have therefore proposed combining separate codes into broader categories in order to measure patient comorbidity more holistically. These are known as the Charlson, Elixhauser and RxRisk V comorbidity indices. Using those combined indices makes things much easier. There are still several versions of each index, however, differing in the medical conditions included, in the underlying diagnostic codes, and in how the conditions are weighted against each other. Using those indices in practice is also cumbersome, since many data sources are larger than what fits into the random access memory of a standard computer, which is often a requirement for carrying out this work in R.

There are some excellent R packages that have partially solved these problems. Both {comorbidity} by Alessandro Gasparini and {icd} by Jack O. Wasey and Michael Lang are very well documented, implemented, and supported (although {icd} is, unfortunately, no longer available through CRAN). Each of those packages has fast implementations of the most common versions of both the Charlson and Elixhauser comorbidity indices. Keeping up with all the different and newly proposed versions seems like a daunting, if not impossible, task, however.

The new {coder} package offers an alternative, more flexible implementation. It is not hard-wired to Charlson and Elixhauser, although capabilities for those indices are also provided. The {coder} package uses {data.table} internally for speed. It also relies on design principles from {tidyverse} to support a natural workflow. Codes (ICD, ATC or whatever) are represented in a compact way by their corresponding regular expressions. Peptic ulcer disease, for example, is recognized as an important comorbidity in the Elixhauser classification. It might be coded by ICD-10 as “K25.x–K28.x”, as expressed by Quan et al. in 2005.
This format is well understood by humans: “-” indicates an interval and “x” acts as a wildcard for any additional alphanumeric characters. This might be translated into a code list (ignoring the dots): K257, K259, K267, K269, K277, K279, K287, K289. But why those codes only? Why not K250 or K289A? Well, not all codes are used in practice. The list corresponds to all codes from the clinical implementation of ICD-10 (ICD-10-CM) used in 2020 in the USA. The list might look different with data from the Swedish version of ICD-10 (ICD-10-SE) recorded in 2021.

Regular expressions relax the need for a complete and version-specific code list. They also enhance computational speed (although both {comorbidity} and {icd} are also very fast due to other design principles). A regular expression for peptic ulcer disease would be ^K2[5-8][79], where “^” marks the start of the character string and alternative digits are written within brackets.

By default, there are 413 of those regular expressions included in the package. They cover all conditions recognized by Charlson, Elixhauser, RxRisk V and the comorbidity-polypharmacy score (CPS), as well as some diagnosis-specific adverse events after hip or knee replacement. Charlson, Elixhauser and RxRisk V have 6, 5 and 3 alternative regular expressions, respectively, for each condition. Peptic ulcer disease, for example, can also be recognized from 53([1-4][79]0)|V1271, 53([1-4]([4-69]1|7)) or 53[1-4][79], based on three different versions of ICD-9-CM. Additional versions (for example, to correspond with ICD-11) are easily implemented by the user with the prepared structure of so-called classcodes objects. The classcodes object is an important feature of the package. Those objects can also be illustrated graphically or transformed into comprehensive code books with all codes listed and explained.
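To make the regular expression for peptic ulcer disease concrete, here is a minimal sketch in Python. It does not use {coder} and is not how the package is implemented; the helper name has_peptic_ulcer is hypothetical, and the code I10 is only included as an arbitrary non-matching example.

```python
import re

# Regular expression quoted in the text for peptic ulcer disease (ICD-10).
PEPTIC_ULCER_ICD10 = re.compile(r"^K2[5-8][79]")

# Code list quoted in the text (dots ignored).
listed_codes = ["K257", "K259", "K267", "K269", "K277", "K279", "K287", "K289"]


def has_peptic_ulcer(code: str) -> bool:
    """Hypothetical helper: True if the code starts with a matching prefix."""
    # Drop dots so that "K25.7" and "K257" are treated alike.
    return PEPTIC_ULCER_ICD10.match(code.replace(".", "")) is not None


# Every code from the explicit list is recognized.
assert all(has_peptic_ulcer(code) for code in listed_codes)

# A longer code with the same prefix is recognized too,
# so the expression does not depend on an exhaustive code list.
print(has_peptic_ulcer("K289A"))  # True
print(has_peptic_ulcer("K25.7"))  # True

# K250 has a fourth character outside [79]; I10 is an unrelated code.
print(has_peptic_ulcer("K250"))   # False
print(has_peptic_ulcer("I10"))    # False
```

Because the expression is anchored only at the start, any suffix added in a later revision or a national implementation is still matched by the same pattern.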
As for the question above: does the ICD-10 code list for peptic ulcer disease look the same for ICD-10-CM and ICD-10-SE (both with their annual revisions from 2020)? We can easily find out by comparing summary(Elixhauser, "icd10cm") to summary(Elixhauser, "icd10se").

In addition to the classcodes objects, we also need some patient data. These usually come from two sources: 1) a study cohort of interest; and 2) an additional (potentially large) register with administrative data where the same patients are followed over longer periods (a hospital register, a national patient register or a medical prescription register). Comorbidity is (by definition) only relevant if recorded before another index event of interest. This period is usually limited, perhaps to one year prior to the event. Dates of interest must therefore be compared to dates of code relevancy. This functionality is also implemented in the {coder} package.
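The comparison between index dates and code dates can be sketched as follows. This is an illustration in Python, not the {coder} implementation: the function name codes_in_window, the record layout and the sample data are all hypothetical, the 365-day window is just the one-year example mentioned above, and excluding codes recorded on the index date itself is an assumption.

```python
import re
from datetime import date, timedelta

# Hypothetical register records: (patient id, date the code was recorded, code).
register = [
    ("p1", date(2019, 6, 1), "K257"),   # within one year before the index date
    ("p1", date(2017, 1, 15), "K269"),  # too old
    ("p1", date(2020, 3, 1), "K279"),   # recorded after the index date
    ("p2", date(2019, 12, 24), "I10"),  # in the window, but not a matching code
]

# Hypothetical study cohort: patient id -> date of the index event.
cohort = {"p1": date(2020, 1, 10), "p2": date(2020, 1, 10)}


def codes_in_window(records, index_dates, pattern, days=365):
    """Return ids of patients with a matching code recorded within
    `days` days before (but not on or after) their index date."""
    regex = re.compile(pattern)
    found = set()
    for patient, recorded, code in records:
        index_date = index_dates.get(patient)
        if index_date is None:
            continue  # patient is not part of the cohort
        window_start = index_date - timedelta(days=days)
        # Assumption: the index date itself is excluded from the window.
        if window_start <= recorded < index_date and regex.match(code):
            found.add(patient)
    return found


print(codes_in_window(register, cohort, r"^K2[5-8][79]"))  # {'p1'}
```

Only the first record for p1 counts: the second is older than the window, the third was recorded after the index event, and the record for p2 does not match the expression.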

eribul commented 3 years ago

Have started a Word document, saved via RCC's Office. It will then need to be set up properly for the pull request etc.

eribul commented 3 years ago

Default checklist

eribul commented 3 years ago

Author-template:

name: Erik Bülow
link: https://www.gu.se/om-universitetet/hitta-person/6214de75-71f0-47bb-8b2a-3f20302ad6cb
github: eribul
orcid: 0000-0002-9973-456X

eribul commented 3 years ago

Introduction about the package: that it is available, peer-reviewed, etc.

Describe why I needed it.

eribul commented 3 years ago

PR: https://github.com/ropensci/roweb3/pull/123

eribul commented 3 years ago

They confirm that the submission has been received and will get back to me. I am closing this issue, since the remaining handling takes place via the PR in question, as far as I understand.