jackwasey / icd

Fast ICD-10 and ICD-9 comorbidities, decoding and validation in R. NB use main instead of master for default branch.
https://jackwasey.github.io/icd/
GNU General Public License v3.0

Provide JSON data formats #163

Open fdabek1 opened 6 years ago

fdabek1 commented 6 years ago

As somebody who wants to use the different hierarchies in the data folder from Python, I would find it convenient if the RData files were also provided as JSON files. Currently, I need to install R and write a script to achieve this. (For anybody needing such a script, I am attaching mine below.)

On a related topic, would it be possible to embed the descriptions into the mappings? For instance, each CCS level has its own description, which would be convenient to have within the mappings already.

library(jsonlite)

# Folder holding the .RData files from the downloaded icd repository
folder <- '~/Downloads/icd-master/data'
json_folder <- file.path(folder, 'json')
dir.create(json_folder, showWarnings = FALSE)

# pattern is a regular expression, not a glob
files <- list.files(path = folder, pattern = '\\.RData$')
invisible(lapply(files, function(x) {
  # load() returns the names of the restored objects; each file is assumed to hold one
  data <- get(load(file.path(folder, x)))
  json_name <- file.path(json_folder, sub('\\.RData$', '.json', x))
  write(toJSON(data), json_name)
}))

jackwasey commented 5 years ago

Thanks for the idea. Obviously, I'd rather people used R, but this sounds like a useful feature, since easily machine-readable ICD data is hard to come by. If you're able to send a pull request, please do.

On the second topic, this sounds interesting. @vitallish contributed this code: any thoughts?

vitallish commented 5 years ago

@jackwasey I think this sounds like a good idea - I can add the labels. Would it make sense to add the labels by naming the items within the comorbidity_map items, or by adding them as an attribute? Is there any other functionality within the icd package that would better solve the problem?

jackwasey commented 5 years ago

I do use attributes for data types, like icd_short_diag, but this adds a lot of complexity, and printing out values with attributes is ugly and confusing.
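The printing issue can be reproduced in base R alone. The following is a minimal sketch, not the package's actual mechanism: the attribute is set by hand, the name icd_short_diag is reused only as a label, and the codes are illustrative.

```r
# Illustrative codes; the attribute is attached manually with base R
codes <- c("4019", "25000")
attr(codes, "icd_short_diag") <- TRUE

# Printing shows the attribute lines after the values
printed <- capture.output(print(codes))

# Base subsetting keeps only names, dim and dimnames, so the
# custom attribute is lost unless a dedicated `[` method restores it
subset_codes <- codes[1]
has_attr <- !is.null(attr(subset_codes, "icd_short_diag"))
```

Here printed holds the extra attr(...) lines, and has_attr shows that plain subsetting loses the attribute, which is part of the complexity mentioned above.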

For general comorbidity maps, I use, e.g., icd_names_ahrq, which has the name of each category in the AHRQ mappings icd10_map_ahrq and icd9_map_ahrq. If I understand correctly, @fdabek1 would like each CCS level, e.g., 1, 2, ... (not the ICD codes themselves) to have a description. This would best be accomplished with new character vector structures named icd9_names_single_ccs, icd9_names_multi_ccs, icd10_names_single_ccs and icd10_names_multi_ccs.
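One possible layout for such a vector is sketched below. It assumes the names are the CCS level numbers and the values are the descriptions. The three entries are illustrative only, and nothing here is taken from the package itself.

```r
# Hypothetical structure: names are CCS single-level numbers,
# values are the descriptions (illustrative subset, not the full list)
icd9_names_single_ccs <- c(
  "1" = "Tuberculosis",
  "2" = "Septicemia (except in labor)",
  "3" = "Bacterial infection; unspecified site"
)

# Look up descriptions by level; index with character names so that
# gaps in the numbering cannot shift positions
ccs_levels <- c("3", "1")
descriptions <- unname(icd9_names_single_ccs[ccs_levels])
```

Indexing by character name rather than by position keeps the lookup correct even if some level numbers are absent.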

An alternative would be a named list with the descriptions as the names and the CCS numbers as the values, which would allow the description for a CCS number to be looked up quite easily. This would be analogous to icd10_chapters, etc.

I'm not exactly sure of the use case, but I don't think attributes are the best answer.
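A rough sketch of that alternative follows. The list name icd9_single_ccs_levels and the helper ccs_description are hypothetical, and the entries are illustrative.

```r
# Hypothetical named list: descriptions are the names, CCS numbers the values
icd9_single_ccs_levels <- list(
  "Tuberculosis" = "1",
  "Septicemia (except in labor)" = "2"
)

# Return the description for a CCS number, or NA if the number is unknown
ccs_description <- function(level, lookup) {
  names(lookup)[match(level, unlist(lookup))]
}

desc <- ccs_description("2", icd9_single_ccs_levels)
missing_desc <- ccs_description("999", icd9_single_ccs_levels)
```

Compared with a plain character vector keyed by level, this form needs a reverse match to get from number to description, but it reads naturally when iterating over descriptions.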