sunlabuiuc / PyHealth

A Deep Learning Python Toolkit for Healthcare Applications.
https://pyhealth.readthedocs.io
MIT License
956 stars, 207 forks

Does the InnerMap contain all the ICD code IDs? #254

Closed Data-Designer closed 9 months ago

Data-Designer commented 9 months ago

I use the following code to get all the ICD tokens.


```python
from pyhealth.datasets import MIMIC4Dataset
from pyhealth.tasks import drug_recommendation_mimic4_fn
from pyhealth.tokenizer import Tokenizer

base_dataset2 = MIMIC4Dataset(
    root="/home/czhaobo/KnowHealth/data/physionet.org/files/mimiciv/2.0/hosp",  # 2.2 does not work well
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
    dev=False,
    refresh_cache=False,  # use True on the first run
)
sample_dataset2 = base_dataset2.set_task(drug_recommendation_mimic4_fn)

# Build a vocabulary over every diagnosis code seen in the samples.
tokenizer2 = Tokenizer(
    tokens=sample_dataset2.get_all_tokens(key='conditions'),
    special_tokens=["<pad>", "<unk>"],
)
tokens2 = list(tokenizer2.vocabulary.idx2token.values())
print(tokens2)

# get_stand_system is my own helper (not shown here).
diag_sys1, proc_sys1, med_sys1 = get_stand_system('MIMIC-III')
diag_sys2, proc_sys2, med_sys2 = get_stand_system('MIMIC-IV')
```

But when I try to look up their names via InnerMap.lookup, I always get a KeyError. For example, H4011X0 is an ID in tokens2:
```python
from pyhealth.medcode import InnerMap

if __name__ == "__main__":
    icd9cm = InnerMap.load("ICD9CM")
    icd10cm = InnerMap.load("ICD10CM")
    # H4011X0 is one of the tokens in tokens2.
    print(icd9cm.lookup('H4011X0'))
    print(icd10cm.lookup('H4011X0'))
```

Data-Designer commented 9 months ago

For the MIMIC-III dataset, I see a similar problem.


```python
from pyhealth.medcode import InnerMap

if __name__ == "__main__":
    # Procedure vocabularies, loaded from the local cache when available.
    icd9proc = InnerMap.load("ICD9PROC", refresh_cache=False)
    icd10proc = InnerMap.load("ICD10PROC", refresh_cache=False)
    print(icd9proc.lookup('3601'))
```

zhandand commented 9 months ago

It seems that the InnerMap class gets its data from the files hosted at https://storage.googleapis.com/pyhealth/resource/. We need to figure out how those files were generated.
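One way to check this offline would be to download a vocabulary file from that location and test a code against it directly. The sketch below is only an illustration: it assumes the file is a CSV with a `code` column (an assumption about the resource layout, not something confirmed in this thread), and it uses a couple of made-up sample rows in place of a real download. `codes_in_csv` is a hypothetical helper, not part of PyHealth.

```python
import csv
import io


def codes_in_csv(csv_text, code_column="code"):
    """Return the set of codes listed in a vocabulary CSV (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[code_column].strip() for row in reader}


# Illustrative rows only, not copied from the real resource files.
SAMPLE_CSV = """code,name
A00,Cholera
A01,Typhoid and paratyphoid fevers
"""

available = codes_in_csv(SAMPLE_CSV)
print("A00" in available)       # present in the sample rows
print("H40.11X0" in available)  # absent from the sample rows
```

Running the same membership test on the real file would show whether a failing token is simply absent from the published vocabulary.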

Data-Designer commented 9 months ago

> It seems that the InnerMap class gets its data from the files hosted at https://storage.googleapis.com/pyhealth/resource/. We need to figure out how those files were generated.

Yes, I examined the source code and found that some ICD codes have been deleted or replaced in the latest published version.
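To see how many dataset tokens are affected, the tokens can be split into those the vocabulary knows and those it does not. This is a minimal sketch: `split_by_coverage` is a hypothetical helper, and it assumes the lookup callable raises `KeyError` for unknown codes, which is the behaviour reported above for `InnerMap.lookup`. A plain dict stands in for the real vocabulary so the example runs without PyHealth.

```python
def split_by_coverage(tokens, lookup):
    """Partition tokens into (found, missing) using a lookup that raises KeyError on unknown codes."""
    found, missing = {}, []
    for token in tokens:
        try:
            found[token] = lookup(token)
        except KeyError:
            missing.append(token)
    return found, missing


# Stand-in vocabulary; with PyHealth you would pass e.g. icd10cm.lookup instead
# (assumed to raise KeyError for unknown codes, as reported in this thread).
toy_vocab = {"A00": "Cholera"}
found, missing = split_by_coverage(["A00", "H4011X0"], toy_vocab.__getitem__)
print(found)
print(missing)
```

Note that special tokens such as `<pad>` and `<unk>` would also end up in `missing`, so they should be filtered out before counting.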

ycq091044 commented 9 months ago

Could you help look into the problem @pat-jj?

pat-jj commented 9 months ago

Hi, thanks for the questions. Please check our development code for code mapping. You can find the source files we used there.