sunlabuiuc / PyHealth

A Deep Learning Python Toolkit for Healthcare Applications.
https://pyhealth.readthedocs.io
MIT License

question about eicu in drug recommendation task #71

Status: Closed (azusakou closed 1 year ago)

azusakou commented 1 year ago

Thank you so much for your work! When I use the eICU data for the drug recommendation task, I get the following error: Key drugs has mixed nested list levels across samples.

Could you please tell me how to solve this problem?

Thanks in advance.

ycq091044 commented 1 year ago

Yes, I'd like to help. Could you share with us what functions you have called?

azusakou commented 1 year ago

Thanks for your reply.

from pyhealth.datasets import eICUDataset
from pyhealth.tasks import drug_recommendation_eicu_fn

base_dataset = eICUDataset(
    root="./eicu-crd/2.0",
    tables=["diagnosis", "medication", "physicalExam"],
    dev=True,
    refresh_cache=False,
)
sample_dataset = base_dataset.set_task(task_fn=drug_recommendation_eicu_fn)
sample_dataset.stat()
print(sample_dataset.available_keys)

The error is raised by this line: sample_dataset = base_dataset.set_task(task_fn=drug_recommendation_eicu_fn)

ycq091044 commented 1 year ago

Thanks for your patience. I just found the issue: there is a typo in drug_recommendation_eicu_fn in the current package version. Could you please use the following function as a replacement? Do not import it from pyhealth.tasks; define it directly in your script and pass it to set_task:

def drug_recommendation_eicu_fn(patient):
    samples = []
    for i in range(len(patient)):
        visit = patient[i]
        conditions = visit.get_code_list(table="diagnosis")
        procedures = visit.get_code_list(table="physicalExam")
        drugs = visit.get_code_list(table="medication")
        # exclude: visits without condition, procedure, or drug code
        if len(conditions) * len(procedures) * len(drugs) == 0:
            continue
        # TODO: should also exclude visit with age < 18
        samples.append(
            {
                "visit_id": visit.visit_id,
                "patient_id": patient.patient_id,
                "conditions": conditions,
                "procedures": procedures,
                "drugs": drugs,
                "drugs_all": drugs,
            }
        )
    # exclude: patients with fewer than 2 visits
    if len(samples) < 2:
        return []
    # add history
    samples[0]["conditions"] = [samples[0]["conditions"]]
    samples[0]["procedures"] = [samples[0]["procedures"]]
    samples[0]["drugs_all"] = [samples[0]["drugs_all"]]

    for i in range(1, len(samples)):
        samples[i]["conditions"] = samples[i - 1]["conditions"] + [
            samples[i]["conditions"]
        ]
        samples[i]["procedures"] = samples[i - 1]["procedures"] + [
            samples[i]["procedures"]
        ]
        samples[i]["drugs_all"] = samples[i - 1]["drugs_all"] + [
            samples[i]["drugs_all"]
        ]

    return samples

This new function is almost identical to the one you imported. The only difference is in the last assignment inside the loop, where

        samples[i]["drugs"] = samples[i - 1]["drugs_all"] + [
            samples[i]["drugs_all"]
        ]

is changed to

        samples[i]["drugs_all"] = samples[i - 1]["drugs_all"] + [
            samples[i]["drugs_all"]
        ]
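
To see why the typo leads to the "mixed nested list levels" error, here is a minimal sketch that uses plain dictionaries in place of PyHealth samples. The helpers nesting_depth and build_history and the drug names are illustrative assumptions, not part of PyHealth. The sketch only reproduces the "add history" step: with the typo, every sample after the first gets its "drugs" value overwritten by the nested history, so "drugs" is a flat list in the first sample and a list of lists in the others.

```python
def nesting_depth(value):
    """Return how many list levels wrap the innermost items (0 for a non-list)."""
    if not isinstance(value, list):
        return 0
    if not value:
        return 1
    return 1 + max(nesting_depth(item) for item in value)


def build_history(visits, target_key):
    """Mimic the "add history" step; target_key is the key the loop assigns to.

    target_key="drugs" reproduces the typo, target_key="drugs_all" the fix.
    """
    samples = [{"drugs": list(v), "drugs_all": list(v)} for v in visits]
    samples[0]["drugs_all"] = [samples[0]["drugs_all"]]
    for i in range(1, len(samples)):
        samples[i][target_key] = samples[i - 1]["drugs_all"] + [
            samples[i]["drugs_all"]
        ]
    return samples


# Made-up per-visit drug lists for one patient with three visits.
visits = [["aspirin", "heparin"], ["insulin"], ["aspirin"]]

buggy = build_history(visits, target_key="drugs")
fixed = build_history(visits, target_key="drugs_all")

buggy_depths = [nesting_depth(s["drugs"]) for s in buggy]
fixed_depths = [nesting_depth(s["drugs"]) for s in fixed]
fixed_history_depths = [nesting_depth(s["drugs_all"]) for s in fixed]

print("buggy 'drugs' depths:", buggy_depths)          # flat in sample 0, nested afterwards
print("fixed 'drugs' depths:", fixed_depths)          # flat in every sample
print("fixed 'drugs_all' depths:", fixed_history_depths)  # nested in every sample
```

With the fix, "drugs" stays a flat label list for each visit and "drugs_all" consistently holds the per-visit history, so every key has the same nesting level across samples.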

ycq091044 commented 1 year ago

BTW, the task function is really just an example of how we define healthcare tasks. You can follow the pattern and modify for your own purpose.
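
As a rough illustration of following the pattern, the sketch below defines a variant that drops the physicalExam table and keeps only diagnoses and medications. StubVisit and StubPatient are hypothetical stand-ins that expose only the attributes the function above relies on (get_code_list, visit_id, patient_id, len() and indexing), so the example runs without PyHealth or the eICU data. The name drug_recommendation_no_exam_fn is made up, and whether a given PyHealth model accepts samples without a "procedures" key is not covered here.

```python
class StubVisit:
    """Hypothetical stand-in for a visit; not a PyHealth class."""

    def __init__(self, visit_id, codes):
        self.visit_id = visit_id
        self._codes = codes  # maps table name -> list of codes

    def get_code_list(self, table):
        return list(self._codes.get(table, []))


class StubPatient:
    """Hypothetical stand-in for a patient; not a PyHealth class."""

    def __init__(self, patient_id, visits):
        self.patient_id = patient_id
        self._visits = visits

    def __len__(self):
        return len(self._visits)

    def __getitem__(self, index):
        return self._visits[index]


def drug_recommendation_no_exam_fn(patient):
    """Same pattern as above, using only diagnosis and medication codes."""
    samples = []
    for i in range(len(patient)):
        visit = patient[i]
        conditions = visit.get_code_list(table="diagnosis")
        drugs = visit.get_code_list(table="medication")
        # exclude: visits without condition or drug codes
        if len(conditions) * len(drugs) == 0:
            continue
        samples.append(
            {
                "visit_id": visit.visit_id,
                "patient_id": patient.patient_id,
                "conditions": conditions,
                "drugs": drugs,
                "drugs_all": drugs,
            }
        )
    # exclude: patients with fewer than 2 usable visits
    if len(samples) < 2:
        return []
    # add history: wrap the first visit, then append each later visit
    samples[0]["conditions"] = [samples[0]["conditions"]]
    samples[0]["drugs_all"] = [samples[0]["drugs_all"]]
    for i in range(1, len(samples)):
        samples[i]["conditions"] = samples[i - 1]["conditions"] + [
            samples[i]["conditions"]
        ]
        samples[i]["drugs_all"] = samples[i - 1]["drugs_all"] + [
            samples[i]["drugs_all"]
        ]
    return samples


# Made-up patient: the second visit has no medication and is skipped.
patient = StubPatient(
    "p1",
    [
        StubVisit("v1", {"diagnosis": ["d1"], "medication": ["m1", "m2"]}),
        StubVisit("v2", {"diagnosis": ["d2"]}),
        StubVisit("v3", {"diagnosis": ["d3"], "medication": ["m3"]}),
    ],
)
samples = drug_recommendation_no_exam_fn(patient)
for sample in samples:
    print(sample)
```

With real data, such a function would be passed to set_task in the same way as the one above, for example base_dataset.set_task(task_fn=drug_recommendation_no_exam_fn).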

azusakou commented 1 year ago

It works! Many thanks!!!