paulcx opened 7 years ago
Thanks for your interest in this work. However, I don't exactly understand what you mean. Are you talking about projecting words and medical concepts (e.g. ICD-9 codes, medication codes, procedure codes) into the same latent space? We are considering similar approaches for our future work, involving not just words but also other modalities of medical data.
Just a few thoughts, and they may not be possible in your case. What I meant was to match, for example, ICD-9 procedure codes to another version, a different language, or a different medical classification (e.g. ICHI) based on their descriptions. I wonder whether that would require semantic matching technologies.
I see. Let me restate your goal to make sure I'm not mistaken: given two different coding schemes for the same medical concepts, such as ICD-9 procedure codes and CPT procedure codes, you want to see which ICD-9 procedure code corresponds to which CPT procedure code (or vice versa).

In that case, it would be easy if you had two datasets where one dataset uses, for example, ICD-9 diagnosis codes and ICD-9 procedure codes, and another dataset uses ICD-9 diagnosis codes and CPT procedure codes. Using the first dataset, you can project the ICD-9 diagnosis codes and the ICD-9 procedure codes into one latent space. Using the second dataset, you can project the ICD-9 diagnosis codes and the CPT procedure codes into another latent space. Then you can select one diagnosis code, retrieve the k-nearest procedure codes from each latent space, and compare the retrieved procedure codes. Of course, for this approach to work, the two datasets need to consist of similar patients (if one dataset is of children and another of seniors, the distribution of medical codes won't match), have a similar number of patients, use similar ICD-9 diagnosis codes, etc. But this approach does not require you to compare the text descriptions of ICD-9 procedure codes and CPT procedure codes to see which corresponds to which, thus eliminating the need for NLP techniques.

Otherwise, you can compare the text descriptions of the ICD-9 procedure codes with the text descriptions of the CPT procedure codes and decide which corresponds to which, which is more straightforward. I think, with good medical NLP tools, this will yield better results, because I assume it won't be easy to find two similar datasets as described above.
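As a minimal sketch of the retrieve-and-compare step (assuming `emb_a` and `emb_b` are dicts mapping codes to numpy vectors from the two trained models; all names here are hypothetical, not from the med2vec code base):

```python
import numpy as np

def k_nearest_procedures(diag_code, emb, proc_codes, k=10):
    """Return the k procedure codes whose vectors are closest
    (by cosine similarity) to the given diagnosis code's vector."""
    q = emb[diag_code] / np.linalg.norm(emb[diag_code])
    scored = []
    for c in proc_codes:
        v = emb[c]
        scored.append((float(q @ (v / np.linalg.norm(v))), c))
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]

# Pick a diagnosis code present in both datasets and compare the
# procedure codes retrieved from each latent space:
# near_a = k_nearest_procedures(diag, emb_a, icd9_proc_codes)
# near_b = k_nearest_procedures(diag, emb_b, cpt_proc_codes)
```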
Thank you for the inspiration. However, the first approach may require the same or similar granularity in the code schemas to match precisely. I'm wondering whether your second proposal would work by using representation learning to encode the medical concepts based on their semantic information. If I'm right, your med2vec could be an alternative to existing coding systems like ICD-9/10, and it would capture not only the classification meaning but also the linguistic information. In previous work, I used NLP to create a mapping table between ICD-9 and ICHI.
My second proposal was actually more similar to your previous work (creating a mapping table between ICD-9 and ICHI). But if you have a good way to embed the descriptions of medical codes (such as doc2vec, or any sentence embedding algorithm), then you can project, for example, ICD-9 procedure codes and CPT procedure codes into the same latent space using their descriptions. This would enable you to find out which ICD-9 procedure codes are similar to which CPT procedure codes. But this approach requires a pre-trained sentence embedding (or text embedding) model. Typically, embedding algorithms used in NLP (word2vec, doc2vec, or other approaches) require a huge corpus to train on, but the descriptions of the medical codes are very limited. So it is unlikely to work if you train your embedding algorithm only on the code descriptions. You will need to pre-train the embedding model on some large medical text, then apply the model to the code descriptions.
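A minimal sketch of that description-matching step, where `encode` stands for any pre-trained sentence embedding model (a placeholder here, not a specific library API):

```python
import numpy as np

def match_codes(icd9_desc, cpt_desc, encode):
    """icd9_desc / cpt_desc: dicts mapping code -> text description;
    encode: pre-trained model mapping a description to a vector."""
    icd9_codes = list(icd9_desc)
    cpt_codes = list(cpt_desc)
    A = np.stack([encode(icd9_desc[c]) for c in icd9_codes])
    B = np.stack([encode(cpt_desc[c]) for c in cpt_codes])
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sims = A @ B.T                # cosine similarity matrix
    best = sims.argmax(axis=1)    # best CPT match per ICD-9 code
    return {icd9_codes[i]: cpt_codes[j] for i, j in enumerate(best)}
```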
Hello Ed,
I have been playing around with the med2vec model on MIMIC-III for a few days, and I tried @paulcx's idea of merging codes that are encoded under different coding schemes. (This is one application I can think of for evaluating the quality of the medical concept vectors.)
I split the drug codes into two datasets to mimic two different hospitals' data, and kept the ICD codes the same. After running the med2vec model, the performance was very good: I got 80% recall@8.
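A minimal sketch of how recall@8 can be computed in this setup (this assumes ground-truth pairs between the two artificial drug vocabularies; all names are hypothetical):

```python
import numpy as np

def recall_at_k(emb, pairs, candidates, k=8):
    """emb: dict code -> vector; pairs: (query_code, true_counterpart);
    candidates: all codes of the second vocabulary."""
    C = np.stack([emb[c] for c in candidates])
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    hits = 0
    for query, truth in pairs:
        q = emb[query] / np.linalg.norm(emb[query])
        top_k = np.argsort(-(C @ q))[:k]          # k nearest by cosine
        hits += truth in {candidates[i] for i in top_k}
    return hits / len(pairs)
```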
However, one thing I noticed: the good result is driven by the visit-level cost. If I train with only the code-level cost, recall@8 drops to about 5%; if I train with only the visit-level cost, it stays around 80%.
Another thing is that the visit-level cost is much higher than the code-level cost, which is reasonable considering the sigmoid at the visit level. But could that cause the model to focus much more on the visit level?
Given these two observations, my question is: in your experiments, did you find the visit-level cost to be the key to the success of the medical vectors, with the code-level cost not mattering much? Or am I missing something?
Thanks!
Hi Xianlong,
Generally, I wouldn't recommend running med2vec on MIMIC-III. It is a very small dataset (about 45K patients in total), and there are probably only a couple of thousand patients (or even fewer) who made at least three visits, because it's an ICU dataset. Therefore the visit-level softmax loss probably won't train well. I uploaded process_mimic.py so that people can "try out" med2vec, not to obtain state-of-the-art performance on MIMIC-III. And you even divided MIMIC-III into two parts, so the data size is even more problematic.
Now, to your findings: when you say you got a certain recall@8, I have many questions about your experimental setup. And when you say that the visit-level cost is much higher than the code-level cost, I'd like to know by how much. So if you have time, we can talk on Skype. I'm interested in learning about your findings. Please send me an email at mp2893@gmail.com.
BTW, you might be right about the balance between the visit-level cost and the code-level cost. Empirically, simply summing the two worked just fine. But if you can think of a clever way to balance the two and run experiments, it would be great to learn about your findings.
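For example, one simple thing to try would be a weighted sum; a hypothetical sketch, not what the released code does (which effectively uses alpha = 1):

```python
def combined_cost(visit_cost, code_cost, alpha=10.0):
    """Weighted sum of the two med2vec losses; alpha is a hypothetical
    knob that would need tuning on a validation set."""
    return visit_cost + alpha * code_cost
```

Another simple option would be to normalize each term by a running estimate of its own magnitude before summing, so neither term dominates the gradient.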
Thanks, Ed
@1230pitchanqw Hi, I have almost the same questions as @mp2893 asked. It would be nice if you could share your findings in detail so we could talk about them.
Hello @paulcx,
Sorry for the late response. I had a conversation with Ed yesterday and he gave me some valuable advice.
What I did was actually very simple: instead of training the two datasets separately, I trained them together. The general idea can be seen below:
From Ed's proposal:

> In that case, it would be easy if you had two datasets where one dataset uses, for example, ICD-9 diagnosis codes and ICD-9 procedure codes, and another dataset uses ICD-9 diagnosis codes and CPT procedure codes. Using the first dataset, you can project the ICD-9 diagnosis codes and the ICD-9 procedure codes into one latent space. Using the second dataset, you can project the ICD-9 diagnosis codes and the CPT procedure codes into another latent space. Then you can select one diagnosis code, retrieve the k-nearest procedure codes from each latent space, and compare the retrieved procedure codes.
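A minimal sketch of what such merging could look like (the `DRUG_` prefix and the helper below are hypothetical illustrations, not code from the repo):

```python
def tag_visit(visit, source):
    """Prefix drug codes with a dataset tag so the two drug vocabularies
    stay distinct, while ICD codes remain shared across datasets."""
    tagged = []
    for code in visit:
        if code.startswith("DRUG_"):            # hypothetical drug-code prefix
            tagged.append(source + ":" + code)  # e.g. "A:DRUG_1234"
        else:
            tagged.append(code)                 # shared ICD codes anchor the space
    return tagged

# merged = [[tag_visit(v, "A") for v in patient] for patient in dataset_a] + \
#          [[tag_visit(v, "B") for v in patient] for patient in dataset_b]
# `merged` is then converted to integer sequences and fed to med2vec once.
```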
Now for my findings so far:

1. This method only works when I use inpatient data (more codes per visit).
2. The code-level training has very little effect on the result.
3. Even though training on inpatient data leads to a good result on this task, the quality of the medical vectors is bad: synonyms have small cosine similarity values (around 0.2, whereas in Ed's trained vectors they are around 0.9).
Thanks
@1230pitchanqw Thanks for your insights. I'm wondering if this paper, 'Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction', would help somehow.
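For what it's worth, the final matching step of that idea could look like the sketch below, assuming two sets of code vectors already in a comparable space and the POT (Python Optimal Transport) library; the paper itself additionally learns a transformation by minimizing the EMD, which is omitted here:

```python
import numpy as np
import ot  # POT: Python Optimal Transport, https://pythonot.github.io/

def emd_match(X, Y):
    """X: (n, d) vectors of vocabulary 1; Y: (m, d) vectors of vocabulary 2.
    Solves one optimal-transport problem and reads off, for each code in
    vocabulary 1, the counterpart receiving the most transported mass."""
    a = np.full(len(X), 1.0 / len(X))   # uniform mass on each code
    b = np.full(len(Y), 1.0 / len(Y))
    M = ot.dist(X, Y, metric="cosine")  # pairwise transport costs
    G = ot.emd(a, b, M)                 # optimal transport plan, (n, m)
    return G.argmax(axis=1)
```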
@paulcx Thanks! That is a very interesting way to evaluate distances between different vocabularies. Have you run any experiments with it?
It would be quite interesting if this approach could be used to match ICD codes across different language versions, or to do a sort of machine translation of medical concepts based on their term representations.