mp2893 / med2vec

Repository for Med2Vec project
BSD 3-Clause "New" or "Revised" License

visit representation evaluate result on mimic3 #4

Open 2g-XzenG opened 7 years ago

2g-XzenG commented 7 years ago

Hello Choi, thanks for sharing the code on GitHub; it is a great topic.

After reading several of your papers, I have a few questions:

  1. Do you have visit-representation evaluation results on MIMIC-III? Compared with your GRAM model, which one has better performance? (I ask because on CHOA the recall@30 is around 76%, while in the GRAM paper on MIMIC-III the accuracy@20 is relatively low, around 30% on average.)

  2. When you learn vector representations of medical concepts, you want these vectors to end up in the same common space. But does it make sense to treat them as living in one common space from the start? For example, you build a single dictionary for procedure codes, diagnosis codes, and medication codes, and then make one one-hot vector over all of these codes.
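A minimal sketch of the construction in question (all code names here are hypothetical, just for illustration; this is not the Med2Vec preprocessing code):

```python
# One shared dictionary across diagnosis, procedure, and medication codes,
# and one multi-hot vector per visit over that combined vocabulary.
diag_codes = ["D_401.9", "D_250.00"]   # hypothetical diagnosis codes
proc_codes = ["P_88.72"]               # hypothetical procedure code
med_codes = ["M_metformin"]            # hypothetical medication code

# Single vocabulary covering every code type
vocab = {code: i for i, code in enumerate(diag_codes + proc_codes + med_codes)}

def visit_to_multihot(visit_codes, vocab):
    """Encode one visit as a multi-hot vector over the shared vocabulary."""
    vec = [0] * len(vocab)
    for c in visit_codes:
        vec[vocab[c]] = 1
    return vec

visit = ["D_250.00", "M_metformin"]
print(visit_to_multihot(visit, vocab))  # [0, 1, 0, 1]
```

Since a visit usually contains several codes, the vector is multi-hot rather than strictly one-hot, but every code type shares the same index space.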

Thanks

mp2893 commented 7 years ago

Hi Xianlong,

Thanks for your interest in our work.

To answer your question:

  1. To be fair, CHOA and MIMIC-III are very different datasets, the former being outpatient records of 550K patients and the latter being ICU records of only 7K patients. Also, there are more codes per visit in MIMIC-III than in CHOA, so the performance cannot be compared straightforwardly. I haven't tested Med2Vec on MIMIC-III, but since MIMIC-III is a public dataset, you could run the evaluation yourself. It would be great if you could share the results as well.

  2. That's a valid question. I think it depends on what you want to achieve with concept embedding. I was interested in finding out the underlying relationship between different types of codes. For example, if you embed diagnosis codes and medication codes to the same latent space, you can easily find out which drugs are closely related to which diagnoses. Moreover, in Med2Vec, if you embed diagnosis/medication/procedure codes to the same latent space, you can study that latent space and find out how each dimension is related to various diagnosis/medication/procedure codes (see Table 5 in the paper).
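As a toy illustration of what a shared latent space buys you (the embeddings below are random and the code names are made up; real affinities would come from trained Med2Vec vectors):

```python
import numpy as np

# Once diagnosis and medication codes live in one latent space,
# drug-diagnosis affinity is just a similarity lookup between embeddings.
rng = np.random.default_rng(0)
dim = 8
emb = {  # stand-in for learned embeddings
    "D_diabetes": rng.normal(size=dim),
    "D_hypertension": rng.normal(size=dim),
    "M_metformin": rng.normal(size=dim),
    "M_lisinopril": rng.normal(size=dim),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Which diagnosis is closest to a given drug in the shared space?
drug = "M_metformin"
scores = {d: cosine(emb[drug], emb[d]) for d in ("D_diabetes", "D_hypertension")}
print(max(scores, key=scores.get))
```

With type-specific spaces, this cross-type lookup would require an extra alignment step; with one space, it is a single similarity computation.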

Thanks, Ed

2g-XzenG commented 7 years ago

Hi Ed, thanks for your quick response.

I am working on a medical-related project (predicting "fraud" billing, "defining" patient status, etc.). Finding a good representation of medical concepts would be a great help to me, and this paper seems to have achieved state-of-the-art performance (right? ^_^), so I would like to bother you with some detailed questions, if you don't mind.

  1. For the visit-level prediction, you used a softmax to predict the neighboring visits, but there are multiple codes in each visit (so it is like a multi-label classification problem). Would it be better to use a sigmoid instead of a softmax?

  2. I ran the model on the MIMIC-III dataset. The training result seems very good (I evaluated it by looking at some ICD codes' nearest neighbors), but the training loss is still very high even after 100 epochs (it starts around 400 and only reaches 360 by epoch 100). I think this is because of the softmax I mentioned above. Is this a problem? It seems that this way the code-level part of the loss doesn't matter very much (the loss for that part will be small).

Thanks, Xianlong

mp2893 commented 7 years ago

To be fair, Med2Vec is a co-occurrence-based algorithm, so it will show good performance in applications where co-occurrence information between codes plays an important role. But Med2Vec probably won't help you find a novel cure for cancer. For fraud detection, I think it will be helpful, since fraud detection can be seen as anomaly detection.

As for your questions:

  1. Your question is valid. I actually tried both softmax and sigmoid for visit-level prediction in other papers, but softmax almost always outperformed sigmoid. We think this is because softmax is a strong regularizer, due to the normalizing denominator. Also, there aren't many codes per visit (typically fewer than 10 in most datasets), so using softmax instead of sigmoid doesn't have too drastic an impact.
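To make the two options concrete, here is a toy comparison (my own sketch, not the paper's code) of the two losses on one multi-hot target; with softmax the normalizing denominator couples all outputs, while with sigmoid each code is an independent binary task:

```python
import numpy as np

logits = np.array([2.0, 1.0, -1.0, 0.5])
target = np.array([1.0, 1.0, 0.0, 0.0])  # two codes present in the visit

def softmax_xent(logits, target):
    """Cross-entropy of a normalized multi-hot target vs. the softmax."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    t = target / target.sum()
    return float(-(t * np.log(p)).sum())

def sigmoid_bce(logits, target):
    """Per-code binary cross-entropy, summed over the vocabulary."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).sum())

print(softmax_xent(logits, target), sigmoid_bce(logits, target))
```

Note that the softmax loss never reaches zero for a multi-hot target (probability mass must be split across the present codes), which is consistent with a training loss that plateaus well above zero.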

  2. If you use the input sequence as the label sequence, it will take a very long time to train, because you are training a softmax with tens of thousands of possible outcomes. In the paper, I grouped the codes with existing groupers (such as the CCS diagnosis grouper) to reduce the output space. I suggest you do the same, as it significantly increases training speed and has minimal impact on overall performance (although that depends on what application you have in mind).
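A minimal sketch of that grouping step (the mapping entries and group labels below are illustrative only; a real pipeline would load the actual CCS grouper files):

```python
# Collapse the ICD-9 label space to grouper categories before the softmax,
# shrinking the output dimension from tens of thousands to a few hundred.
ccs_map = {  # hypothetical ICD-9 -> group mapping
    "250.00": "CCS_A",  # e.g. a diabetes group
    "250.01": "CCS_A",
    "401.9": "CCS_B",   # e.g. a hypertension group
}

# Index the (much smaller) group vocabulary
group_vocab = {g: i for i, g in enumerate(sorted(set(ccs_map.values())))}

def visit_labels(visit_codes):
    """Map a visit's ICD-9 codes to grouper indices for the softmax targets."""
    return sorted({group_vocab[ccs_map[c]] for c in visit_codes if c in ccs_map})

print(visit_labels(["250.00", "250.01", "401.9"]))  # [0, 1]
```

Note that codes sharing a group collapse to a single label, which is exactly where the speedup comes from.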

Thanks, Ed

KirkHadley commented 7 years ago

Gentlemen, it's always great stumbling upon strangers on the internet discussing exactly the problem you work on... until you also realize that what you were so certain was a novel idea is already a thing.

Xianlong: Ed is absolutely right about the boon you'll get out of a representation strategy that's at least a nudge towards semantic "understanding."

Last thing: I'm currently running this in an admittedly much uglier fashion than y'all, but I do have distributed training configured/implemented. Is that something for which you'd appreciate a pull request, or would it really just be another thing you had to maintain?

mp2893 commented 7 years ago

Hi Kirk,

It's wonderful to meet another person with the same interest. It would be great to have a distributed-training-enabled med2vec for people with large data. I can't guarantee I'll review the code promptly, but it would be nice to have a pull request. (Or we could keep a separate script that trains in a distributed fashion.)