mp2893 / gram

Graph-based Attention Model
BSD 3-Clause "New" or "Revised" License
239 stars 71 forks

comparison between med2vec and gram #3

Closed 2g-XzenG closed 7 years ago

2g-XzenG commented 7 years ago

Hello Ed,

Nice work! I didn't pay much attention to this paper at the beginning, since you mentioned in the paper that this method works well when the dataset is small. So I thought Med2Vec would give us better performance when we have a large dataset.

However, now that I look closer at the paper, it seems that GRAM will perform better than Med2Vec and non-negative skip-gram, as the t-SNE scatterplot for GRAM looks much better (the dots are clearly separated) compared to the other two methods.

On the other hand, the medical vectors trained by GRAM align closely with the given knowledge DAG, which is made by humans and might not be good. As you mentioned in the Med2Vec paper: "the degree of conformity of the code representations to the groupers does not necessarily indicate how well the code representations capture the hidden relationships."

I wonder how you would compare these 2 (or 3, if you count non-negative skip-gram) vector learning methods if given a large enough dataset?

Thanks! xianlong

mp2893 commented 7 years ago

Med2vec and GRAM are quite different actually. Med2vec is an unsupervised representation learning method. GRAM is typically used for improving the performance of supervised classifiers.

And you can actually combine med2vec and GRAM. In GRAM, you can achieve better performance if you pre-train the basic embeddings. In the paper, I trained those basic embeddings using GloVe, but you can use med2vec (in fact, you can use any representation learning technique).
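For example, here is a rough sketch of what I mean. The file names are illustrative, and I'm assuming the pre-trained vectors come as a code-to-vector dict that you align against GRAM's code-to-index mapping (the `.types` file from the preprocessing step); it's not the exact interface, just the idea:

```python
# Rough sketch (not the exact GRAM interface): build an initialization
# matrix for GRAM's basic embeddings from vectors trained by med2vec
# (or GloVe, or any other representation learning method).
import pickle
import numpy as np

# Hypothetical inputs: a dict mapping medical codes to pre-trained vectors,
# and GRAM's code-to-index mapping produced during preprocessing.
pretrained = pickle.load(open('med2vec_code_vectors.pkl', 'rb'))   # {code: np.array}
code_to_index = pickle.load(open('gram_input.types', 'rb'))        # {code: int}

embed_dim = len(next(iter(pretrained.values())))
init_embeds = np.random.uniform(-0.01, 0.01,
                                (len(code_to_index), embed_dim)).astype('float32')

# Copy pre-trained vectors into the rows GRAM expects; codes that med2vec
# never saw keep their random initialization.
for code, idx in code_to_index.items():
    if code in pretrained:
        init_embeds[idx] = pretrained[code]

np.save('pretrained_basic_embeddings.npy', init_embeds)
# You would then hand this matrix to GRAM as the initial value of the
# basic embedding matrix instead of a random initialization.
```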

But you ask an interesting question. I actually asked the very same question myself: how much can we rely on hand-engineered domain knowledge? The correct answer is, of course, that if we have infinite data, we don't need any hand-engineered features. But in reality, you cannot always collect sufficient data for some medical codes (e.g. rare diseases). Then the best you can do is rely on expert knowledge. Honestly, what else can we do?

When I said "the degree of conformity of the code representations to the groupers does not necessarily indicate how well the code representations capture the hidden relationships", I was assuming we had enough data. If we had enough data, then is it really a good idea to use the grouper as an evaluation metric? I was simply pointing this out.

Hope this helps, Ed

2g-XzenG commented 7 years ago

Hello Ed,

Thanks! It took me a while to understand this paper, and your response is very helpful!