The results obtained with pyhealth are much lower than in the paper.

sunlabuiuc / PyHealth

A Deep Learning Python Toolkit for Healthcare Applications.

https://pyhealth.readthedocs.io

MIT License

Issue #209 (closed by CodeNinjaja 1 year ago)

CodeNinjaja commented 1 year ago

Thanks for your great work!

But I wonder why the results reported on the PyHealth homepage are much lower than those reported in the SafeDrug paper. Moreover, according to the results reported by PyHealth, GAMENet performs better than SafeDrug, which contradicts the paper's results.

Below are the results reported in the SafeDrug paper:

ycq091044 commented 1 year ago

Thanks, we are actively fixing this problem. Will reply to you soon.

CodeNinjaja commented 1 year ago

Thanks! Looking forward to your reply.

ycq091044 commented 1 year ago

Hi, the results shown on the website are outdated and will be updated soon. Could you please run the scripts in examples/ and obtain your own results? Here is the performance of GAMENet I obtained by running python examples/drug_recommendation_mimic3_gamenet.py:

{'pr_auc_samples': 0.7071216948197928, 'loss': 0.2621602493930947}
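For readers unfamiliar with the metric: assuming pr_auc_samples follows scikit-learn's average_precision_score with average="samples" (an assumption, not confirmed in this thread), it computes the average precision over the drug labels of each visit and then averages across visits. A minimal pure-Python sketch; the function names are hypothetical and not part of PyHealth's API, and tied scores are not handled the way scikit-learn handles them:

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AP for one visit: mean precision at the rank of each true drug."""
    # Rank drug indices from highest to lowest predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if y_true[idx]:
            hits += 1
            precisions.append(hits / rank)
    # Visits with no true drug get 0.0 here (an arbitrary choice for the sketch).
    return sum(precisions) / len(precisions) if precisions else 0.0


def pr_auc_samples(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Hypothetical helper: average the per-visit AP over all visits."""
    scores = [average_precision(t, s) for t, s in zip(y_true, y_score)]
    return sum(scores) / len(scores)


# Two visits, four candidate drugs (toy data, not MIMIC-III).
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_score = [[0.9, 0.8, 0.7, 0.1], [0.2, 0.6, 0.3, 0.1]]
print(pr_auc_samples(y_true, y_score))  # about 0.9167
```

Here the first visit scores (1 + 2/3) / 2 = 5/6 and the second scores 1, so the average is 11/12. Because the metric depends only on how the drugs are ranked within each visit, it needs no decision threshold.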

Note that the implementation of the AI models or the data preprocessing pipeline may differ somewhat from what was reported in the papers. The reason is that different research papers process the same data (such as MIMIC-III) differently. To make a fair comparison of all AI models in pyhealth, we standardize the processing pipeline for all models, and thus some model architectures might also have to be changed to fit the new data processing pipeline.

ycq091044 commented 1 year ago

Feel free to re-open the issue if you have further questions.

Tyunsen commented 1 year ago

Hello ^ ^, thank you very much for PyHealth's contribution to drug recommendation.

I have a similar problem: the jaccard_samples of my locally run GAMENet model reaches about 0.43, and the pr_auc_samples of my SafeDrug model reaches about 0.65.

Here is the problem: the jaccard_samples of my local SafeDrug model only reaches about 0.33. In theory, SafeDrug's jaccard_samples should be similar to GAMENet's. Why is there such a big gap?
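Note that jaccard_samples and pr_auc_samples measure different things, so the meaningful comparison is jaccard_samples for GAMENet against jaccard_samples for SafeDrug. Assuming jaccard_samples matches scikit-learn's jaccard_score with average="samples" (not confirmed in this thread), each visit's predicted drug set, made of the drugs whose probability is at or above a threshold (0.5 here, also an assumption), is compared with the true set via |intersection| / |union|, and the scores are averaged over visits. A minimal sketch with hypothetical helper names:

```python
from typing import List, Sequence


def jaccard(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """|true intersect pred| / |true union pred| for one visit's multi-hot vectors."""
    inter = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    union = sum(1 for t, p in zip(y_true, y_pred) if t or p)
    # An empty union gives 0.0 here (an arbitrary choice for the sketch).
    return inter / union if union else 0.0


def jaccard_samples(
    y_true: List[List[int]], y_prob: List[List[float]], threshold: float = 0.5
) -> float:
    """Hypothetical helper: threshold each visit, then average Jaccard over visits."""
    scores = []
    for true_row, prob_row in zip(y_true, y_prob):
        pred_row = [1 if p >= threshold else 0 for p in prob_row]
        scores.append(jaccard(true_row, pred_row))
    return sum(scores) / len(scores)


# Two visits, four candidate drugs (toy data, not MIMIC-III).
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_prob = [[0.9, 0.8, 0.7, 0.1], [0.2, 0.6, 0.3, 0.1]]
print(jaccard_samples(y_true, y_prob))                  # about 0.8333
print(jaccard_samples(y_true, y_prob, threshold=0.75))  # about 0.1667
```

Unlike a ranking metric such as PR-AUC, the Jaccard score moves with the decision threshold: the same probabilities give 5/6 at 0.5 and 1/6 at 0.75. So anything that shifts how a model's probabilities are scaled, such as its training loss, can move Jaccard a lot while leaving the per-visit ranking unchanged.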

Here is the performance of SafeDrug I obtained by running python examples/drug_recommendation_mimic3_safedrug:

2023-09-17 11:10:52 --- Eval epoch-16, step-6035 ---
2023-09-17 11:10:52 jaccard_samples: 0.3246
2023-09-17 11:10:52 loss: 0.3497
2023-09-17 11:10:52 New best jaccard_samples score (0.3246) at epoch-16, step-6035

Looking forward to your reply, and thank you! ^^

ycq091044 commented 1 year ago

Hello @Tyunsen, I found that the SafeDrug adaptive loss had an issue and have already fixed it. Please install the new pyhealth version from source by running "git clone" and then "pip install .". Thanks!
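The thread does not say what exactly was wrong with the adaptive loss. For context, the SafeDrug paper describes a controllable loss that blends the recommendation loss with a DDI loss through a weight β: β = 1 when the DDI rate of the current predictions is at or below a target γ, and β = max(0, 1 - (DDI - γ) / K_p) otherwise. Below is a minimal sketch of that weighting on plain float loss values. The function names are hypothetical, the 0.95/0.05 split between the BCE and multi-label margin losses is an assumption based on the original SafeDrug code, and PyHealth's implementation may differ:

```python
def adaptive_beta(ddi_rate: float, target_ddi: float, kp: float) -> float:
    """Weight on the recommendation loss: 1 at or below the DDI target, then decaying linearly."""
    if ddi_rate <= target_ddi:
        return 1.0
    # Clamp at 0 so beta stays within [0, 1].
    return max(0.0, 1.0 - (ddi_rate - target_ddi) / kp)


def safedrug_loss(
    bce: float,
    multi: float,
    ddi_loss: float,
    ddi_rate: float,
    target_ddi: float,
    kp: float,
    multi_weight: float = 0.05,  # assumed split, see the note above
) -> float:
    """Hypothetical helper: beta * recommendation loss + (1 - beta) * DDI loss."""
    beta = adaptive_beta(ddi_rate, target_ddi, kp)
    rec_loss = (1.0 - multi_weight) * bce + multi_weight * multi
    return beta * rec_loss + (1.0 - beta) * ddi_loss


# Illustrative values only; target_ddi and kp are hyperparameters.
for rate in (0.05, 0.085, 0.2):
    print(rate, adaptive_beta(rate, target_ddi=0.06, kp=0.05))  # beta: 1.0, about 0.5, 0.0
```

Below the target, training optimizes only the recommendation loss. Once the DDI rate exceeds the target, the weight shifts toward the DDI loss, and for DDI >= γ + K_p the recommendation loss is switched off entirely.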