maximegmd opened this issue 1 year ago
emmmm, me too.
Hey Doc! The evaluation function I used is in the .ipynb notebook in the repository. I set a semantic-similarity threshold so that any response congruent with one of the possible USMLE answers counts as correct. A response doesn't have to match verbatim, which is why the accuracy was higher. Also, I'm about to release a new fine-tuned model next week; the goal here is to keep improving. I just merged my first PR and posted a paid bounty last week for UI issues. Would love your help!
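For anyone trying to reproduce this, here is a minimal sketch of threshold-based grading. It assumes the notebook compares each generated answer with the reference answer and marks it correct when their similarity clears a cutoff. The notebook's actual embedding model and threshold aren't given in this thread. This sketch therefore uses a plain bag-of-words cosine similarity in place of a sentence-embedding model, and a hypothetical `THRESHOLD = 0.7`. The names `cosine_similarity`, `is_correct` and `accuracy` are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Sequence

# Hypothetical cutoff: the notebook's real threshold is not stated in this thread.
THRESHOLD = 0.7


def _bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (a stand-in for sentence embeddings)."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def is_correct(response: str, reference: str, threshold: float = THRESHOLD) -> bool:
    """Count a free-text response as correct if it is similar enough to the reference."""
    return cosine_similarity(response, reference) >= threshold


def accuracy(
    responses: Sequence[str], references: Sequence[str], threshold: float = THRESHOLD
) -> float:
    """Fraction of responses whose similarity to their reference clears the threshold."""
    if not references:
        return 0.0
    hits = sum(is_correct(r, ref, threshold) for r, ref in zip(responses, references))
    return hits / len(references)


if __name__ == "__main__":
    refs = ["Metformin", "Metformin"]
    answers = ["Start metformin", "Insulin glargine"]  # paraphrase, wrong answer
    print(accuracy(answers, refs))  # → 0.5
```

A looser threshold credits paraphrases and partial matches that exact-match or multiple-choice scoring would reject. Scores from a grader like this are therefore not directly comparable to lm-evaluation-harness accuracy on the same benchmarks.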
❓ General Questions
I evaluated the model with lm-evaluation-harness on MedMCQA, MedQA-USMLE and PubMedQA, and it performs barely above Llama 2 7B: only 38% on MedQA-USMLE, 36% on MedMCQA and 73.9% on PubMedQA.
Could you describe how you got your results?