kbressem / medAlpaca

LLM finetuned for medical question answering
GNU General Public License v3.0

How to compute scores after USMLE evaluation? #40

Closed anonymoususerr1 closed 1 year ago

anonymoususerr1 commented 1 year ago

Hi, can you share the script to compute the scores after evaluating each model on USMLE data via eval_usmle.py?

kbressem commented 1 year ago

I used this notebook. JSON files were created with eval_usmle.py.

eval_usmle.ipynb.zip

anonymoususerr1 commented 1 year ago

Using the notebook you shared, I always get nan as the score for every model, including your pretrained models such as medalpaca-lora-7b-8bit. Do you have any idea why this happens?

kbressem commented 1 year ago

Does the model fail on some questions and produce nan for them? I believe that if even one nan is in the list, numpy will return nan for the mean of the whole list.
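To illustrate the point about nan propagation, here is a minimal sketch. The `scores` array is made up for illustration and is not taken from the notebook or from `eval_usmle.py`; it only assumes that per-question results end up as floats, with nan marking an answer that could not be scored.

```python
import numpy as np

# Hypothetical per-question scores (assumption, not real evaluation output):
# 1.0 = correct, 0.0 = wrong, nan = answer could not be scored.
scores = np.array([1.0, 0.0, np.nan, 1.0])

# A single nan makes the plain mean nan.
plain_mean = np.mean(scores)

# Count how many entries are nan to see how widespread the problem is.
nan_count = int(np.isnan(scores).sum())

# Option 1: ignore nan entries and average only the scored questions.
mean_ignoring_nan = np.nanmean(scores)

# Option 2: treat unscored questions as wrong answers.
mean_nan_as_wrong = np.where(np.isnan(scores), 0.0, scores).mean()

print(plain_mean, nan_count, mean_ignoring_nan, mean_nan_as_wrong)
```

Which option is appropriate depends on how unscored answers should count. Note that if every entry is nan, `np.nanmean` also returns nan (with a warning), which would point to a problem earlier in the pipeline rather than in the averaging step.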