haesleinhuepf / human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation
MIT License

Different plot for comparing models #140

Open haesleinhuepf opened 5 days ago

haesleinhuepf commented 5 days ago

After some online feedback, I think it might make sense to replace the box plot that summarizes model performance. Maybe a violin plot would be better suited?

nscherf commented 5 days ago

Sounds good. We could also go for a jittered point plot / stripplot as in https://seaborn.pydata.org/generated/seaborn.stripplot.html
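
For comparison, here is a minimal sketch that draws the three options mentioned in this thread (box plot, violin plot, jittered strip plot) side by side with seaborn. The data is made up: the model names, the `pass_rate` column, and the value distributions are hypothetical placeholders, not results from this benchmark, and the real summary data frame may be shaped differently.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 50 made-up scores per made-up model, clipped to [0, 1].
rng = np.random.default_rng(0)
models = ["model-a", "model-b", "model-c"]
means = np.repeat([0.3, 0.5, 0.7], 50)
df = pd.DataFrame({
    "model": np.repeat(models, 50),
    "pass_rate": np.clip(rng.normal(loc=means, scale=0.15), 0, 1),
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)

# Box plot: shows median and quartiles only.
sns.boxplot(data=df, x="model", y="pass_rate", ax=axes[0])
axes[0].set_title("box plot")

# Violin plot: adds a density estimate; cut=0 keeps it within the data range.
sns.violinplot(data=df, x="model", y="pass_rate", cut=0, ax=axes[1])
axes[1].set_title("violin plot")

# Strip plot: every data point, jittered horizontally to reduce overlap.
sns.stripplot(data=df, x="model", y="pass_rate", jitter=True, alpha=0.6, ax=axes[2])
axes[2].set_title("strip plot")

fig.tight_layout()
```

With few samples per model, the strip plot shows every point and avoids suggesting a smooth distribution that the data may not support. With many samples, the violin plot tends to stay more readable.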