ievapudz / TemStaPro

TemStaPro - a program for protein thermostability prediction using sequence representations from a protein language model.
MIT License

Balanced classifiers, and accuracy at higher temperatures #7

Open Clint-Holt opened 1 year ago

Clint-Holt commented 1 year ago

Hi, great paper and easy-to-use code; thanks for making it available. I was hoping to use your models from the balanced training, but I can't seem to find the .pt files for them. I only see the ones for "mean_major_imbal". I also can't find the training test sets on Zenodo to replicate these (though I don't need them if you upload the .pt files).

Also, do you have any accuracy metrics for the higher-temperature classifiers (70 °C, 75 °C, etc.)? I would like to use these, but I would stick with 65 °C at most if they don't have good accuracy/precision/recall.

Thanks!

ievapudz commented 1 year ago

Hello, thanks for the feedback about the work!

Regarding balanced training sets (TemStaPro-Major), we did not train our final classifiers with balanced sets, since we intended to include more data points in the training process.

I am not sure I understand what is meant by 'training test sets'. All data sets that were used to train, validate, and test the classifiers were uploaded to Zenodo. If something still seems to be missing, please do let me know.

We do have accuracy metrics for the classifiers with the upper temperature thresholds. They were computed recently, and the bioRxiv preprint should be updated with the new scores next week.
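Until the updated scores are out, the three metrics asked about can be checked on one's own labelled sequences for any single threshold classifier. Below is a minimal sketch in plain Python, assuming each classifier's output has already been reduced to a binary label (1 = predicted stable at that threshold, 0 = not). The function name `binary_metrics` and the example labels are hypothetical and not part of TemStaPro.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive class).

    Hypothetical helper for illustration; not part of TemStaPro.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    return {
        # Fraction of all predictions that are correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the sequences predicted positive, how many truly are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the truly positive sequences, how many were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for one threshold, purely to show the call.
example_true = [1, 1, 0, 0, 1]
example_pred = [1, 0, 0, 1, 1]
print(binary_metrics(example_true, example_pred))
```

On imbalanced test sets, precision and recall are usually more informative than accuracy alone, which is why all three are reported together here.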