tttianhao / CLEAN

CLEAN: a contrastive learning model for high-quality functional prediction of proteins
MIT License

training & validation details #28

Closed: artcmd closed this issue 1 year ago

artcmd commented 1 year ago

Hello! How did you train the model with the SupCon-Hard loss? The README suggests specific numbers of epochs; how did you arrive at them? I couldn't find your validation setup. Thanks!

canallee commented 1 year ago

Hi, validation is done for development models (SupCon-Hard CLEAN models trained with access to 5-fold train/validation data). You can make the standard cross-validation fold splits yourself, or email us for the original data splits. The validation step runs the two EC-calling methods and compares the predicted EC numbers with the ground truth.
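As a rough illustration of that procedure, here is a minimal sketch, assuming a plain random 5-fold split over protein IDs and micro-averaged precision/recall/F1 over (protein, EC) pairs. The helpers `k_fold_splits` and `ec_prf1` are hypothetical and not part of the CLEAN codebase; the EC-calling step that produces the predictions is left out, and the averaging behind the paper's reported metrics may differ.

```python
import random
from typing import Dict, List, Set, Tuple


def k_fold_splits(ids: List[str], k: int = 5, seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical helper: shuffle protein IDs once and return k (train, val) splits."""
    shuffled = ids[:]
    random.Random(seed).shuffle(shuffled)
    # Round-robin assignment gives k disjoint folds of (nearly) equal size.
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        # Training set = every ID that is not in the held-out fold.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, val))
    return splits


def ec_prf1(pred: Dict[str, Set[str]], truth: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Hypothetical helper: micro-averaged precision, recall, F1 over (protein, EC) pairs."""
    tp = fp = fn = 0
    for pid, true_ecs in truth.items():
        pred_ecs = pred.get(pid, set())  # a protein with no call counts as all misses
        tp += len(pred_ecs & true_ecs)
        fp += len(pred_ecs - true_ecs)
        fn += len(true_ecs - pred_ecs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For each fold, the `val` IDs would be held out from training, the EC numbers called for them collected into a dict such as `{"P1": {"1.1.1.1"}}`, and `ec_prf1(pred, truth)` computed per fold and averaged across the five folds.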

We chose the epoch numbers based on how the model performed after k epochs. Because our computing resources are limited, those numbers should be taken as a reference only (it is very likely that running more epochs would give a slight improvement over the metrics reported in our paper). Note that the SupCon-Hard loss receives over a dozen examples per forward pass, whereas the Triplet loss receives just three, so SupCon-Hard is much slower per epoch than the Triplet loss. In fact, we only trained SupCon-Hard CLEAN models on a 70% split, but a 100% split version should be doable if you have an A100 GPU for several days.
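The per-epoch cost difference comes from how many embeddings each loss term consumes. Below is a minimal pure-Python sketch, assuming the standard supervised contrastive (SupCon) formulation of Khosla et al. (normalized embeddings, dot-product similarity, temperature) and a Euclidean triplet loss with a margin. It is not CLEAN's implementation: the hard positive/negative mining that gives SupCon-Hard its name is not shown, and the function names and the `temperature` and `margin` defaults are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Triplet loss: 3 embeddings per term, max(0, d(a, p) - d(a, n) + margin)."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def supcon_loss(anchor: Vector, positives: List[Vector], negatives: List[Vector],
                temperature: float = 0.1) -> float:
    """SupCon loss for one anchor: 1 + len(positives) + len(negatives) embeddings per term."""
    a = normalize(anchor)
    others = [normalize(v) for v in list(positives) + list(negatives)]
    logits = [dot(a, o) / temperature for o in others]
    # Denominator: log-sum-exp over every non-anchor sample (computed stably).
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    # Average the negative log-probability over the positives.
    pos_logits = logits[:len(positives)]
    return -sum(l - log_denom for l in pos_logits) / len(positives)
```

A triplet term needs 3 embeddings, while a SupCon term with, say, 4 positives and 10 negatives needs 15, so each step does several times more work; this is consistent with SupCon-Hard being much slower per epoch.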