tttianhao / CLEAN

CLEAN: a contrastive learning model for high-quality functional prediction of proteins
MIT License
217 stars, 41 forks

How to filter the prediction results based on the pairwise distance between the cluster centers of EC numbers? #33

Closed: xylapple2013 closed 1 year ago

xylapple2013 commented 1 year ago

How should I filter the prediction results based on the pairwise distance between the cluster centers of EC numbers? What is the cutoff value?

canallee commented 1 year ago

Hi, please refer to infer_maxsep() and infer_pvalue(). infer_maxsep() is deterministic: it calculates a cutoff value automatically and predicts the enzyme functions. infer_pvalue() is tunable: setting a larger p-value allows more predicted functions per enzyme.
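
To make the difference between the two filtering strategies concrete, here is a minimal sketch in plain Python. It is not CLEAN's implementation: the helper names `filter_by_max_gap` and `filter_by_p_value`, the toy distances, and the background sample are all hypothetical, and the real infer_maxsep() and infer_pvalue() may compute their cutoffs differently. The sketch only assumes that each query has a distance to every EC cluster center, and that smaller distances mean a better match.

```python
def filter_by_max_gap(distances):
    """Deterministic cutoff: keep the EC numbers ranked before the largest
    gap between consecutive sorted distances (simplified illustration)."""
    ranked = sorted(distances.items(), key=lambda kv: kv[1])
    if len(ranked) < 2:
        return [ec for ec, _ in ranked]
    gaps = [ranked[i + 1][1] - ranked[i][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps))  # index of the last EC number to keep
    return [ec for ec, _ in ranked[: cut + 1]]


def filter_by_p_value(distances, background, p_value):
    """Tunable cutoff: keep an EC number if the fraction of background
    distances that are at least as small is no more than p_value."""
    kept = []
    for ec, d in sorted(distances.items(), key=lambda kv: kv[1]):
        empirical_p = sum(1 for b in background if b <= d) / len(background)
        if empirical_p <= p_value:
            kept.append(ec)
    return kept


# Hypothetical distances from one query embedding to four EC cluster centers.
distances = {"1.1.1.1": 2.0, "1.1.1.2": 2.3, "2.7.1.1": 6.5, "3.4.21.4": 7.0}
# Hypothetical background distances, e.g. from unrelated sequences.
background = [1.0, 2.2, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

maxsep_hits = filter_by_max_gap(distances)
strict_hits = filter_by_p_value(distances, background, p_value=0.15)
loose_hits = filter_by_p_value(distances, background, p_value=0.25)
```

In this toy example the largest gap falls between the second and third ranked EC numbers, so the gap-based filter keeps two predictions without any parameter. The p-value filter keeps one prediction at 0.15 and two at 0.25, which mirrors the point above that a larger p-value admits more predicted functions.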

ZhuLvs commented 1 week ago

I now need to run the inference step, and my sequence is a randomly generated protein, so I don't have any EC information for it. Should I just follow the quick-start method for inference? Do I also need to run the embedding generation step manually, and then choose between the p-value and max-separation options afterward? Thank you for your response.