Closed jbsyngenes closed 3 months ago
Hi, thanks for your interest. The behavior you are describing is due to the nature of the Random Forest regressor: each tree predicts an average of training values, so no prediction can exceed the maximum value in the input training data. That is why already-tested variants sit at the top of the ranking. What you need to do is simply take the top 10-16 of the untested variants in this list and go from there. I will update the code to directly output the non-tested variants in later versions. Hope this helps.
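In the meantime, the filtering can be done by hand on the output file. Below is a minimal sketch, not the actual toplayer.py logic: the column names `variant` and `predicted`, the variant labels, and the scores are all made up for illustration, so adjust them to match your real output.

```python
import csv
import io


def top_untested(ranked_rows, tested_variants, n=10):
    """Return the n highest-predicted variants that have not been tested yet."""
    tested = set(tested_variants)
    # Drop every variant that already has measured data.
    untested = [row for row in ranked_rows if row["variant"] not in tested]
    # Rank the remaining variants by predicted value, best first.
    untested.sort(key=lambda row: float(row["predicted"]), reverse=True)
    return untested[:n]


# Hypothetical model output; in practice, read the real CSV with open(...).
OUTPUT_CSV = """variant,predicted
A10V,0.95
G25S,0.95
L42F,0.91
T7K,0.88
Q63R,0.80
D18N,0.74
"""

rows = list(csv.DictReader(io.StringIO(OUTPUT_CSV)))
tested = {"A10V", "G25S"}  # hypothetical variants already in the training data

picks = top_untested(rows, tested, n=3)
for row in picks:
    print(row["variant"], row["predicted"])
```

With a real output file you would set `n` to somewhere between 10 and 16 and build `tested` from the variant column of your input sheet.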
toplayer.py has now been updated to output only the variants that still need to be tested.
round1_all_new.csv
Hi, I ran the model with the FASTA and input Excel file that you provided, and it worked! However, in the output file, 7 of the top 10 performers were mutants that were already included in the sample dataset (I've attached my output for reference). Was this actual functional data? What should be done in a situation like this, where the model doesn't suggest any better performers? I was expecting to see lots of new mutants with higher performance at the top of the list. Thanks!