Hi,
As an update to the above, I trained a model on split50 using the triplet loss and obtained reasonable performance (I chose the triplet loss since it's faster). This seems to indicate that the released model weights may be off. Please let me know.
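For context, this is roughly the objective I mean; a minimal PyTorch sketch of one triplet-margin step (batch size, embedding dimension, and margin are placeholders, not the repo's exact training setup):

```python
import torch
import torch.nn as nn

# Placeholder batch: anchor sequences, same-EC positives, different-EC negatives.
anchor = torch.randn(32, 128, requires_grad=True)
positive = torch.randn(32, 128)
negative = torch.randn(32, 128)

# Standard triplet margin loss; margin=1.0 is a placeholder value.
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
loss = triplet_loss(anchor, positive, negative)
loss.backward()  # in a real run this backprops through the encoder
print(loss.item())
```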
Hi Peter, thanks for pointing this out. We realized we made a mistake in the splits CSV files in a previous commit. We have re-uploaded the original CSV files for now; the issue should be resolved.
Hi,
Thanks so much for this work, and for making the repo super nice and straightforward!
Before evaluating the model on a separate use case I have, I wanted to make sure I didn't do anything wrong when setting up the project, so I've been trying to evaluate the model per the README to ensure I get consistent results. However, I obtain very poor performance when calling `inference.py` on the test sets provided (price, new), so it would be great to understand what I've done wrong.

From my attempts to debug, this doesn't seem to be an issue with the models or data processing. For instance, I compared the embedding of the first cluster (EC 2.7.10.2) from `data/pretrained/100.pt` with ones I recomputed manually and obtained the same values (up to some numerical error). Specifically, I made a FASTA file from the sequences in EC 2.7.10.2, extracted their embeddings, then passed them through the pretrained model (`data/pretrained/split100.pth`). I compared these with what `get_cluster_center` returns on the precomputed tensor, and they were consistent. So, if the embeddings are computed consistently, I'm not sure why the predictions are coming out wrong. I'd greatly appreciate any pointers to where I might have made a mistake.
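For reference, a minimal sketch of the comparison I ran (the dict-of-centers layout of the file, the EC key format, and mean-pooling as the cluster center are my assumptions, not the repo's confirmed API):

```python
import torch

def compare_cluster_center(precomputed_path: str, ec: str,
                           recomputed: torch.Tensor,
                           atol: float = 1e-4) -> bool:
    """recomputed: (n_sequences, d) embeddings of the EC's sequences,
    already passed through the trained model (data/pretrained/split100.pth)."""
    # Assumption: the file maps EC number -> cluster-center tensor; in the
    # repo it may instead hold raw embeddings that get_cluster_center averages.
    centers = torch.load(precomputed_path)
    manual_center = recomputed.mean(dim=0)  # assuming the center is a plain mean
    return torch.allclose(manual_center, centers[ec], atol=atol)

# Example call (embeddings produced by my own pipeline):
# ok = compare_cluster_center("data/pretrained/100.pt", "2.7.10.2", embs)
```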
Thank you!!