Closed vinettey closed 9 months ago
Yes that is correct! UCE uses the 15B ESM2 model.
There are quite a few pre-calculated ESM2 embeddings here: https://drive.google.com/drive/folders/1_Dz7HS5N3GoOAG6MdhsXWY1nwLoN13DJ which might contain the species you are interested in.
Thank you for providing this great tool and for the detailed information shared in this issue. I have a follow-up question regarding the ESM2 models:
Do I need to use the largest ESM2 model (esm2_t48_15B_UR50D) for convert_protein_embeddings_to_gene_embeddings.py? I am asking because during a test run with the smaller model (esm2_t33_650M_UR50D), I encountered a KeyError: 48 in the script.
Is there something specific to the smaller model that might be causing this issue?
Thanks in advance for your help!
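One plausible reading of the KeyError: 48, sketched below: the `t48` and `t33` in the model names are the layer counts, so a file produced with esm2_t33_650M_UR50D would hold its final-layer representation under key 33, and a lookup of layer 48 would fail. This is a minimal sketch, not the script's actual code. It assumes the per-protein files store a `mean_representations` dict keyed by layer index, and the helper names `layers_from_model_name` and `select_final_layer` are hypothetical. Plain lists stand in for tensors so the example runs without any ESM installation.

```python
import re


def layers_from_model_name(model_name: str) -> int:
    """Read the layer count from an ESM model name, e.g. 'esm2_t48_15B_UR50D' -> 48."""
    match = re.search(r"_t(\d+)_", model_name)
    if match is None:
        raise ValueError(f"No layer count found in model name: {model_name!r}")
    return int(match.group(1))


def select_final_layer(mean_representations: dict, model_name: str):
    """Return the final-layer embedding for the model that produced the file.

    Assumption: `mean_representations` maps layer index -> embedding.
    """
    layer = layers_from_model_name(model_name)
    if layer not in mean_representations:
        available = sorted(mean_representations)
        raise KeyError(
            f"Layer {layer} not in file (available: {available}); "
            "was the file written by a different ESM model?"
        )
    return mean_representations[layer]


# Hypothetical record, shaped like output from the 33-layer 650M model.
record_650m = {
    "label": "hypothetical_protein",
    "mean_representations": {33: [0.1, 0.2, 0.3]},
}

# Looking up layer 48 directly reproduces the reported failure.
try:
    record_650m["mean_representations"][48]
except KeyError as err:
    print("KeyError:", err)

# Deriving the layer from the model name avoids the hard-coded index.
embedding = select_final_layer(
    record_650m["mean_representations"], "esm2_t33_650M_UR50D"
)
print(len(embedding))
```

Even with the layer key resolved, the 650M model produces 1280-dimensional embeddings while the 15B model produces 5120-dimensional ones, so embeddings from the smaller model are unlikely to be a drop-in replacement for a model trained on 15B embeddings.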
Hi, I’m trying to run the saturn code to generate protein embeddings that I’ll use as input to UCE. I noticed that the saturn code uses a smaller model (esm1b_t33_650M_UR50S). I just wanted to confirm that you’re generating UCE protein embeddings using the biggest ESM2 model (https://huggingface.co/facebook/esm2_t48_15B_UR50D). I replaced the line in the saturn code that specifies the model with the 15B-parameter ESM2 version. Is that correct?
Thank you for the extra information!