Closed vinettey closed 8 months ago
Are you using the most up-to-date version of the repo? Were you able to walk through the notebook on embedding new species?
Yes! I was able to walk through the notebook and generate the files.
What command are you using to launch UCE? Could you also please double-check that you are on the most recent version of the repo? Thanks!
This is the command to launch UCE.
Can you upload a screenshot of the full error you get when you try to run that? Thanks!
This is the error message output by running UCE.
Are you sure you are using the most recent version of the repo? Did you modify the model files at all?
In the current code we added:
empty_pe = torch.zeros(145469, 5120)
empty_pe.requires_grad = False
model.pe_embedding = nn.Embedding.from_pretrained(empty_pe)
model.load_state_dict(torch.load(args.model_loc, map_location="cpu"),
                      strict=True)
So I'm not sure how there can be a mismatch with the model there. Maybe try redownloading the model?
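One way to narrow down a mismatch like this is to compare parameter shapes in the checkpoint against those in the freshly built model before calling `load_state_dict`. The sketch below is only an illustration: the helper name `find_shape_mismatches` is hypothetical and not part of UCE, and it works on plain name-to-shape mappings so it can be run without the model files. The example shapes are the ones from the error message in this thread.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ.

    Both arguments map parameter names to shape tuples. Parameters that are
    missing from the model are ignored here; strict loading reports those separately.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the RuntimeError reported in this issue.
checkpoint_shapes = {"pe_embedding.weight": (145469, 5120)}
model_shapes = {"pe_embedding.weight": (19910, 5120)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in mismatches.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With PyTorch, the two mappings could be built as `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}` for the checkpoint and the same comprehension over `model.state_dict()` for the model, assuming the checkpoint file stores a plain state dict.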
Hi! Updating to the latest version of the GitHub repo solves the dimension mismatch. But I now get another error on the embeddings generated for a new species.
Please see the response here: https://github.com/snap-stanford/UCE/issues/18#issuecomment-1910796722
This error happens when there is a cell with 0 genes expressed.
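A quick check for such cells can be sketched as follows. This is a minimal illustration, not UCE code: the helper name `cells_without_expression` is hypothetical, and it assumes a dense cells x genes matrix. It also assumes that genes without a protein embedding are dropped before the model sees the data, so a cell could end up with 0 expressed genes after matching even if it has counts in the full matrix; the optional `matched_genes` argument covers that case.

```python
import numpy as np


def cells_without_expression(counts, gene_names, matched_genes=None):
    """Return indices of cells (rows) with no expressed gene.

    counts        : cells x genes array of expression values
    gene_names    : gene name for each column of `counts`
    matched_genes : optional collection of gene names that survive matching;
                    when given, only those columns are considered
    """
    counts = np.asarray(counts)
    if matched_genes is not None:
        matched = set(matched_genes)
        keep = np.array([g in matched for g in gene_names], dtype=bool)
        counts = counts[:, keep]
    genes_per_cell = (counts > 0).sum(axis=1)
    return np.flatnonzero(genes_per_cell == 0)


# Toy data: 3 cells x 3 genes (hypothetical gene names).
counts = np.array([[3, 0, 1],
                   [0, 0, 2],
                   [0, 5, 0]])
gene_names = ["geneA", "geneB", "geneC"]

empty_all = cells_without_expression(counts, gene_names)
empty_matched = cells_without_expression(counts, gene_names,
                                         matched_genes={"geneA", "geneB"})
print(empty_all, empty_matched)
```

In the toy data every cell expresses at least one gene, but the second cell only expresses `geneC`, so it becomes empty once `geneC` is excluded from the matched set.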
Hi! I checked my gene x cell matrix and there are no cells with 0 genes. Could you share the exact code for generating ESM protein embeddings for a new species? I want to make sure there's no mismatch in gene names. Thanks!
If you look at the UCE output in the terminal when processing the dataset, it reports the number of genes matched.
You can also call
torch.load("path to the protein embedding dataset")
to load the protein embedding dataset, which will list the gene names that are filtered.
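To see which gene names fail to match, the names in the dataset can be compared against the names in the loaded embedding file. The sketch below assumes the loaded object is a dictionary keyed by gene name, which should be verified for your file; the helper `summarize_gene_overlap` and the example gene lists are hypothetical. It also flags names that differ only in letter case, a common cause of silent mismatches.

```python
def summarize_gene_overlap(dataset_genes, embedding_genes):
    """Split dataset gene names into matched and unmatched against the embedding names.

    Also returns a mapping of unmatched names to embedding names that are
    identical except for letter case.
    """
    dataset_genes = list(dataset_genes)
    embedding_set = set(embedding_genes)
    matched = [g for g in dataset_genes if g in embedding_set]
    unmatched = [g for g in dataset_genes if g not in embedding_set]
    upper_lookup = {g.upper(): g for g in embedding_set}
    case_only = {g: upper_lookup[g.upper()]
                 for g in unmatched if g.upper() in upper_lookup}
    return matched, unmatched, case_only


# Toy example; in practice dataset_genes might come from adata.var_names and
# embedding_genes from the keys of the loaded embedding dictionary (assumption).
dataset_genes = ["Actb", "GAPDH", "novel1"]
embedding_genes = ["ACTB", "GAPDH", "TP53"]

matched, unmatched, case_only = summarize_gene_overlap(dataset_genes, embedding_genes)
print(f"matched {len(matched)} of {len(dataset_genes)} genes")
print("unmatched:", unmatched)
print("case-only differences:", case_only)
```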
Thanks for the suggestion! I found the cause of this problem. The wrong adata file was used in the first place (X contained scaled data, not counts), and the intermediate file was not regenerated after I corrected the adata file. Thank you again for the help!
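For anyone hitting the same issue, a simple sanity check on X before running UCE is to confirm that it looks like raw counts, meaning non-negative and integer-valued. The sketch below is a heuristic under that assumption, and the helper name `looks_like_raw_counts` is hypothetical; scaled data usually fails both conditions because it contains negative and fractional values.

```python
import numpy as np


def looks_like_raw_counts(matrix, tol=1e-8):
    """Heuristic: raw counts are non-negative and (numerically) integer-valued."""
    values = np.asarray(matrix, dtype=float)
    non_negative = bool((values >= 0).all())
    integer_valued = bool(np.all(np.abs(values - np.round(values)) <= tol))
    return non_negative and integer_valued


# Toy matrices: raw counts vs. centered and scaled values.
raw = np.array([[0, 3, 1],
                [2, 0, 0]])
scaled = np.array([[-0.7, 1.2, 0.1],
                   [0.7, -1.2, -0.1]])

raw_ok = looks_like_raw_counts(raw)
scaled_ok = looks_like_raw_counts(scaled)
print(raw_ok, scaled_ok)
```

If X is stored as a SciPy sparse matrix, passing its `.data` array checks only the stored non-zero entries, which is enough for this test. After fixing the input file, any cached intermediate files should also be deleted or regenerated, since that was the second half of the problem described above.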
Hi! I encountered a mismatch error when running UCE on ESM embedding of a new species. RuntimeError: Error(s) in loading state_dict for TransformerModel: size mismatch for pe_embedding.weight: copying a param with shape torch.Size([145469, 5120]) from checkpoint, the shape in current model is torch.Size([19910, 5120]).
I generated protein embeddings with the ESM2 model using the following code:
Could you please help check what might have gone wrong here? Thanks!