Closed bschilder closed 1 year ago
Hi Brian, please see this section of the readme: https://github.com/snap-stanford/SATURN#data-availability
We made protein embeddings available for all species analyzed in the paper here: http://snap.stanford.edu/saturn/data/
Thanks @Yanay1, and apologies for having missed that earlier. The protein_embeddings.tar.gz does indeed contain the majority of the species I mentioned.
That said, there still seem to be a couple missing. Of those, I'm particularly interested in "rat" and "fly". I don't suppose you'd be able to share those as well?
'bat': FZ_EMBEDDING_DIR / 'Rhinolophus_ferrumequinum.mRhiFer1_v1.gene_symbol_to_embedding_ESM1b.pt',
"sea_squirt": FZ_EMBEDDING_DIR / 'Ciona_intestinalis.KH.gene_symbol_to_embedding_ESM1b.pt',
"chicken": FZ_EMBEDDING_DIR / 'Gallus_gallus.GRCg6a.gene_symbol_to_embedding_ESM1b.pt',
"fly": FZ_EMBEDDING_DIR / 'Drosophila_melanogaster.BDGP6.32.gene_symbol_to_embedding_ESM1b.pt',
"rat": FZ_EMBEDDING_DIR / 'Rattus_norvegicus.mRatBN7.2.gene_symbol_to_embedding_ESM1b.pt',
"tree_shrew": FZ_EMBEDDING_DIR / 'Tupaia_belangeri.TREESHREW.gene_symbol_to_embedding_ESM1b.pt'
Shared via email!
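For anyone checking which of the files listed above are present after extracting protein_embeddings.tar.gz, a minimal sketch follows. The helper name missing_species and the temporary directory are hypothetical and used only for illustration; the file names are copied from the list above.

```python
from pathlib import Path
import tempfile


def missing_species(embedding_dir, expected):
    """Return the species whose embedding file is absent from embedding_dir."""
    embedding_dir = Path(embedding_dir)
    return sorted(
        species
        for species, filename in expected.items()
        if not (embedding_dir / filename).is_file()
    )


# File names taken from the list in this thread.
expected = {
    "chicken": "Gallus_gallus.GRCg6a.gene_symbol_to_embedding_ESM1b.pt",
    "fly": "Drosophila_melanogaster.BDGP6.32.gene_symbol_to_embedding_ESM1b.pt",
    "rat": "Rattus_norvegicus.mRatBN7.2.gene_symbol_to_embedding_ESM1b.pt",
}

# Demonstration on a temporary directory that stands in for the extracted
# archive and contains only the chicken file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / expected["chicken"]).touch()
    absent = missing_species(tmp, expected)

print(absent)
```

In practice, pass the directory where the archive was extracted in place of the temporary directory.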
Perfect, thanks @Yanay1 !
I'm struggling a bit with generating the embedding for Callithrix jacchus, and I was wondering whether a precomputed embedding exists for this species. Also, can you explain how to use .torch files? The frog_zebrafish_embryogenesis vignette starts with a .pt file.
Hi,
I can generate the Marmoset protein embeddings for you! It should be done in the next few days.
Torch files store torch tensors; in this case, we use them to store the protein embeddings. They can be read in using
torch.load(...path)
https://pytorch.org/docs/stable/generated/torch.load.html
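As a minimal sketch of what reading such a file can look like: the example below assumes the .pt file holds a Python dict mapping gene symbols to 1-D tensors, which the "gene_symbol_to_embedding" file names suggest but this thread does not confirm. The toy file, gene names, and embedding size are made up for illustration.

```python
import os
import tempfile

import torch

# Hypothetical stand-in for a real embedding file: gene symbol -> 1-D tensor.
toy = {"GENE_A": torch.randn(8), "GENE_B": torch.randn(8)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.gene_symbol_to_embedding.pt")
    torch.save(toy, path)
    # map_location="cpu" lets the file load on a machine without a GPU.
    loaded = torch.load(path, map_location="cpu")

# Stack the per-gene vectors into one (n_genes, embedding_dim) matrix.
genes = sorted(loaded)
matrix = torch.stack([loaded[g] for g in genes])
print(genes, tuple(matrix.shape))
```

For a real file, replace the temporary path with the path to the downloaded .pt file and inspect type(loaded) first, since the actual structure may differ from this assumption.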
Wow, thank you so much! I really appreciate your help.
Best, Dana McCormack Senior Research Support Associate Feng Lab, MIT Pronouns: they/them
ESM1b Embeddings: https://drive.google.com/file/d/19qsKrFO153EId4uP7Ip4YYtLbEDHUoKF/view?usp=drive_link
ESM2 Embeddings: https://drive.google.com/file/d/1ItTBC27WQ968gkGVwdjzgMMrt2mHdTsb/view?usp=sharing
The proteome comes from here: https://useast.ensembl.org/Callithrix_jacchus/Info/Index
You should download the whole thing. I'm not sure why Google Drive shows it as a folder; it's one file.
Thank you so much!!!
Hi,
I have some follow up questions about using SATURN:
Once again, thank you so much for your help with generating the .pt files and any insight you can offer! SATURN is such a cool tool and I am enjoying learning how to work with it.
centroid_score_func
that gives a few more options to choose from, but we didn't find a big difference between them. You can modify the code to use any initialization function, including one that maps specific numbers of genes to each macrogene, if you'd like.
For users of scVI or other single-cell VAE-based methods, I don't believe it is standard practice to run the model many times, but this might be an interesting question to ask their team. In our case, such as in Figure 4B, it can be useful to run the model multiple times to build confidence in something like a reannotation, but it may not be strictly necessary.
Sorry for the slow response. A thank-you email was sitting in my drafts and I completely forgot about it! Your email was very helpful.
I have another question about the package that I'm hoping you can help with. I'm trying to transfer the SATURN UMAP embedding to an anndata object with the original genes. The goal is to be able to plot individual genes to show species differences but with the superior SATURN integration.
For some reason, the indices seem to be getting mixed up along the way (i.e., the cell types are no longer separated in UMAP space). I've transferred embeddings like this before from an anndata object with a subset of genes to the full anndata object, but maybe something is different when you convert from the macrogene space?
I re-indexed the datasets before transferring the embedding, as follows. Here saturn_adata is the anndata object generated by SATURN, with macrogenes as var, and adata is the combined anndata object built from the same files submitted to SATURN:
barcodes = list(adata.obs.index)
saturn_adata = saturn_adata[barcodes, :].copy()
I confirmed that the re-indexing was successful, since the code below returns True:
adata_barcodes = list(adata.obs.index)
saturn_adata_barcodes = list(saturn_adata.obs.index)
adata_barcodes == saturn_adata_barcodes
I tried transferring the embedding in a few different ways:
Even weirder, when I tried to plot a few marker genes to troubleshoot, the expression pattern appeared null. If all of the cell types were mixed, then I should be seeing a mix of positive and null expression.
Do you have any ideas about what could be happening or recommendations for a different method? I'm surprised that this isn't transferring easily given that UMAP is a 2D representation that is disconnected from the macrogene space.
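One way to take positional ordering out of the picture is to look up each target barcode in the source explicitly and fail loudly on duplicates or missing cells. The sketch below is an illustration of that idea, not a diagnosis of this case or SATURN's own method: the function name align_embedding and the toy data are hypothetical, and it uses only NumPy and pandas.

```python
import numpy as np
import pandas as pd


def align_embedding(source_barcodes, source_embedding, target_barcodes):
    """Return rows of source_embedding reordered to match target_barcodes."""
    source_index = pd.Index(source_barcodes)
    if not source_index.is_unique:
        # Duplicate barcodes (e.g. after concatenating datasets) make
        # label-based alignment ambiguous.
        raise ValueError("source barcodes are not unique")
    positions = source_index.get_indexer(target_barcodes)
    if (positions < 0).any():
        n_missing = int((positions < 0).sum())
        raise KeyError(f"{n_missing} target barcodes are missing from the source")
    return np.asarray(source_embedding)[positions]


# Toy data: the source rows are in a different order than the target.
source_barcodes = ["cell_c", "cell_a", "cell_b"]
source_umap = np.array([[3.0, 30.0], [1.0, 10.0], [2.0, 20.0]])
target_barcodes = ["cell_a", "cell_b", "cell_c"]

aligned = align_embedding(source_barcodes, source_umap, target_barcodes)
print(aligned)
```

With AnnData objects, the equivalent call would presumably be adata.obsm["X_umap"] = align_embedding(saturn_adata.obs_names, saturn_adata.obsm["X_umap"], adata.obs_names), assuming the SATURN output stores its UMAP coordinates under obsm["X_umap"].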
Would it be possible to upload a picture of the UMAPs and the full code snippet? (You can also email them.)
Whoops, I didn't realize that this conversation was on the github issues! The responses went to my email so I assumed it was there. What email address is best?
yanay (at) stanford.edu
Hello,
Thanks again for the awesome framework!
Our lab is working on a project that involves cross-species comparisons, and to start I'm looking to use SATURN to identify homologous cell type mappings across datasets/species.
I noticed in the following file that it seems your team has already precomputed gene embeddings for quite a few of the species that I'm investigating. I have already begun regenerating these embeddings, but as you know this can take quite a while (especially across many species).
Would your team possibly be willing to share these embeddings so that users like myself can skip the embedding pre-steps and go right to training SATURN? Figshare is great, but any storage platform would be welcome. https://github.com/snap-stanford/SATURN/blob/f0813fb5300a3ada69415dc9141c59e8bc4e5cb6/data/gene_embeddings.py#L14
While sharing all of them would be welcome, the ones that are highest priority for me are:
Thanks again! Brian