Beginner Question: Where to start?

LJK1991 commented 1 year ago

Hello, I recently obtained a Multiome (10x Genomics snRNA/snATAC) dataset for my PhD and I would like to analyze it with CellOracle. In short, it is an in vitro differentiation of mESCs towards blood progenitors, approximately E7.0~E7.5 (Figure_A). My goal is to map the GRN of the bifurcation of Nascent Mesoderm into Blood progenitors or Mesenchyme using CellOracle, similar to what you did in Figure 4 of your paper.

Sadly, the sequencing of the snATAC was somewhat shallow, although the quality of the reads is fine.

When I make a UMAP it looks like one big clump (I can't separate cell populations clearly). I have tried Pando, where I get few hits with low significance. The RNA data is good: I can map it onto the Pijuan-Sala 2019 data nicely and we see the expected cell populations.

I would like to use the ATAC data you generated to complement or replace my own ATAC data, so that the data is rich enough for pipelines like CellOracle or Pando. However, I am a wet-lab-trained PhD slowly turning bioinformatician, and the work you generated is quite complex. I am at a loss as to where I should start and which steps I should take. I assume I should start from the ArchR ATAC dataset, then infer metacells for my cell populations of interest (do I do this with the joint data or ATAC alone?), and subsequently perform the in silico ChIP and build a GRN using CellOracle? Could you point me in the right direction?

Thank you in advance, and for the very cool study you have generated. Kind regards, Lucas

rargelaguet commented 1 year ago

Hi Lucas, this looks very exciting, thanks for sharing! I agree that getting the full pipeline to work is not easy :)

If you want to run the in silico ChIP-seq, infer GRNs, apply CellOracle, etc., I strongly suggest you get a good metacell representation first. In our case we used the scRNA-seq embedding for this, but you could use the ATAC embedding or the combined one (from MOFA or WNN).
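To make the metacell idea concrete, here is a minimal sketch of the aggregation step only. It assumes each cell has already been assigned to a metacell by some grouping method run on the chosen embedding; the `metacell_of` mapping, the barcodes and the counts are hypothetical toy values, and real data would normally live in sparse matrices rather than plain dicts. Because Multiome cells share one barcode across RNA and ATAC, the same assignment can be used to pool both modalities.

```python
def aggregate_by_metacell(counts, metacell_of):
    """Sum per-cell count vectors into one pseudobulk vector per metacell.

    counts:      dict of cell barcode -> list of counts (genes or peaks)
    metacell_of: dict of cell barcode -> metacell label
                 (hypothetical; produced by any grouping method)
    """
    totals = {}
    for cell, vec in counts.items():
        label = metacell_of.get(cell)
        if label is None:
            continue  # skip cells that were not assigned to a metacell
        if label not in totals:
            totals[label] = [0] * len(vec)
        totals[label] = [a + b for a, b in zip(totals[label], vec)]
    return totals


# Toy Multiome data: the same barcodes index both modalities.
rna = {"c1": [1, 0], "c2": [2, 3], "c3": [0, 5]}            # 2 genes
atac = {"c1": [1, 1, 0], "c2": [0, 1, 0], "c3": [0, 0, 2]}  # 3 peaks
metacell_of = {"c1": "m1", "c2": "m1", "c3": "m2"}

rna_mc = aggregate_by_metacell(rna, metacell_of)
atac_mc = aggregate_by_metacell(atac, metacell_of)
```

After this step, `rna_mc` and `atac_mc` are keyed by the same metacell labels, which is what lets expression and accessibility be compared metacell by metacell.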

Once you have the metacell representation, you are ready to calculate the correlations between TF expression and peak accessibility that are required for in silico ChIP-seq and GRN inference.
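As a rough illustration of that correlation step, the sketch below computes a correlation for every TF/peak pair across metacells. It assumes Pearson correlation and already-normalised values, with one entry per metacell in the same order for both modalities; the TF name, peak names and numbers are made up for the example. It covers only the correlation itself, not the rest of the in silico ChIP-seq procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / (sx * sy)


def tf_peak_correlations(tf_expr, peak_acc):
    """Correlate every TF with every peak across metacells.

    tf_expr:  dict of TF name -> expression per metacell
    peak_acc: dict of peak name -> accessibility per metacell (same metacell order)
    """
    return {
        (tf, peak): pearson(expr, acc)
        for tf, expr in tf_expr.items()
        for peak, acc in peak_acc.items()
    }


# Toy values across four metacells (illustrative only).
tf_expr = {"Gata1": [1.0, 2.0, 3.0, 4.0]}
peak_acc = {
    "peak_a": [2.0, 4.0, 6.0, 8.0],  # opens as the TF goes up
    "peak_b": [4.0, 3.0, 2.0, 1.0],  # closes as the TF goes up
    "peak_c": [1.0, 1.0, 1.0, 1.0],  # constant, so correlation is undefined
}

cors = tf_peak_correlations(tf_expr, peak_acc)
```

With shallow ATAC data, it is worth checking how many peaks are constant or near-constant across metacells, since those pairs carry no usable signal.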

P.S. Feel free to join our Slack channel if you want to share results and ask additional questions.

LJK1991 commented 1 year ago

Thank you for the quick reply. Just to confirm, though:

I should create metacell information based on my own scRNA-seq data and then use the snATAC from you/10x-multiome to generate the TF expression and peak accessibility? I'm confused about how peak accessibility would then be correlated to the scRNA data: by cluster name/cell type, or something else entirely? Additionally, I performed basic QC with Seurat, using SCTransform. Should I create metacells based on this, or would you recommend following the Snakemake RNA pipeline that you created?

Thanks in advance

P.S. I also asked the same question on Slack. Thanks for the invite, I had not seen it before.

rargelaguet / mouse_organogenesis_10x_multiome_publication
