f6v closed 3 years ago
Hi f6v,
Thank you for your question! The answer depends on the aim of your analysis.
(1) If you aim to compare GRN analysis results, you do not need to translate the data from mm9 to mm10.
The genome coordinate information was required for the motif scan, but it is not used in the subsequent GRN analysis; the GRN analysis only requires paired lists of TFs and their target genes.
As you pointed out, the coordinates in `TFinfo_df` are not taken into account in most GRN analyses.
Although the gene location information is still stored in the oracle object, you can simply ignore it when you do a GRN analysis. The peak coordinates in `TFinfo_df` are used for a different analysis.
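To illustrate why the assembly does not matter for point (1), here is a minimal sketch in plain Python. It assumes that each row of the TF info table holds a `peak_id`, a `gene_short_name`, and one 0/1 column per TF; the toy rows, the gene and TF names, and the helper `tf_target_pairs` are made up for illustration and are not part of the CellOracle API.

```python
from collections import defaultdict

# Toy stand-in for TF info rows (assumed layout: peak_id, gene_short_name,
# then one 0/1 column per TF). All values here are made up.
rows_assembly_a = [
    {"peak_id": "chr1_3000_3500", "gene_short_name": "Gata1", "Spi1": 1, "Klf1": 0},
    {"peak_id": "chr1_9000_9400", "gene_short_name": "Gata1", "Spi1": 0, "Klf1": 1},
    {"peak_id": "chr2_5000_5600", "gene_short_name": "Cebpa", "Spi1": 1, "Klf1": 0},
]

# The same peaks with different (made-up) coordinates, standing in for
# another genome assembly.
rows_assembly_b = [
    {"peak_id": "chr1_3120_3620", "gene_short_name": "Gata1", "Spi1": 1, "Klf1": 0},
    {"peak_id": "chr1_9120_9520", "gene_short_name": "Gata1", "Spi1": 0, "Klf1": 1},
    {"peak_id": "chr2_5075_5675", "gene_short_name": "Cebpa", "Spi1": 1, "Klf1": 0},
]


def tf_target_pairs(rows):
    """Collapse peak-level rows into {target gene: sorted list of TFs with a hit}."""
    targets = defaultdict(set)
    for row in rows:
        for key, value in row.items():
            if key in ("peak_id", "gene_short_name"):
                continue  # coordinates and gene name are not TF columns
            if value:
                targets[row["gene_short_name"]].add(key)
    return {gene: sorted(tfs) for gene, tfs in targets.items()}


pairs_a = tf_target_pairs(rows_assembly_a)
pairs_b = tf_target_pairs(rows_assembly_b)
print(pairs_a == pairs_b)
```

Because the peak coordinates are dropped when the rows are collapsed, both tables give the same TF-to-target pairs.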
(2) If you aim to compare TF binding genomic locations, you need to translate the data or you need to start ATAC-seq analysis from scratch.
In this case, I recommend doing the analysis from scratch. For example, get the fastq files of the mouse ATAC Atlas data and align the NGS reads to the same reference genome that you used for your own ATAC-seq data. I think the TFinfo data need to be prepared in the same manner. It is probably better to process all ATAC-seq data with the same pipeline if you want to compare ATAC-seq peak locations or TF binding patterns.
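If you translate coordinates instead of re-processing the data, the peak IDs would need to be rewritten one by one. The sketch below shows one possible shape for that step. It assumes peak IDs look like `chr1_3000_3500`; `toy_convert` is a hypothetical placeholder for a real chain-file lookup and does not perform an actual mm9-to-mm10 mapping. Mapping only the two peak ends is also a simplification that a dedicated interval liftover tool would handle more carefully.

```python
def parse_peak_id(peak_id):
    """Split 'chr1_3000_3500' into ('chr1', 3000, 3500)."""
    chrom, start, end = peak_id.rsplit("_", 2)
    return chrom, int(start), int(end)


def translate_peak_id(peak_id, convert):
    """Map both peak ends with `convert`.

    Returns None if either end is unmapped or the two ends land on
    different chromosomes.
    """
    chrom, start, end = parse_peak_id(peak_id)
    new_start = convert(chrom, start)
    new_end = convert(chrom, end)
    if new_start is None or new_end is None:
        return None
    if new_start[0] != new_end[0]:
        return None
    low, high = sorted((new_start[1], new_end[1]))
    return f"{new_start[0]}_{low}_{high}"


def toy_convert(chrom, pos):
    """Hypothetical stand-in for a chain-file lookup; NOT a real mm9-to-mm10 mapping."""
    if chrom == "chrUn":
        return None  # pretend this contig has no counterpart
    return chrom, pos + 100


peaks = ["chr1_3000_3500", "chrUn_10_20"]
translated = [translate_peak_id(p, toy_convert) for p in peaks]
print(translated)
```

Peaks that come back as `None` fail to map and would have to be dropped, which is one reason the two peak sets may still not be directly comparable after translation.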
@KenjiKamimoto-wustl122 thanks! I meant to compare GRN results, but both suggestions are insightful!
I'd like to compare the results I get from CellOracle with my ATAC data and with the mouse ATAC atlas. Since my data is mapped to mm10, I'd like to run `liftover` on the atlas data. However, is it correct to take the output from `co.data.load_TFinfo_df_mm9_mouse_atac_atlas()` and translate the genomic coordinates, or should I start all the way from the normalised counts?

EDIT: I think the coordinates of `TFinfo_df` aren't taken into account when `oracle.import_TF_data` is used, are they? Does this idea of translating mm9 to mm10 make sense to you at all?

Thanks!