kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

mm10 vs Grcm38(Gencode) #143

Open pdellorusso opened 5 years ago

pdellorusso commented 5 years ago

This is a question, not an issue, but I am curious whether there is a specific reason to use mm10 over the GRCm38 primary assembly available from GENCODE (https://www.gencodegenes.org/mouse/).

Is mm10 the standard mouse genome assembly for all ENCODE standardized pipelines?

strattan commented 5 years ago

@pdellorusso Thanks for your question. The GRCm38 build ENCODE uses is based on what GRC calls the "latest major release", which is at the "GRCm38" tab here: https://www.ncbi.nlm.nih.gov/grc/mouse

We do not apply the periodic patches that GRC releases, which are up to p6 at this time.

The mm10 ENCODE uses for mapping has chromosome names in "UCSC format" (like "chr1"), and includes autosomes, both sex chromosomes, M, and the unplaced and unlocalized scaffolds. Downstream analysis may choose to use any subset of those mappings but the mapping is always to the same reference.
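As a rough illustration of choosing a subset of those mappings downstream, here is a minimal Python sketch that sorts UCSC-style mm10 sequence names into the categories listed above. The function names `classify_contig` and `select_contigs` are hypothetical, and the name patterns (`chrUn_` prefix for unplaced scaffolds, `_random` suffix for unlocalized ones) are assumed from the usual UCSC naming convention rather than taken from the pipeline itself.

```python
# Sketch only: classify UCSC-style mm10 sequence names so that downstream
# analysis can keep a chosen subset. Not part of the ENCODE pipeline.

# Mouse has 19 autosomes, named chr1..chr19 in UCSC format.
AUTOSOMES = {f"chr{i}" for i in range(1, 20)}
SEX_CHROMOSOMES = {"chrX", "chrY"}
MITOCHONDRIAL = {"chrM"}


def classify_contig(name: str) -> str:
    """Return a coarse category for a UCSC-style mm10 sequence name."""
    if name in AUTOSOMES:
        return "autosome"
    if name in SEX_CHROMOSOMES:
        return "sex"
    if name in MITOCHONDRIAL:
        return "mitochondrial"
    # Assumed UCSC convention: unplaced scaffolds start with "chrUn_".
    if name.startswith("chrUn_"):
        return "unplaced"
    # Assumed UCSC convention: unlocalized scaffolds end with "_random".
    if name.endswith("_random"):
        return "unlocalized"
    return "other"


def select_contigs(names, keep=("autosome", "sex")):
    """Keep only the names whose category is in `keep`, preserving order."""
    return [n for n in names if classify_contig(n) in keep]


if __name__ == "__main__":
    # Illustrative names only; check them against your own reference index.
    example = ["chr1", "chrX", "chrM", "chrUn_GL456239", "chr1_GL456210_random"]
    print(select_contigs(example))
```

The mapping step itself is unchanged by this; the filter only decides which already-mapped sequences a particular analysis looks at.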

For transcript annotations, we have used GENCODE M4 (https://www.gencodegenes.org/mouse/release_M4.html). We anticipate upgrading to a more recent GENCODE build this year, but the ENCODE RNA working group has not yet decided exactly which build to use or what the timeline will be. When we do decide, we will make an announcement on https://www.encodeproject.org/

I hope that's helpful!

XiaoYan000 commented 1 day ago

Hi, I am struggling to annotate variants with GENCODE M25. I used the annotate_variation function and set the parameters according to "Create your own gene definition databases for non-human species". Now that I have the variant_function and exonic_variant_function files, how can I produce output like the table that table_annotate gives, so that I can load it into maftools for downstream analysis? I look forward to your reply. Thank you very much!