Closed sunta3iouxos closed 9 months ago
Can you check what the genome_size is in the organism yaml in your output folder?
e.g.
grep genome_size /scratch/tgeorgom/AP04/ChIP-seq_organism.yaml
According to https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001635.20/ , mm10 genome size is about 2.7Gb, which is similar to the number that's passed to MACS2 in the commandline you pasted above.
Can you check what the genome_size is in the organism yaml in your output folder?
e.g.
grep genome_size /scratch/tgeorgom/AP04/ChIP-seq_organism.yaml
According to https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001635.20/ , mm10 genome size is about 2.7Gb, which is similar to the number that's passed to MACS2 in the commandline you pasted above.
this is correct, but in the MACS2 site there is the following:
-g/--gsize
PLEASE assign this parameter to fit your needs!
It's the mappable genome size or effective genome size which is defined as the genome size which can be sequenced. Because of the repetitive features on the chromosomes, the actual mappable genome size will be smaller than the original size, about 90% or 70% of the genome size. The default hs -- 2.7e9 is recommended for human genome. Here are all precompiled parameters for effective genome size:
hs: 2.7e9 mm: 1.87e9 ce: 9e7 dm: 1.2e8
Users may want to use k-mer tools to simulate mapping of Xbps long reads to target genome, and to find the ideal effective genome size. However, usually by taking away the simple repeats and Ns from the total genome, one can get an approximate number of effective genome size. A slight difference in the number won't cause a big difference of peak calls, because this number is used to estimate a genome-wide noise level which is usually the least significant one compared with the local biases modeled by MACS.
Any idea?
Oh I see. We pass the same genome size to MACS2 as we would to deeptools commands. The deeptools recommendations for genome sizes state:
Genome | Effective size
-- | --
GRCh37 | 2864785220
GRCh38 | 2913022398
GRCm37 | 2620345972
GRCm38 | 2652783500
dm3 | 162367812
dm6 | 142573017
GRCz10 | 1369631918
WBcel235 | 100286401
TAIR10 | 119481543
I'm not sure why the drastic difference for the mouse genome between MACS2 and deeptools. I think it would be good to cross-check with MACS2 authors.
I'm not sure why the drastic difference for the mouse genome between MACS2 and deeptools. I think it would be good to cross-check with MACS2 authors.
I will. thank you for your comment. Your effective size is more reasonable as also described in the -gsize part.
Hello again,
This is my command:
and this is the command in the MACS2 log files:
The genome is set as mm10 that has an effective size of 1,6m reads and fere the -g is set to the human one