Story
When building BigWigs, I would like to use the normalize-to-1x feature in deepTools. This feature requires an estimate of the effective genome size for dm6. The deepTools estimate for dm3 is 121,400,000, and I expect dm6 to be slightly larger. deepTools offers a few suggestions for estimating effective genome size, including:
Use bamCoverage
If you have a sample where you expect the genome to be covered completely, e.g. from genome sequencing, a very trivial solution is to use bamCoverage with a bin size of 1 bp and the --outFileFormat option set to ‘bedgraph’. You can then count the number of non-Zero bins (bases) which will indicate the mappable genome size for this specific sample.
I have identified a set of whole-genome sequencing (WGS) samples. By aligning and merging these samples, I hope to get a reasonable estimate of the effective genome size for dm6.
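The counting step in the bamCoverage suggestion can be sketched as follows. This is a minimal illustration, not the script actually used for the estimate. It assumes a standard four-column bedGraph (chromosome, start, end, value), such as one written by bamCoverage with `--binSize 1 --outFileFormat bedgraph`. It sums interval lengths rather than counting lines, since a single bedGraph interval may span more than one base. The function name and the example data are hypothetical.

```python
import io


def count_covered_bases(bedgraph_lines):
    """Sum the lengths of bedGraph intervals whose coverage value is non-zero."""
    covered = 0
    for line in bedgraph_lines:
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        if float(value) > 0:
            # bedGraph intervals are 0-based and half-open: length = end - start.
            covered += int(end) - int(start)
    return covered


# Hypothetical toy input standing in for a real bedGraph file.
example = io.StringIO(
    "chr2L\t0\t100\t0\n"
    "chr2L\t100\t250\t3\n"
    "chr2L\t250\t300\t0\n"
    "chr3R\t0\t40\t1\n"
)
covered_bases = count_covered_bases(example)
print(covered_bases)
```

For a real file, the same function can be given an open file handle, e.g. `count_covered_bases(open(path))`, where `path` points at the merged bedGraph.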
Questions and Tasks
[X] What is the effective genome size?
The effective genome size is ~129,000,000 bp.
Definition of done
[X] Have a number that is near the dm3 estimate.
The dm3 estimate was ~121,000,000, which is on the same order of magnitude as the dm6 estimate of ~129,000,000.
Summary
I ended up randomly selecting 30 WGS samples for this estimate. I had started with all WGS samples, but this was taking too long and I felt it would not add precision. One thing to remember is that effective genome size depends on which aligner is used and on the read length. The selected samples have read lengths ranging from 35 bp to 150 bp, and I used settings similar to those in the alignment workflow. For the effective genome size estimate I excluded scaffolds because I felt they just made the problem a little more complicated. For the normalization in the workflow I am excluding chrX and chrY because they could be variable between sexes.
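The sample selection and scaffold exclusion described above could look roughly like this. It is a sketch under stated assumptions, not the code that was run: the sample IDs, the seed, the function names, and the exact set of chromosome names to keep are all hypothetical. The notes only say that 30 samples were drawn at random and that scaffolds were excluded.

```python
import random

# Assumption: chromosome names retained for the estimate. The notes say only
# that scaffolds were excluded, not which names were kept.
MAJOR_CHROMS = {"chr2L", "chr2R", "chr3L", "chr3R", "chr4", "chrX", "chrY"}


def pick_samples(sample_ids, n=30, seed=0):
    """Return n sample IDs drawn without replacement, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(sorted(sample_ids), n)


def drop_scaffolds(bedgraph_lines, keep=MAJOR_CHROMS):
    """Yield only bedGraph lines whose chromosome name is in the keep set."""
    for line in bedgraph_lines:
        fields = line.split()
        if fields and fields[0] in keep:
            yield line


# Hypothetical sample IDs standing in for the real WGS sample list.
all_wgs = [f"sample_{i:03d}" for i in range(200)]
subset = pick_samples(all_wgs)

# Hypothetical bedGraph lines; the second one mimics a scaffold entry.
kept = list(drop_scaffolds(["chr2L\t0\t10\t5", "chrUn_example\t0\t10\t5"]))
```

Sorting the IDs before sampling makes the draw independent of the order in which the sample list happened to be read.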