Closed simpsonbp closed 2 years ago
Could you elaborate on exactly how the two outputs do not correlate? Note that ambiguous calls (by default calls between 20% and 80% confidence) are filtered in the aggregation step, so some calls from the bam files will not be accounted for in the bed file.
Sorry for the confusion--correlate to mean the two outputs do not match at each position.
I'm using the .bam in IGV to present the overall 5mC methylation broadly and the .bed for the % methylation quantification at each position.
For example: At position 175 in the bam in IGV there are 91 total counts: 40 are unmethylated and 13 are methylated. At position 175 in the bed there are 15 counts (column 10) and 0 methylation (column 11).
So by default the bed file is more accurate? Is there a way to change the default settings so the bam more closely matches the bed?
The IGV visualization of bam methylation is not necessarily the most accurate representation of methylation. I believe IGV has a threshold over which a site is shown as methylated, but I believe the default is ~20% probability. This means that a site with 20% probability of being methylated will be shown as methylated. The better use for this bam files is to consider the shading of each call (representing the confidence in each call).
For the bed file ambiguous calls (calls < 80% confident by default) are ignored. This results in a better estimate of the ground truth fraction modified at each reference site and generally a relatively low proportion of filtered sites (<15-20%). The megalodon option --mod-binary-threshold
can be set to 0
to force all reads at a site to contribute to the fraction modified estimate. This will allow the bam and bed to agree more closely, but likely be less representative of the ground truth.
Very helpful! Thanks!
The
modified_bases.5mC.bed
does not correlate at each base position with themod_mappings.5mC.bam
in IGV bisulfite mode with the commands below:ont-guppy/bin/guppy_basecaller --device cuda:0 --gpu_runners_per_device 4 --input_path fast5_all --save_path output_folder/basecall_5mC_5hmC_3294 --fast5_out --post_out --config res_dna_r941_min_modbases_5mC_5hmC_v001.cfg
megalodon output_folder/basecall_5mC_3294/workspace --do-not-use-guppy-server --overwrite --outputs basecalls mappings mod_mappings mods --mod-motif Z CG 0 --mod-map-emulate-bisulfite --mod-map-base-conv C T --mod-map-base-conv Z C --mappings-format bam --reference mm38_72Q_166kb_px551_gRNA4+5_refig2.fasta --allow-supplementary-alignments --output-directory megalodon_results_supplementary --disable-mod-calibration --devices cuda:all:100% --processes 20
The guppy basecall server did not work on the backend with guppy 6.0.1 so I basecalled before running megalodon.
Why the differences between the two outputs?