vahidAK / NanoMethPhase

Methylation Phasing for Nanopore Sequencing
GNU General Public License v3.0
44 stars 3 forks source link

hydroxymethylation results not right #24

Closed weishwu closed 4 months ago

weishwu commented 5 months ago

I tried using NanoMethPhase to detect allele-specific hydroxymethylation but the results do not make sense. As shown below, apparently there is difference between the two haplotypes, but the diff values were all reduced to almost zero in the NanoMethPhase output (in the .DMLtest.txt file).

Screenshot 2024-06-13 at 4 53 07 PM

My workflow:

# Dorado basecalling:
Dorado basecaller \
     dna_r10.4.1_e8.2_400bps_sup@v4.3.0 \
     --modified-bases-models dna_r10.4.1_e8.2_400bps_sup@v4.3.0_5mCG_5hmCG@v1 \
     --min-qscore 10 \
     ${pod5_dir}/ \
     > Sample_${sample}.doradoSup.basecall.bam

# read align
nextflow run epi2me-labs/wf-alignment -revision prerelease \
--bam Sample_${sample}.doradoSup.basecall.bam \
--references hg38.fa \
--sample ${sample} \
--out_dir ${sample} \
--minimap_preset dna \
-profile singularity \
-c ${nf_conf} \
--threads 20 

# extract methylation
modkit extract \
   --cpg \
   --mapped-only \
   --reference hg38.fa \
   {sample}.sorted.aligned.bam \
   Sample_{sample}.dorado_aligned.modkit_extract.CpG.tsv

# prepare CpG meth calls for NanoMethPhase (in another post you told me to process the modkit output in this way. For hydroxymethylation I just changed "m" to "h" in the awk line)

echo -e "chrom\tpos\tstrand\tpos_in_strand\treadname\tread_strand\tprob_0\tprob_1\tcalled_label\tk_mer" >{sample}.reformatted.tsv

awk '{{OFS="\t";if (($12=="h")) {{print $4,$3,$6,"NA",$1,"NA",1-$11,$11,"NA",$14}}}}' Sample_{sample}.dorado_aligned.modkit_extract.CpG.tsv >> {sample}.reformatted.tsv

python nanomethphase.py methyl_call_processor \
   --tool_and_callthresh deepsignal:0.6 \
   -mc {sample}.reformatted.tsv \
   | sort -k1,1 -k2,2n -k3,3n | bgzip > Sample_{sample}.methylation_calls.bed.gz

tabix -p bed Sample_{sample}.methylation_calls.bed.gz

# phase
python nanomethphase.py phase \
   --overwrite \
   --include_indels \
   -b {input.aln} \
   -v {input.phased.vcf} \
   -mc Sample_{sample}.methylation_calls.bed.gz \
   -r {params.genome_fasta} \
   -o {params.out_prefix} \
   -of methylcall 

# dma
python nanomethphase.py dma \
   -c 1,2,4,5,7 \
   -ca Sample_{sample}.GQ20_NanoMethPhase_HP1_MethylFrequency.tsv \
   -co Sample_{sample}.GQ20_NanoMethPhase_HP2_MethylFrequency.tsv \
   -o {params.out_dir} \
   -op {params.prefix} \
   --overwrite \
   --pval_cutoff 0.01 \
   --smoothing_span 1000

Thanks.

weishwu commented 5 months ago

Ignore the IGV tracks. They are probably contaminated by 5mC. I'm regenerating them.

vahidAK commented 4 months ago

Hi @weishwu. Is this resolved?

weishwu commented 4 months ago

Hi @vahidAK After correcting the data, the IGV tracks show low hydroxymethylation signals and almost no allelic differences, which is consistent with the NanoMethPhase results. Unless you see anything wrong in the workflow, we can close this. Thanks.