jts / nanopolish

Signal-level algorithms for MinION data
MIT License

Calling cpggpc methylation with nanopolish #978

Open kir1to455 opened 2 years ago

kir1to455 commented 2 years ago

Hi. When I used nanopolish to repeat the nanoNOMe experiment, I found that my correlation with WGBS was lower than the one reported for nanoNOMe. Here is my nanopolish command:

/nanopolish-cpggpc_new_train/nanopolish call-methylation -q cpggpc -t 48 -r run.fastq.gz -b run.bam -g ${ref}

Also, I would like to understand the parameter: what is the difference between CpG, GpC, and cpggpc? In this example, I used cpggpc. Would calling CpG alone be more accurate than calling both together with cpggpc?

Thank you in advance.

jts commented 2 years ago

@timp0 @isaclee any thoughts?

timp0 commented 2 years ago

So - we looked at this somewhat in our paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7704922/) and didn't see that it really changed the correlation.

Are you filtering out GpCpG sites?
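For context on the question above: in a NOMe-seq style experiment, the cytosine in a GCG trinucleotide is both a GpC site and a CpG site, so its methylation cannot be assigned to either the endogenous CpG signal or the GpC accessibility signal. A minimal sketch of that classification, assuming one three-base context per line (the base before the C, the C itself, and the base after it). `classify_context` is a hypothetical helper, not part of nanopolish or nanopore-methylation-utilities:

```shell
# Classify the trinucleotide context around a cytosine (the middle base).
# Input:  one 3-base context per line, e.g. ACG, GCA, GCG.
# Output: the context, a tab, and one of CG, GC, GCG (ambiguous) or other.
classify_context() {
    awk '{
        ctx  = toupper($1)
        up   = substr(ctx, 1, 1)
        mid  = substr(ctx, 2, 1)
        down = substr(ctx, 3, 1)
        if (length(ctx) != 3 || mid != "C") {
            label = "other"
        } else if (up == "G" && down == "G") {
            label = "GCG"    # both GpC and CpG: ambiguous
        } else if (down == "G") {
            label = "CG"
        } else if (up == "G") {
            label = "GC"
        } else {
            label = "other"
        }
        print ctx "\t" label
    }'
}
```

Sites labelled GCG are the ones that would be dropped before comparing CpG calls to WGBS.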

kir1to455 commented 2 years ago

Yes, here is my pipeline.

mtsv

od1=/home/zyserver/HJL/
ls * | while read id
do
    echo $id
    if [ ! -e ${od1}/${id}.bed.gz ]; then
        time /home/zyserver/HJL/mtsv2bedGraph.py -i /Methylation/$id -g /home/zyserver/HJL/hg38/hg38.fa --mod cpggpc --nome -c 1.5 | sort -k1,1 -k2,2n | bgzip > ${od1}/${id}.bed.gz &
        sleep 10s
    fi
done

distinguish CG

od2=/home/zyserver/HJL/
ls *.bed.gz | while read id
do
    echo $id
    if [ ! -e ${od2}/${id}.txt.gz ]; then
        zcat $id | awk '{if($8=="CG"){print $0}}' > $id.txt.gz &
    fi
done

distinguish GC

od2=/home/zyserver/HJL/
ls *.bed.gz | while read id
do
    echo $id
    if [ ! -e ${od2}/${id}.txt.gz ]; then
        zcat $id | awk '{if($8=="GC"){print $0}}' > $id.txt.gz &
    fi
done
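The two filtering steps above can also be done in a single pass over each file. A minimal sketch, assuming (as in the commands above) that column 8 of the tab-separated bed file holds the sequence context. `split_by_context` and its output file names are hypothetical:

```shell
# Split a tab-separated bed stream into one file per context.
# Usage: zcat calls.bed.gz | split_by_context OUT_PREFIX
# Writes OUT_PREFIX.CG.bed and OUT_PREFIX.GC.bed (uncompressed);
# lines with any other context in column 8 are dropped.
split_by_context() {
    awk -F '\t' -v prefix="$1" '
        $8 == "CG" { print > (prefix ".CG.bed") }
        $8 == "GC" { print > (prefix ".GC.bed") }
    '
}
```

Because the exact-match test on column 8 only keeps CG and GC, any line carrying a different label in that column never reaches either output.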

rename

!bash

rename 's/.bed.gz.txt.gz$/.tsv.bed.gz/' *

merge

cat *.bed.gz > all.bed.gz

parsemethylation_merge CG

/home/zyserver/HJL/nanopore-methylation-utilities/parseMethylbed.py frequency -i /home/zyserver/HJL/CG/raw/all.cpggpc_CG_methylation_calls.bed.gz --mod cpg -u 1.5 -l -1.5 | bgzip > ../frequency/all.cpggpc_CG_methylation_calls.txt.gz &

parsemethylation_merge GC

/home/zyserver/HJL/nanopore-methylation-utilities/parseMethylbed.py frequency -i /home/zyserver/HJL/GC/raw/all.cpggpc_GC_methylation_calls.bed.gz --mod cpg -u 1 -l -1 | bgzip > ../frequency/all.cpggpc_GC_methylation_calls.txt.gz &
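To make the role of the `-u` and `-l` thresholds concrete, here is a minimal sketch of a thresholded frequency calculation. It assumes, and this is an assumption rather than a description of what `parseMethylbed.py` does internally, that a call with a log-likelihood ratio above the upper threshold counts as methylated, one below the lower threshold counts as unmethylated, and anything in between is discarded as ambiguous. The three-column input (chromosome, position, log-likelihood ratio) and the name `llr_frequency` are hypothetical:

```shell
# Per-site methylation frequency from per-read log-likelihood ratios.
# Usage: llr_frequency UPPER LOWER < calls.tsv
# Input columns (tab-separated):  chrom, pos, llr
# Output columns (tab-separated): chrom, pos, methylated, unmethylated, frequency
llr_frequency() {
    awk -F '\t' -v OFS='\t' -v upper="$1" -v lower="$2" '
        {
            key = $1 OFS $2
            if ($3 + 0 > upper + 0) {
                meth[key]++; seen[key] = 1
            } else if ($3 + 0 < lower + 0) {
                unmeth[key]++; seen[key] = 1
            }
            # calls between the two thresholds are ambiguous and skipped
        }
        END {
            for (key in seen) {
                m = meth[key] + 0
                u = unmeth[key] + 0
                printf "%s\t%d\t%d\t%.3f\n", key, m, u, m / (m + u)
            }
        }
    ' | sort -k1,1 -k2,2n
}
```

Under this assumption, a tighter pair such as 1 and -1 keeps more calls per site than 1.5 and -1.5, at the cost of including less confident ones.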

Then I compared the CpG methylation frequency to WGBS, but the correlation is lower than in your figure. I also found this in your paper:

[image]

"For GpC methylation, the minimum window and number of sites were arbitrarily reduced to 100 bp and 10 nearby sites to account for more rapid fluctuations in the accessibility profile due to nucleosome positioning."

But when I used BSmooth to smooth your GpC data, I ran into some problems.

[image]

Are the parameters n = 10, h = 100, and maxGap = 100000 the best choice? I can run BSmooth with the default parameters n = 70, h = 1000, and maxGap = 10^8, but then the window is too wide.

Can you give some advice? Thanks for your help.
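For the WGBS comparison described above, a minimal sketch of a per-site Pearson correlation. It assumes a hypothetical tab-separated input in which each line holds the nanopore frequency and the WGBS frequency for the same site, already joined by position. `pearson` is an illustrative helper, not a tool from the paper:

```shell
# Pearson correlation between columns 1 and 2 of a tab-separated stream.
# Prints NA when there are fewer than two rows or a column has no variance.
pearson() {
    awk -F '\t' '
        {
            n++
            sx  += $1;      sy  += $2
            sxx += $1 * $1; syy += $2 * $2
            sxy += $1 * $2
        }
        END {
            if (n < 2) { print "NA"; exit }
            cov = n * sxy - sx * sy
            vx  = n * sxx - sx * sx
            vy  = n * syy - sy * sy
            if (vx <= 0 || vy <= 0) { print "NA"; exit }
            printf "%.4f\n", cov / sqrt(vx * vy)
        }
    '
}
```

Running it on the same set of sites for the cpg and cpggpc calls would show directly whether the calling mode changes the correlation.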