hasindu2008 / squigulator

a tool for simulating nanopore raw signal data
https://hasindu2008.github.io/squigulator
MIT License
58 stars 3 forks source link

Support for DNA methylation signal #8

Open xies4 opened 6 months ago

xies4 commented 6 months ago

Thank you for developing this amazing tool. I am wondering if there is a way to simulate reads with DNA methylation signals. If not, do you have any plan to include the function in the near future? Thanks again.

hasindu2008 commented 6 months ago

Hello,

This is something that is straightforward to implement given we already have methylation current level tables in f5c, yet, haven't found time to really implement it. Given that you asked now, I can get into implementing this sometime in the coming months.

xies4 commented 6 months ago

Great! Looking forward to it! Thanks.

HLHsieh commented 5 months ago

Hi, I am looking forward to it as well!

hasindu2008 commented 5 months ago

Implementing support for this new RNA004 kit to squigulator. Will do the methylation as soon as that is finished.

hasindu2008 commented 4 months ago

@HLHsieh and @xies4 Thanks for your patience. A very preliminary methylation simulation feature has been now implemented under the "meth" branch at https://github.com/hasindu2008/squigulator/tree/meth. Note that I have yet tested only using a tiny corona-virus genome and there could be bugs/issues. I am planning to do some testing in the next few weeks, however, if you are interested you can give it a try.

You need to create a 3-column tsv file to have the chr, 0-based position and methylation frequency. For example:

MN908947.3  99  1.0
MN908947.3  1184    0.0
MN908947.3  1456    0.5
MN908947.3  20031   0.7

Note that they must be tab separated. This example file is in test/methfreq.tsv

If you have a f5c/nanopolish methylation frequency TSV, you can use the following command to generate the necessary file for squigulator:

tail -n+2 methfreq.tsv | cut -f 1,2,7 > sqmeth.tsv

Then you can provide this file to squigulator like below:

./squigulator -x dna-r9-min -n 4000 -o a.blow5 test/nCoV-2019.reference.fasta --meth-freq test/methfreq.tsv

I tested this generated BLOW5 by basecalling (using buttery-eel) and running f5c:

eel -x cuda:all -i a.blow5 -o a.fastq --config dna_r9.4.1_450bps_sup.cfg
minimap2 -ax map-ont test/nCoV-2019.reference.fasta a.fastq  --secondary=no | samtools sort - -o a.bam && samtools index a.bam
f5c index --slow5 a.blow5 a.fastq
f5c call-methylation --slow5 a.blow5 -r a.fastq -b a.bam -g test/nCoV-2019.reference.fasta -o a.tsv
f5c meth-freq -i a.tsv -s -o afreq.tsv

The frequencies in afreq.tsv quite matched the ones in test/methfreq.tsv.

xies4 commented 4 months ago

Thank you @hasindu2008