philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
MIT License

Reporting >30kb deletion in ONT data #68

Closed: vhfsantos closed this issue 4 years ago.

vhfsantos (Vinícius Henrique Franceschini dos Santos) commented 4 years ago (Sep 19, 2019)

Dear Rescheneder,

I have been using NGMLR+Sniffles to confirm a known large (~30 kb) deletion, but the deletion is not reported in my VCF file. I've tried several parameter combinations for both NGMLR and Sniffles, but I couldn't find a solution.

To make the issue easier to reproduce, I have prepared a minimal reproducible example. The files are available here: https://github.com/philres/ngmlr/files/3632938/minimal_reproducible_example.tar.gz

The fake_genome.fa file contains a random ~100 kb sequence (generated with https://www.bioinformatics.org/sms2/random_dna.html), and the fake_read_with_deletion.fa file contains the same sequence with a ~30 kb deletion, leaving ~55 kb of sequence upstream and ~15 kb downstream of the deletion.
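A comparable test case can also be generated locally. The following is a minimal shell/awk sketch, not the method used for the attached files (those came from the SMS2 random DNA tool): it writes a random reference named random and a single read that is the same sequence with one block removed. The lengths, the seed, and the read name read_with_deletion are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: build a random reference and one simulated read carrying a
# deletion. Lengths, seed, and read name are illustrative assumptions.
GENOME_LEN=100000   # ~100 kb random reference
DEL_START=55000     # bases kept upstream of the deletion
DEL_LEN=30000       # bases removed (~30 kb); ~15 kb remain downstream

awk -v len="$GENOME_LEN" -v ds="$DEL_START" -v dl="$DEL_LEN" \
    -v g="fake_genome.fa" -v r="fake_read_with_deletion.fa" '
    BEGIN {
        srand(42)                              # fixed seed, reproducible output
        for (i = 1; i <= len; i++)             # one random base per position
            b[i] = substr("ACGT", int(rand() * 4) + 1, 1)

        printf ">random\n" > g                 # reference, single-line FASTA
        for (i = 1; i <= len; i++) printf "%s", b[i] > g
        printf "\n" > g

        printf ">read_with_deletion\n" > r     # read = reference minus ds+1..ds+dl
        for (i = 1; i <= len; i++)
            if (i <= ds || i > ds + dl) printf "%s", b[i] > r
        printf "\n" > r

        close(g); close(r)
    }'
```

Building the read from the same base array guarantees that the flanks match the reference exactly, so any missing call comes from the mapping or calling step rather than from the simulated data.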

After running NGMLR and Sniffles, I expected to get something like the figure shown in the Sniffles paper:

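[image: image12163] https://user-images.githubusercontent.com/38498789/65278961-8ee0e900-db03-11e9-9071-0b3d7f6a464d.png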

But this is what I got:

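[image: path12147] https://user-images.githubusercontent.com/38498789/65277645-bbdfcc80-db00-11e9-9a52-15d311dad36d.png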

and no SV was reported in the VCF file.

Do you know what I can do to get this deletion reported?

Thanks!

mschatz commented 4 years ago

Since you have simulated only a single read, you need to tell Sniffles to report variants supported by a single read, using the -s 1 parameter:

$ ngmlr -r fake_genome.fa -q fake_read_with_deletion.fa -o reads.sam
$ samtools view -b reads.sam -o reads.bam
$ samtools index reads.bam
$ sniffles -s 1 -m reads.bam -v reads.vcf
$ cat reads.vcf | grep -v '^#'
random 54000 0 N <DEL> . PASS PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=random;END=88081;STD_quant_start=0.000000;STD_quant_stop=0.000000;Kurtosis_quant_start=nan;Kurtosis_quant_stop=nan;SVTYPE=DEL;SUPTYPE=SR;SVLEN=-34081;STRANDS=+-;RE=1 GT:DR:DV ./.:.:1

This is perhaps okay for a simulation, but in real data I would never trust a variant supported by just a single read, since every platform suffers from low-level chimeric fragments and other artifacts. We generally recommend at least 30x coverage, and then reporting only variants observed in at least 10 reads.
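The read-support threshold can be set at calling time with the -s parameter used above (e.g. -s 10). If a VCF was already produced with a permissive setting such as -s 1, a post-hoc filter is a possible alternative. The sketch below assumes, as in the Sniffles output above, that the INFO column carries RE (the number of supporting reads); the function name filter_by_support is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep VCF header lines plus records whose INFO/RE
# (number of supporting reads, as reported by Sniffles) is >= MIN_RE.
# Usage: filter_by_support MIN_RE < input.vcf > filtered.vcf
filter_by_support() {
    awk -v min_re="${1:-10}" '
        BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }             # pass header lines through unchanged
        {
            re = -1                      # records without RE are dropped
            n = split($8, info, ";")     # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^RE=/) re = substr(info[i], 4) + 0
            if (re >= min_re) print      # keep only sufficiently supported calls
        }'
}
```

For example, filter_by_support 10 < reads.vcf > reads.min10.vcf keeps only calls seen in at least 10 reads. Re-running Sniffles with -s 10 is the more direct route; the filter is mainly useful for comparing thresholds without re-calling.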

Good luck

Mike


vhfsantos commented 4 years ago

Thank you so much!