vgteam / vg

tools for working with genome variation graphs

vg call: Found pileup for nonexistent edge #1248

Open ChriKub opened 6 years ago

ChriKub commented 6 years ago

Hi, I'm trying to call variation from a flat vg graph using PacBio CCS reads, but vg call fails:

Reading input graph
Computing augmented graph
terminate called after throwing an instance of 'std::runtime_error'
what(): Found pileup for nonexistent edge {"from": 15592, "to": 15592, "from_start": true, "to_end": true}

Here are the commands I used:

vg index -x TAIR_flat_Chr1.xg -g TAIR_flat_Chr1.gcsa -t 40 -k 11 TAIR_flat_Chr1.vg
vg map -t 40 -x TAIR_flat_Chr1.xg -g TAIR_flat_Chr1.gcsa -f errorRead.fastq > errorRead.gam
vg pileup -p -t 20 -v TAIR_flat_Chr1.vg errorRead.gam > errorRead.gam.vgpu 2> errorRead.gam.vgpout
vg call -v -p -t 40 -r TAIR_Chr1 TAIR_flat_Chr1.vg errorRead.gam.vgpu > errorRead.gam.call.vcf 2> errorRead.gam.call.vcout

I've packaged the raw graph and the single-read FASTQ file to reproduce this behaviour. They are available for download here: https://owncloud.tuebingen.mpg.de/index.php/s/xmqJJthZR2JVroB

The read is mapped as a repeat, although the original regions in the reference are not chimeric, so the repeated alignment should get a worse alignment score.

Thanks for your help, Chris

glennhickey commented 6 years ago

Yeah, I can reproduce.

@ekg, is it normal for vg map to align a read as a cycle on a node when the cycle edge is not in the graph?

In any case, vg call doesn't support calling copy number variants from cycles. You can use -U to ignore these edges and get around this error. Thanks for pointing it out though, we'll definitely have to clean it up a bit.
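
For anyone who wants to see which edges an alignment implies that the graph does not contain, here is a minimal Python sketch. It is not how vg call or -U is implemented. It assumes that `from_start` means the edge leaves from the start of the `from` node and `to_end` means it enters at the end of the `to` node, so that an edge and its reverse-complement twin describe the same connection. The neighbouring node IDs 15591 and 15593, the example path, and all function names are made up for illustration.

```python
from typing import Iterable, List, Tuple

# (from_id, from_start, to_id, to_end), mirroring the fields in the error message.
Edge = Tuple[int, bool, int, bool]
# (node_id, is_reverse): one step of an alignment path.
Visit = Tuple[int, bool]


def canonical(edge: Edge) -> Edge:
    """Pick one fixed representative of an edge and its reverse-complement twin."""
    from_id, from_start, to_id, to_end = edge
    flipped = (to_id, not to_end, from_id, not from_start)
    return min(edge, flipped)


def implied_edges(path: List[Visit]) -> List[Edge]:
    """Edges implied by each pair of consecutive node visits in a path."""
    return [
        (a_id, a_rev, b_id, b_rev)
        for (a_id, a_rev), (b_id, b_rev) in zip(path, path[1:])
    ]


def missing_edges(graph_edges: Iterable[Edge], path: List[Visit]) -> List[Edge]:
    """Edges the path traverses that are absent from the graph."""
    known = {canonical(e) for e in graph_edges}
    return [e for e in implied_edges(path) if canonical(e) not in known]


# Hypothetical flat graph around node 15592: a simple chain, no self-loop.
flat_edges = [
    (15591, False, 15592, False),
    (15592, False, 15593, False),
]
# Hypothetical reverse-strand alignment that visits node 15592 twice in a row.
looping_path = [(15593, True), (15592, True), (15592, True), (15591, True)]

for edge in missing_edges(flat_edges, looping_path):
    print("not in graph:", edge)
```

Under these assumptions, the only edge reported is the self-loop on 15592 with both `from_start` and `to_end` set, which has the same shape as the edge in the error message above.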


jeizenga commented 6 years ago

It might be triggering the chunked version of the aligner, which can make paths that don't take extant edges. I'm not sure how that all is wired up these days.