vgteam / toil-vg

Distributed and cloud computing framework for vg
Apache License 2.0

toil-vg wastes time filtering down input VCFs when only asked for the `--primary` graph #610

Open adamnovak opened 6 years ago

adamnovak commented 6 years ago

I just started a toil-vg run with `--primary`, but not `--pangenome` or any of the other graph construction options. I did, however, pass in a bunch of VCFs, along with instructions to filter out NA12878 and the other CEPH samples.

It ought to know that it doesn't actually need those VCFs to make the primary graph, and not use them.

Instead, it not only imports the VCFs but also spends time filtering them down, only to never use them.

This is inefficient and should be fixed.
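
A rough sketch of the check I have in mind, assuming the construction step can see up front which graph types were requested. Every name here (`VARIANT_FREE_GRAPHS`, `ConstructPlan`, `plan_vcf_work`) is hypothetical and not taken from the toil-vg code base; it only illustrates skipping the VCF import and sample filtering when nothing requested needs variants.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical: graph types that are built from the reference alone.
# The primary graph is the only one this issue is about.
VARIANT_FREE_GRAPHS = {"primary"}


@dataclass
class ConstructPlan:
    """Hypothetical record of what VCF work the workflow should schedule."""
    import_vcfs: bool
    filter_samples: List[str] = field(default_factory=list)


def plan_vcf_work(requested_graphs: Iterable[str],
                  vcf_paths: Iterable[str],
                  samples_to_filter: Iterable[str]) -> ConstructPlan:
    """Decide whether VCFs need to be imported and filtered at all."""
    vcf_paths = list(vcf_paths)
    # True if at least one requested graph actually consumes variants.
    needs_variants = any(g not in VARIANT_FREE_GRAPHS for g in requested_graphs)
    if not vcf_paths or not needs_variants:
        # Nothing downstream would read the VCFs, so skip import and filtering.
        return ConstructPlan(import_vcfs=False)
    return ConstructPlan(import_vcfs=True,
                         filter_samples=list(samples_to_filter))
```

With only the primary graph requested, `plan_vcf_work(["primary"], ["calls.vcf.gz"], ["NA12878"])` reports that no VCF work is needed; adding a variant-bearing graph type to the request turns the import and filtering back on.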

adamnovak commented 6 years ago

I'm going to leave this alone for now; just not passing in the VCFs when they aren't needed seems to be a good enough solution for my pipeline, but this is still a thing we probably want to fix eventually.

glennhickey commented 6 years ago

It also bugged me that primary graphs got pruned before GCSA, but never enough to put in the extra code. id-sorting's another one.

On Thu, Aug 23, 2018 at 3:30 PM Adam Novak wrote:

> I'm going to leave this alone for now; just not passing in the VCFs when they aren't needed seems to be a good enough solution for my pipeline, but this is still a thing we probably want to fix eventually.
