vgteam / toil-vg

Distributed and cloud computing framework for vg
Apache License 2.0

Add surject-based vg/freebayes pipeline to toil-vg calleval #528

Closed: adamnovak closed this issue 6 years ago

adamnovak commented 6 years ago

We want to compare the performance of vg call against a surject-based pipeline. We don't want the haplotype-aware mapping paper to really be about the caller, so we want a known-good caller for our variant calling comparison, which lets us isolate the effects of aligning against different graphs.

glennhickey commented 6 years ago

It's not well tested, but passing --surject to calleval should do this now (it can also be passed to mapeval if using that as a first step). What we don't have is whole-genome Freebayes support.

adamnovak commented 6 years ago

So for running calleval on mapeval output, this would mean passing --surject to mapeval and then feeding the surjected BAM into calleval with --bams/--bam_names, whereupon Freebayes would be run on it.
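A minimal sketch of that handoff, for illustration only. It assumes that a GAM named `<name>.gam` gets a sibling `<name>-surject.bam`, and that --bams and --bam_names each take a space-separated list; neither detail is confirmed in this thread, and the helper name `surjected_bam_args` is hypothetical.

```python
from pathlib import PurePosixPath


def surjected_bam_args(gam_paths):
    """Build a --bams/--bam_names argument fragment for calleval.

    ASSUMPTION: mapeval --surject writes <name>-surject.bam next to each
    <name>.gam. The exact naming scheme should be checked against real output.
    """
    bams = []
    names = []
    for gam in gam_paths:
        path = PurePosixPath(gam)
        # Strip the .gam extension to get the condition name.
        stem = path.name[:-len(".gam")] if path.name.endswith(".gam") else path.name
        bams.append(str(path.with_name(stem + "-surject.bam")))
        names.append(stem)
    # ASSUMPTION: both options accept several space-separated values.
    return ["--bams", *bams, "--bam_names", *names]
```

The returned list is only the tail of a calleval invocation; the job store, output store, and other required arguments are left out on purpose.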

What would be needed for whole-genome Freebayes support? Is it just restricted to a single contig right now?

glennhickey commented 6 years ago

Yes on both counts: passing --surject to mapeval will make a -surject.bam for every GAM it makes, which you can then pass to the normal calleval logic. Freebayes is only set up to run as a single job. Parallelizing over chromosomes should be sufficient for whole-genome support. There may be a few other places where calleval assumes a single contig.
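A rough sketch of what per-chromosome parallelization could look like, not the toil-vg implementation. It only builds command lines: one Freebayes call per contig using its -f (reference FASTA) and -r (region) options, plus a bcftools concat call to merge the per-contig VCFs. The function names and the `<contig>.vcf` output naming are hypothetical.

```python
def freebayes_commands(reference_fasta, bam_path, contigs):
    """Return (command, output_vcf) pairs, one per contig.

    Freebayes writes VCF to stdout by default, so each command is paired
    with the file its stdout should be redirected to. The <contig>.vcf
    naming is an assumption made for this sketch.
    """
    jobs = []
    for contig in contigs:
        command = ["freebayes", "-f", reference_fasta, "-r", contig, bam_path]
        jobs.append((command, f"{contig}.vcf"))
    return jobs


def concat_command(vcf_paths, merged_vcf):
    """Return a bcftools concat command that joins per-contig VCFs in order."""
    return ["bcftools", "concat", "-o", merged_vcf, *vcf_paths]
```

Each pair can be run as an independent job, for example with `subprocess.run(command, stdout=open(output_vcf, "w"), check=True)`, and the concat step runs once all of them have finished. The contigs should be passed in reference order so the merged VCF stays sorted.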

adamnovak commented 6 years ago

This should be integrated into my script now.