Closed (tdlong closed this issue 6 years ago)
I think this is really a few issues, which should be reported separately.
[x] vg msga is hard to use and has no good documentation. We say what the options do in the help text, but that doesn't really explain what combination of options one ought to use and why.
[x] The graph that comes out of vg msga can be too complex to GCSA-index with the default parameters. To some extent we can't really fix this; if the graph is so complex it can't be indexed, it needs to be pruned, but maybe we can make the defaults a bit more aggressive with regard to the size limit, and produce a better message like "prune your graph".
[x] The VCFs produced by vg deconstruct don't have all the header lines they ought to have.
[x] The semantics/purpose of vg deconstruct aren't clear. It seems like @tdlong thinks that vg deconstruct ought to be able to produce a VCF that meaningfully describes a graph generated by vg msga, with one VCF sample per FASTA included in the MSGA run. In fact, deconstruct is not supposed to output samples representing the paths in the graph, and vg deconstruct isn't really ever going to be very useful on graphs produced by vg msga (because they'll contain lots of structures not meaningfully representable in non-breakend VCF).
Sorry, I was just throwing my code up there.
I was trying to use the version of vg we had to construct a graph, and then output the "polymorphisms" in a format akin to a VCF file. I was trying to push for a reproducible set of real regions for which events can be called.
On Jul 10, 2017, at 11:16 AM, Adam Novak notifications@github.com wrote:
[quoted copy of the comment above]
Code below