vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Must the reference for vg deconstruct be a single contig? #3062

Open zhanghaoipp opened 4 years ago

zhanghaoipp commented 4 years ago

I have 100+ fungal assemblies, which were aligned with Cactus and converted to a vg graph. I am trying to use vg deconstruct to call variants, but I cannot use a strain name as the reference; it only works when I select a single contig. How can I use a strain with several contigs as the reference in this case? Thank you!

glennhickey commented 4 years ago

Does using -p contig1 -p contig2 -p contig3 etc. not work?
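With many contigs per strain, the repeated `-p` options can be generated rather than typed. A minimal Python sketch, assuming the path names have already been collected into a list (for example from `vg paths -L`) and that every contig of the chosen strain is named `<strain>.<contig>`. The helper `reference_path_args`, the extra contig names and `graph.xg` are hypothetical.

```python
import shlex


def reference_path_args(path_names, strain, sep="."):
    """Return ["-p", name, ...] for every path that belongs to one strain.

    Assumes (hypothetically) that contig paths are named "<strain><sep><contig>",
    e.g. "140001.NODE_25_length_505019_cov_14.525173".
    """
    args = []
    for name in path_names:
        # Require the separator so strain "14000" does not match "140001.*".
        if name.startswith(strain + sep):
            args.extend(["-p", name])
    return args


# Hypothetical path names; in practice they would come from the graph itself.
paths = [
    "140001.NODE_25_length_505019_cov_14.525173",
    "140001.contig_2",
    "140003.contig_1",
]
cmd = ["vg", "deconstruct", *reference_path_args(paths, "140001"), "graph.xg"]
print(shlex.join(cmd))
# vg deconstruct -p 140001.NODE_25_length_505019_cov_14.525173 -p 140001.contig_2 graph.xg
```

Printing the joined command, rather than running it, makes it easy to inspect before pasting into a shell.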

zhanghaoipp commented 4 years ago

Does using -p contig1 -p contig2 -p contig3 etc. not work?

Thank you, glennhickey! It works! But the output still differs from a normal VCF: the columns are contigs rather than isolates. Can contigs from the same isolate be merged?

The README shows that the basic usage is mapping followed by calling. I also have raw reads for all strains, so could I map each strain's reads to the Cactus graph individually, call variants with vg call, and then merge all the VCFs? Is there any difference from deconstruct? Which approach is recommended? Thank you!

glennhickey commented 4 years ago

Do you mean the first two VCF columns look incorrect? I'm not sure there's much that can be done if so, as each contig (column 1) is a separate path on your graph and has an independent coordinate system (column 2).

If you mean that there are too many sample columns (column 10 and beyond), then yes, the default output, which creates a sample for each path in the graph, isn't going to help. For now the only way to merge these is with the -A option, and that relies on the contigs of each isolate having a unique prefix in their name. If you can enforce that, then the VCF you get should work. Otherwise, we'd need an option to specify which names to merge, which would not be difficult to implement. It's also important in general to use -e, as well as to set the ploidy with -d.
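Since -A groups paths by name prefix, a prefix that is itself the start of another isolate's prefix (say `14000` and `140001`) could pull in the wrong contigs. A minimal Python sketch that flags such collisions before running deconstruct. It assumes the matching is a plain string "starts with" test, which is our reading of "unique prefix" above rather than documented vg behaviour; `find_prefix_collisions` is a hypothetical helper.

```python
from itertools import permutations


def find_prefix_collisions(prefixes):
    """Return (short, long) pairs where `long` starts with `short`.

    If -A matching is a plain startswith test (an assumption), every path
    named with `long` would also match `short`, so the two isolates would
    not be cleanly separated in the VCF sample columns.
    """
    unique = sorted(set(prefixes))
    return [(a, b) for a, b in permutations(unique, 2) if b.startswith(a)]


print(find_prefix_collisions(["140001.", "140002.", "140003."]))
# []
print(find_prefix_collisions(["14000", "140001", "140002"]))
# [('14000', '140001'), ('14000', '140002')]
```

The pairwise check is quadratic, which is negligible for a few hundred isolates. Including the separator in each prefix (`140001.`) is one way to rule out collisions, provided vg treats the prefix as a literal string.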

I think deconstruct is a cleaner way to get a VCF than remapping, at least in theory. The map/call output should be generally consistent though.

(In reply to zhanghaoipp's comment of Mon, Oct 26, 2020 at 12:20 PM.)

zhanghaoipp commented 4 years ago

sample columns

Yes, I mean the sample columns. The contig names look like "140001.NODE_25_length_505019_cov_14.525173", where 140001 is the isolate prefix and the rest is the contig name from SPAdes. So do I need to list the names of all 100+ isolates (14002, 140003, ...) after -A, separated by commas?

glennhickey commented 4 years ago

Yeah, if you have 100 isolates, then you need to list 100 prefixes with -A prefix1 -A prefix2 -A prefix3 etc.
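For 100+ isolates, the repeated -A options (together with -p, -e and -d) can be generated from the path names. A minimal Python sketch, assuming the isolate prefix is everything before the first `.` (as in `140001.NODE_25_length_505019_cov_14.525173`) and that the fungi are haploid (`-d 1`). The helper names, extra contig names and `graph.xg` are hypothetical; check the flags against `vg deconstruct --help` for your vg version.

```python
import shlex


def isolate_prefix(path_name, sep="."):
    """Isolate ID = text before the first separator (assumed naming scheme)."""
    return path_name.split(sep, 1)[0]


def deconstruct_command(path_names, reference, graph="graph.xg", ploidy=1):
    """Build a vg deconstruct argv: one -p per reference contig and one -A
    per non-reference isolate, plus -e and -d as suggested in the thread.
    The graph file name and haploid default are assumptions."""
    cmd = ["vg", "deconstruct"]
    for p in path_names:
        if isolate_prefix(p) == reference:
            cmd += ["-p", p]
    # Sorted, de-duplicated isolate prefixes, excluding the reference.
    for iso in sorted({isolate_prefix(p) for p in path_names} - {reference}):
        cmd += ["-A", iso]
    cmd += ["-e", "-d", str(ploidy), graph]
    return cmd


paths = [
    "140001.NODE_25_length_505019_cov_14.525173",
    "140001.contig_2",
    "140002.contig_1",
    "140003.contig_1",
]
print(shlex.join(deconstruct_command(paths, reference="140001")))
# vg deconstruct -p 140001.NODE_25_length_505019_cov_14.525173 -p 140001.contig_2 -A 140002 -A 140003 -e -d 1 graph.xg
```

Here the reference isolate is left out of the -A list on the assumption that its contigs are already covered by -p; adding it back is a one-line change if your vg version expects it.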

(In reply to zhanghaoipp's comment of Tue, Oct 27, 2020 at 2:27 AM.)