Closed hgibling closed 8 years ago
It looks like it is failing during the msga to retain the paths. Remove the -z flag?
On Wed, May 25, 2016, 19:41 Heather Gibling notifications@github.com wrote:
Hello,
I'm using vg to build a graph of the MHC region on chromosome 6. I constructed a graph using the GRCh38 reference sequence http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa sequence and the 1000 genomes project VCF SNPs file http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions for chromosome 6:
vg construct -R 6:28510120-33480577 -t 16 -m 32 -v SNPs-vcf -r reference-sequence > GRCh38-MHC-ref.vg
And then aligned the alternate haplotypes http://hgwdev.sdsc.edu/%7Eanovak/hgvm/MHC/, following the parameters from the wiki on msga https://github.com/vgteam/vg/wiki/Long-read-assemblies-using-vg-msga#example, but with the -z flag for retaining parts of the original graph that aren't in the input paths:
vg msga -g GRCh38-MHC-ref.vg -f alt-MHC.fa -B 128 -K 11 -X 2 -E 3 -G -S 0.95 -H 5 -z -D -t 16 > GRCh38-MHC-haps.vg
I get a new graph with more nodes and edges, but also several warnings:
[xg] warning: graph does not have edge from 3- to 5+ for path 6 [xg] warning: graph does not have edge from 5- to 6+ for path 6 [xg] warning: graph does not have edge from 10- to 12+ for path 6 [xg] warning: graph does not have edge from 17- to 19+ for path 6 [xg] warning: graph does not have edge from 20- to 21+ for path 6 ... [xg] warning: graph does not have edge from 594871- to 594872+ for path 6 [xg] warning: graph does not have edge from 594872- to 594874+ for path 6 [xg] warning: graph does not have edge from 594874- to 594875+ for path 6 [xg] warning: graph does not have edge from 594875- to 594877+ for path 6 [xg] warning: graph does not have edge from 594877- to 594878+ for path 6
I want to clarify these warnings: is this just telling me that GRCh38-MHC-ref.vg did not have an edge from node 3 to node 5 (path 6 being chromosome 6), but with the multiple sequence/graph alignment of the alternate haplotypes, a new edge has been added in GRCh38-MHC-haps.vg? Or is it saying there should be an edge between nodes 3 and 5 to accommodate the new input sequences, but the edge was not able to be incorporated in GRCh38-MHC-haps.vg?
Thanks
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/vgteam/vg/issues/363
I get the same long list of warnings without -z
This warning indicates something is not working correctly. We have included a path in a previous iteration and it is now not fully embedded in the graph after editing.
Would it be possible to share the test case inputs in a zip file here?
On Wed, May 25, 2016, 19:58 Heather Gibling notifications@github.com wrote:
I get the same long list of warnings without -z
— You are receiving this because you commented.
Reply to this email directly or view it on GitHub https://github.com/vgteam/vg/issues/363#issuecomment-221672897
The sequences and vcf are online:
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/ALL.chr6.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP_no_SVs.vcf.gz
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/ALL.chr6.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP_no_SVs.vcf.gz.tbi
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
# rename reference sequences to match vcf
sed 's/>chr/>/' GRCh38_full_analysis_set_plus_decoy_hla.fa > GRCh38_full_analysis_set_plus_decoy_hla-no-chr.fa
wget http://hgwdev.sdsc.edu/~anovak/hgvm/hgvm.tar.gz
tar zxvf hgvm.tar.gz
cat hgvm/MHC/ref.fa hgvm/MHC/G*fa > MHC.fa
vg construct -R 6:28510120-33480577 -t 16 -m 32 -v ALL.chr6.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP_no_SVs.vcf.gz -r GRCh38_full_analysis_set_plus_decoy_hla-no-chr.fa > GRCh38-MHC-ref.vg
Which gives me: GRCh38-MHC-ref.vg.zip
vg msga -g GRCh38-MHC-ref.vg -f MHC.fa -B 128 -K 11 -X 2 -E 3 -G -S 0.95 -H 5 -z -D -t 16 > GRCh38-MHC-haps.vg
I can't compress GRCh38-MHC-haps.vg small enough to add on github, but can send it another way (Dropbox?) if you'd like.
Hi,
Wondering if anyone's had a chance to look into what might be going on here.
When you get the final graph, does vg validate
like it?
I really have no idea what's going on here, beyond the edges needed for the reference path having escaped from the graph somehow; @ekg is the msga person.
I can reproduce this with this tiny starting graph, though. Somehow it tries to make an xg index that has path 6 from the input graph but not the correct nodes/edges for it...
Perhaps we should turn that warning into an error?
When you get the final graph, does
vg validate
like it?
I get nothing returned to terminal, so... yes I think?
Yes, no error message means the graph is valid. I'll try to reproduce this today so we can resolve the bug you ran into. Sorry for the delay.
On Sun, Jun 26, 2016, 22:36 Heather Gibling notifications@github.com wrote:
When you get the final graph, does vg validate like it?
I get nothing returned to terminal, so... yes I think?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/363#issuecomment-228621298, or mute the thread https://github.com/notifications/unsubscribe/AAI4ESD5S105whJpXOr3X3AXULRD5tbFks5qPuLLgaJpZM4Im2Ap .
We could make it output a positive result. Any suggestions? It doesn't because Unix convention is "no error, no message".
On Mon, Jun 27, 2016, 08:02 Erik Garrison erik.garrison@gmail.com wrote:
Yes, no error message means the graph is valid. I'll try to reproduce this today so we can resolve the bug you ran into. Sorry for the delay.
On Sun, Jun 26, 2016, 22:36 Heather Gibling notifications@github.com wrote:
When you get the final graph, does vg validate like it?
I get nothing returned to terminal, so... yes I think?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/363#issuecomment-228621298, or mute the thread https://github.com/notifications/unsubscribe/AAI4ESD5S105whJpXOr3X3AXULRD5tbFks5qPuLLgaJpZM4Im2Ap .
Digging back into this. I've been able to construct the graph following the commands and inputs. I think the warnings are somehow related to feeding in a graph generated by vg construct
, but I'm not yet sure exactly why. I suspect that it's possible to work around this cleanly by stripping the path out of the graph produced from the VCF by vg mod -D GRCh38-MHC-ref.vg >GRCh38-MHC-ref.nopath.vg
and then putting this into the same vg msga
command. In the end we map the reference path into the graph as part of the progressive assembly process, so we can drop it up front without issue. I'm trying this now to verify that it checks out.
Also, I find the graph that is generated to be valid at least as far as vg validate
knows, which makes the warnings all the more mysterious.
We could make it output a positive result. Any suggestions? It doesn't because Unix convention is "no error, no message".
Any sort of "graph checks out fine" would be helpful :)
I think the warnings are somehow related to feeding in a graph generated by vg construct, but I'm not yet sure exactly why.
I've been able to do a vg construct
and feed that into vg msga -g
for a small toy example without issues, but yeah it doesn't seem to like this particular graph.
I suspect that it's possible to work around this cleanly by stripping the path out of the graph produced from the VCF by vg mod -D GRCh38-MHC-ref.vg >GRCh38-MHC-ref.nopath.vg and then putting this into the same vg msga command. In the end we map the reference path into the graph as part of the progressive assembly process, so we can drop it up front without issue. I'm trying this now to verify that it checks out.
Any luck?
It did work. I can post it for you or just provide the exact commands. It seems there is something broken or mismatched with how the path annotations come out of vg construct. The reference path is embedded in the end as it is one of the sequences in the MHC.fa, so practically this shouldn't be a problem.
On Wed, Jun 29, 2016, 15:38 Heather Gibling notifications@github.com wrote:
We could make it output a positive result. Any suggestions? It doesn't because Unix convention is "no error, no message".
Any sort of "graph checks out fine" would be helpful :)
I think the warnings are somehow related to feeding in a graph generated by vg construct, but I'm not yet sure exactly why.
I've been able to do a vg construct and feed that into vg msga -g for a small toy example without issues, but yeah it doesn't seem to like this particular graph.
I suspect that it's possible to work around this cleanly by stripping the path out of the graph produced from the VCF by vg mod -D GRCh38-MHC-ref.vg > GRCh38-MHC-ref.nopath.vg and then putting this into the same vg msga command. In the end we map the reference path into the graph as part of the progressive assembly process, so we can drop it up front without issue. I'm trying this now to verify that it checks out.
Any luck?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/363#issuecomment-229358105, or mute the thread https://github.com/notifications/unsubscribe/AAI4ETKH2iIA6x63d0me_lWH-azwn5njks5qQnVagaJpZM4Im2Ap .
Ooh yay! Commands would be great :)
@hgibling this is all I've done:
time vg mod -D GRCh38-MHC-ref.vg >GRCh38-MHC-ref.nopath.vg
time vg msga -g GRCh38-MHC-ref.nopath.vg -f MHC.fa -B 128 -K 11 -X 2 -E 3 -G -S 0.95 -H 5 -z -D -t 16 > GRCh38-MHC-haps.1.vg
No errors are registered. The ref path gets embedded as well as it's in the MHC.fa.
@hgibling did you have a chance to try this out?
Yep, works great. Thanks so much!
Great! We should still figure out what's different with the construct result, but this is promising!
Looking forward to what you cook up.
On Thu, Jun 30, 2016, 19:12 Heather Gibling notifications@github.com wrote:
Yep, works great. Thanks so much!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/363#issuecomment-229725178, or mute the thread https://github.com/notifications/unsubscribe/AAI4EYqtNbxtEKNZB5xRTSMwWNaHrVCKks5qQ_kDgaJpZM4Im2Ap .
I'm going to close this since the problem seems to be resolved. The construct issue may want its own issue thread.
Hello,
I'm using
vg
to build a graph of the MHC region on chromosome 6. I constructed a graph using the GRCh38 reference sequence sequence and the 1000 genomes project VCF SNPs file for chromosome 6:And then aligned the alternate haplotypes, following the parameters from the wiki on msga, but with the
-z
flag for retaining parts of the original graph that aren't in the input paths:I get a new graph with more nodes and edges, but also several warnings:
I want to clarify these warnings: is this just telling me that GRCh38-MHC-ref.vg did not have an edge from node 3 to node 5 (path 6 being chromosome 6), but with the multiple sequence/graph alignment of the alternate haplotypes, a new edge has been added in GRCh38-MHC-haps.vg? Or is it saying there should be an edge between nodes 3 and 5 to accommodate the new input sequences, but the edge was not able to be incorporated in GRCh38-MHC-haps.vg?
Thanks