pangenome / smoothxg

linearize and simplify variation graphs using blocked partial order alignment

smoothxg throws what(): basic_string::_M_construct null not valid #16

Open cgroza opened 4 years ago

cgroza commented 4 years ago

Hi, I am running smoothxg on GFA graphs produced with vg construct and then converted with vg view.

cgroza@blg9122:~/.../sv-graph/graphs $ ~/smoothxg/bin/smoothxg -g chr1.vg.gfa
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
Aborted (core dumped)
cgroza@blg9122:~/.../sv-graph/graphs $

This is the error I am getting. I validated each vg file with vg validate and there were no warnings. I enabled debugging with -d and the output stays the same. What could be causing this error and how could I fix it?

Thank you, Cristian Groza

ekg commented 4 years ago

I've never tested smoothxg on a graph built by vg construct. But this is a good use case, because you might be able to normalize the representation of complex variation in the VCF.

It's possible that you need to have paths covering every base in the graph for smoothing to not break like this.
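One way to test the "paths covering every base" hypothesis is to list the segments that no path visits. The sketch below is a hypothetical helper (uncovered_segments is not part of smoothxg or vg). It assumes GFA 1.0 S lines with inline sequences (not "*") and P lines of the form "P<TAB>name<TAB>seg1+,seg2-<TAB>overlaps", and it ignores any other record types.

```python
def uncovered_segments(gfa_lines):
    """Return {segment_name: length_bp} for segments that no P line visits.

    Assumes GFA 1.0 S/P records with inline sequences; other record types
    are ignored. Hypothetical helper, not smoothxg's own check.
    """
    seg_len = {}
    visited = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            seg_len[fields[1]] = len(fields[2])
        elif fields[0] == "P" and len(fields) >= 3 and fields[2]:
            for step in fields[2].split(","):
                visited.add(step[:-1])  # drop the trailing +/- orientation
    return {name: n for name, n in seg_len.items() if name not in visited}
```

For example, opening chr1.vg.gfa and passing the file handle to uncovered_segments would report every segment (and its length in bp) that no P line traverses; an empty result means every base is on at least one path.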

Please try running vg construct -a when building your graph. Does that work?

cgroza commented 4 years ago

Hi, thanks for the quick reply. I already passed the -a flag to vg construct and the paths are indeed saved in the vg file before I convert to GFA with vg view. And yes, I am attempting to collapse alternative sequences that are built from a multi-sample merged VCF file. Would you like me to link a test VCF file?

cgroza@beluga4:~/scratch/sv-graph $ vg paths -L -v graphs/chr1.vg | head
_alt_690f8d2fa0e64915865faf22fb22b9a320da3fd2_3
_alt_646ca4e078f643e66c6f63af905a593f2340c187_0
_alt_d9a3a44aafbdb32476a084ea5912eb2042a419a4_5
_alt_460556e6579145a11e1258b8a8ea4fad75b2eac7_10
_alt_ec650cad5a9dcf3f96ab1f7726ad150119570c29_1
_alt_8f499a29d6265f1c936e58362b92b8bf5605f744_0
_alt_1a4af475581ce7bf25bfb6d6b0079e7f4eb96d09_2
_alt_cdeaef57e2f1a0ce2887a7b68321614e862a1cd8_15
_alt_ab71928db30ae5ffdabc5e86b7f88844d4e5eed3_0
_alt_cdeaef57e2f1a0ce2887a7b68321614e862a1cd8_3
cgroza@beluga4:~/scratch/sv-graph $

So unfortunately, it does not work. How would the path information be kept in the GFA file after conversion?

Cristian

ekg commented 4 years ago

The paths are under P lines in the GFA.
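To confirm that the paths survived the vg view conversion, the P lines can be read back roughly the way vg paths -L lists them. This is a minimal sketch assuming GFA 1.0 P lines ("P<TAB>name<TAB>seg1+,seg2-<TAB>overlaps"); path_steps is a hypothetical name, not a vg or smoothxg function.

```python
def path_steps(gfa_lines):
    """Map each P-line path name to its list of (segment, orientation) steps.

    Assumes GFA 1.0 P records; a P line with no segment list maps to [].
    """
    paths = {}
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "P":
            continue  # skip H, S, L and any other records
        name = fields[1] if len(fields) > 1 else ""
        steps = fields[2].split(",") if len(fields) > 2 and fields[2] else []
        # each step looks like "12+" or "7-": segment name plus orientation
        paths[name] = [(s[:-1], s[-1]) for s in steps]
    return paths
```

Printing the keys of the returned dict should give the same names as vg paths -L on the original .vg file; a path that maps to an empty list is worth a closer look.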

You don't get any output at all from smoothxg? By default, it provides logging.

You could try passing the -n parameter, which will avoid some prep sorting steps at the beginning that aren't necessary for VCF-based graphs.


cgroza commented 4 years ago

So my GFA files do indeed have P lines. No, I only get 3 lines of output on stderr/stdout, and they are the logic_error exception and core dump above. I passed the -n parameter and got the same result.

Cristian

ekg commented 4 years ago

What git commit are you running here? How did you build and install smoothxg?

Could you reproduce the same thing but on a very small graph that you can share?

cgroza commented 4 years ago

Yes, I will try to adapt the test case in vg/test/small/x.fa and post it here.

cgroza commented 4 years ago

Hi,

Here is the minimal test case GFA (test.gfa.gz). It has two Alu insertions right in the middle, differing by one base pair. It was built from these files (x.fa.gz, small.vcf.gz) with vg construct -a followed by vg view.

I am on the master branch of the smoothxg repo. I followed the compile instructions on the README page.

ekg commented 4 years ago

I've found the problem.

One of your P lines doesn't have the right number of fields. It's missing the path description.

-> % grep _alt_5d2a3c27da7879b7e1fb081ed7596ea49fe65e13_0 test.gfa
P       _alt_5d2a3c27da7879b7e1fb081ed7596ea49fe65e13_0
grep -v _alt_5d2a3c27da7879b7e1fb081ed7596ea49fe65e13_0 test.gfa >test1.gfa
smoothxg -g test1.gfa >test1.smooth.gfa

This should probably be checked somewhere in the xg process that's failing.
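Until such a check exists, a pre-flight scan could catch this before smoothxg runs. The sketch below assumes the GFA 1.0 layout, where a P line carries three required fields after the record type (PathName, SegmentNames, Overlaps); it flags lines with fewer fields or an empty segment list. malformed_p_lines is a hypothetical helper, not something provided by smoothxg, xg or vg.

```python
REQUIRED_P_FIELDS = 4  # "P", PathName, SegmentNames, Overlaps (GFA 1.0)


def malformed_p_lines(gfa_lines):
    """Return (line_no, path_name) for P lines missing required GFA 1.0 fields."""
    bad = []
    for line_no, line in enumerate(gfa_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # short-circuit: fields[2] is only read when all 4 fields exist
        if fields[0] == "P" and (len(fields) < REQUIRED_P_FIELDS or not fields[2]):
            name = fields[1] if len(fields) > 1 else "<unnamed>"
            bad.append((line_no, name))
    return bad
```

On a file like test.gfa this would be expected to report the _alt_5d2a3c27da7879b7e1fb081ed7596ea49fe65e13_0 line, since it has only the record type and the name.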

What is this field meant to represent? A deletion allele?

ekg commented 4 years ago

I suspect it's something made by vg construct with the -a flag.

cgroza commented 4 years ago

I think vg construct -a is adding these _alt_...._0 paths at every insertion site. Traversing this path would give you the reference sequence. So vg is confusing the reference path with a deletion allele at insertion sites? Is this new behaviour in vg construct? I don't remember this behaviour in past versions.
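If vg construct -a emits one such empty path per insertion site, removing them one name at a time with grep -v does not scale. A minimal sketch, assuming the empty paths are exactly the P lines with no segment list: drop_empty_paths is a hypothetical helper that keeps every other line untouched.

```python
def drop_empty_paths(gfa_lines):
    """Return the GFA lines with every P line that has no segment steps removed."""
    kept = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and (len(fields) < 3 or not fields[2]):
            continue  # e.g. the empty _alt_..._0 paths seen in this issue
        kept.append(line)
    return kept
```

Usage, assuming the file names from this thread: open test.gfa for reading and test1.gfa for writing, then call dst.writelines(drop_empty_paths(src)). This only discards the empty paths; it does not explain why vg produces them.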

cgroza commented 4 years ago

I removed the offending path and this was the output of smoothxg:

smoothed.gfa.gz

It seems the graph topology was affected everywhere, even outside the two very similar Alu insertions. Will it collapse any two similar sub-sequences of the graph?