Open subwaystation opened 5 years ago
I also played around with the -c parameter [20, 100, 1000, 500, 50], but that did not solve the issue.
This might have to do with the invalidity of paths in subgraphs. We have been talking about how to resolve this for some time without much progress. There are some simple hacks, like making new paths with a naming that relates them to the path range they are derived from.
Thanks for the feedback @ekg. I have to admit, this makes me kind of unhappy. Can you point me to these hacks? Are there any examples? Can I contribute somehow so that we can solve this issue in the foreseeable future?
I assume deleting the invalid paths would solve the issue. But then we wouldn't have a complete subgraph?
vg used to use the "rank" field to somewhat support disconnected paths. But we lost that when switching to the new API. The discussion on how to properly support subpaths with the new API is here: https://github.com/vgteam/libhandlegraph/issues/29
vg chunk takes care to ensure that the reference path (DBVPG6765_chrVII) is not disconnected. And this has been sufficient for our VCF-based graphs. But now you have other assemblies in the graph and that's tripping up DBVPG6765_chrVII.
I think we probably need Erik's simple hack of renaming path chunks to get around this. It might go here: https://github.com/vgteam/vg/blob/master/src/algorithms/subgraph.cpp#L292-L322
I'll try to take a shot at implementing it today. Sorry about this!
Thanks for the prompt answer @glennhickey !
Ah, I see. So I want a subgraph that the reference path is not really part of any more, because of the assembly-style graph, and vg chunk cannot handle that.
Cool, looking forward to that implementation ;)
If I can assist you at some point, let me know.
One edge case I can think of, having SequenceTubeMap and the current vg chunk in mind, is the following:
If I extract a subgraph by path_name:start_pos-end_pos, I will only get the paths running through the nodes of the subgraph. But there could be a path that does not contain any of the sequence represented by these nodes. It is therefore anchored in a node to the left and a node to the right of the subgraph. This might be a structural variation I want to be able to show in e.g. SequenceTubeMap.
Would it still be a valid subgraph if it contains a path that visits none of its nodes?
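To make this edge case concrete, here is a rough Python sketch (not vg code). It assumes each path is just a list of node IDs, that the region is given as node indices along the reference path rather than base positions, and that the extracted subgraph consists only of the reference nodes in that region. The function name, the `paths` dictionary, and the toy data are made up for illustration.

```python
def spanning_paths(paths, ref_name, region_start, region_end):
    """Return names of paths that visit no node of the region but are
    anchored on reference nodes both left and right of it.

    paths: dict mapping path name -> list of node IDs in path order
    region_start, region_end: half-open node indices into the reference path
    """
    ref = paths[ref_name]
    region = set(ref[region_start:region_end])
    # Reference nodes on either flank; nodes that also occur in the region
    # are excluded so that repeated nodes do not count as anchors.
    left = set(ref[:region_start]) - region
    right = set(ref[region_end:]) - region

    result = []
    for name, nodes in paths.items():
        if name == ref_name:
            continue
        visited = set(nodes)
        if visited & region:
            continue  # the path runs through the region, nothing special
        if visited & left and visited & right:
            result.append(name)  # anchored on both flanks, skips the region
    return result


# Toy example: "del" jumps from node 2 straight to node 4, skipping node 3.
toy_paths = {
    "ref": [1, 2, 3, 4, 5],
    "same": [1, 2, 3, 4, 5],
    "del": [1, 2, 4, 5],
    "left_only": [1, 2],
}
print(spanning_paths(toy_paths, "ref", 2, 3))
```

In this toy case a chunk built from node 3 alone would contain no step of "del", so a viewer working from that chunk would have no way to draw the deletion unless such paths are carried along explicitly.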
I just tried the data and can reproduce. But I'm curious to know why tubemaps is crashing though? vg view reporting the graph is invalid is just a warning. It still exits with code 0 (as per your output). Is it that tubemaps is looking for the edge that's missing in the path?
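For context, the warning boils down to a check like the one sketched below: every pair of consecutive steps on a path needs an edge between the node side the path leaves and the node side it enters. This is a simplified Python illustration, not vg's actual implementation. The step representation `(node_id, is_reverse)`, the edge representation (a set of `frozenset`s of node sides), and the function names are all assumptions made for the sketch.

```python
def leaving_side(step):
    """Side of the node through which a path step exits."""
    node_id, is_reverse = step
    return (node_id, "start" if is_reverse else "end")


def entering_side(step):
    """Side of the node through which a path step enters."""
    node_id, is_reverse = step
    return (node_id, "end" if is_reverse else "start")


def missing_edges(path_steps, edges):
    """List the (from_side, to_side) pairs a path needs but the graph lacks.

    path_steps: list of (node_id, is_reverse) tuples in path order
    edges: set of frozensets, each holding the two node sides an edge joins
    """
    missing = []
    for prev, nxt in zip(path_steps, path_steps[1:]):
        src, dst = leaving_side(prev), entering_side(nxt)
        if frozenset([src, dst]) not in edges:
            missing.append((src, dst))
    return missing


# A subgraph that kept both nodes from the warning but not the edge between
# them: the path leaves 4218946 via its start and enters 1204907 at its start.
steps = [(4218946, True), (1204907, False)]
print(missing_edges(steps, edges=set()))
```

A path for which this list is non-empty is what gets reported as invalid. Dropping such paths would silence the warning, at the cost of losing them from the subgraph.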
I suspect that it cannot deal with the fact that the *.annotate.txt file is empty. But, I have to admit that I am not familiar enough with TubeMaps to test that out, yet.
I fiddled around in the code so that TubeMaps' implemented command line leaves out -T, -b and -E, and then it just breaks again, giving no helpful error whatsoever. At least to me.
@glennhickey that code snippet doesn't quite do what I'm suggesting.
My idea was to break the paths where they are discontinuous in the subgraph. For each broken path segment, we set a name that relates it to the path it was derived from.
The hack I wanted to implement was using naming convention to convey path ranges of the subgraph.
So if we had a path x that got split into pieces we might get paths like [x]:10-20, [x]:30-40. Then we could also make another subgraph of one, yielding [[x]:30-40]:3-6. Maybe we should be translating the positions from the original path, but that would be a bit more involved.
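A minimal Python sketch of this naming idea, under some loudly stated assumptions: a path is a list of `(node_id, length)` steps, offsets are 0-based and half-open, the subgraph keeps every edge between retained nodes that are adjacent on the path, and path names do not themselves end in a `]:start-end` pattern. `split_path` and `resolve` are hypothetical helper names, not functions from vg or libhandlegraph.

```python
import re

# Matches "[<parent name>]:<start>-<end>"; the greedy group lets names nest.
_SEGMENT = re.compile(r"^\[(.*)\]:(\d+)-(\d+)$")


def split_path(name, steps, subgraph_nodes):
    """Break a path into the pieces that are contiguous inside a subgraph.

    Returns a dict mapping "[name]:start-end" -> list of node IDs, where
    start and end are offsets along the parent path.
    """
    segments = {}
    current, seg_start, offset = [], None, 0
    for node_id, length in steps:
        if node_id in subgraph_nodes:
            if not current:
                seg_start = offset  # a new piece begins here
            current.append(node_id)
        elif current:
            # The path leaves the subgraph: close the open piece.
            segments[f"[{name}]:{seg_start}-{offset}"] = current
            current = []
        offset += length
    if current:
        segments[f"[{name}]:{seg_start}-{offset}"] = current
    return segments


def resolve(name):
    """Translate a (possibly nested) segment name back to coordinates on
    the original path. Returns (base_name, start, end), or None if the
    name does not follow the segment convention."""
    match = _SEGMENT.match(name)
    if match is None:
        return None
    inner, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    parent = resolve(inner)
    if parent is None:
        return inner, start, end
    base, parent_start, _ = parent
    return base, parent_start + start, parent_start + end


pieces = split_path("x", [(1, 10), (2, 10), (3, 10), (4, 10)], {2, 4})
print(pieces)                        # segments [x]:10-20 and [x]:30-40
print(resolve("[[x]:30-40]:3-6"))    # coordinates on the original path x
```

Keeping the nested name costs nothing at extraction time, and a helper like `resolve` can do the translation to original coordinates later if it turns out to be needed.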
@ekg which code snippet?
The subgraph one you have a PR against.
Any updates here? As far as I got it, #2506 did not pass Travis?
Are you able to try the branch from #2506 to see if it solves your problem? That PR's stuck on a unit test failure that occurs only on Mac that I'm having trouble reproducing.
I will try it out and report back here.
So I did:
git clone --recursive https://github.com/vgteam/vg.git
cd vg
git checkout glenn
. ./source_me.sh && make
And I ran into:
In file included from src/packed_path_position_overlays.cpp:1:
include/bdsg/packed_path_position_overlays.hpp:16:10: fatal error: BooPHF.h: No such file or directory
16 | #include <BooPHF.h>
| ^~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:63: obj/packed_path_position_overlays.o] Error 1
make[1]: Leaving directory '/home/heumos/git/vg_2504/vg/deps/libbdsg'
make: *** [Makefile:618: lib/libbdsg.a] Error 2
Is there a dependency that is not installed on my machine? I have ArchLinux running.
The README of https://github.com/vgteam/libbdsg tells me, I need to have https://github.com/rizkg/BBHash/tree/alltypes installed in a place on the system where the compiler can find them. But BBHash seems to be there:
[heumos@wave deps]$ ls /home/heumos/git/vg_2504/vg/deps/BBHash/
BooPHF.h example.cpp LICENSE
bootest.cpp example_custom_hash.cpp makefile
bootestFile.cpp example_custom_hash_strings.cpp README.md
I would expect that the Makefile takes care of the rest?
You can try updating the submodules (git submodule sync --recursive ; git submodule update --init --recursive), or running
git clone --recursive https://github.com/vgteam/vg.git --branch glenn
at the outset.
Thanks! On it again.
I did:
git clone --recursive https://github.com/vgteam/vg.git --branch glenn
cd vg/
. ./source_me.sh && make
And I still get:
In file included from src/packed_path_position_overlays.cpp:1:
include/bdsg/packed_path_position_overlays.hpp:16:10: fatal error: BooPHF.h: No such file or directory
16 | #include <BooPHF.h>
| ^~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:63: obj/packed_path_position_overlays.o] Error 1
make[1]: Leaving directory '/home/heumos/git/vg_2504/vg/deps/libbdsg'
make: *** [Makefile:618: lib/libbdsg.a] Error 2
Am I missing something?
The file exists in deps:
[heumos@wave vg]$ ls deps/BBHash/BooPHF.h
deps/BBHash/BooPHF.h
That worked fine here. I just rebased this on master, which may contain some fixes that make building more robust. If you're able to build the master branch, this one should too (fresh checkout recommended).
I am not even able to build the master branch on my machine, see https://github.com/vgteam/vg/issues/2522. But it aborts with a different error. Puzzling.
I will try to build on a VM which hosts Ubuntu 18.04. But I still want to be able to compile vg on my machine.
So I was able to build both the current master and @glennhickey's branch on Ubuntu 18.04. Now I can test his implementation.
But it would make me really happy, if I could compile vg on my machine.
Hi vgteam :) @superjox @trgibbons
What you were trying to do: I tried to use https://github.com/vgteam/sequenceTubeMap to browse certain positions (S288C_chrVII:95084-95584) of a yeast 12-sample pangenome. The graph was built with https://github.com/ekg/seqwish, whose .gfa output was converted to .vg. An .xg index was created, too. To confirm the issue, I also ran vg chunk and vg view manually, which produced the same problem. However, when I leave out -T and -b + -E, the command runs through without issues. But as SequenceTubeMap requires these inputs, I am stuck here.
What you wanted to happen: Take a look at the specified positions.
What actually happened: SequenceTubeMap output:
vg view err data: [vg view] warning: graph is invalid!
vg view exited with code 0
graph path 'DBVPG6765_chrVII' invalid: edge from 4218946 start to 1204907 start does not exist
[vg view] warning: graph is invalid!
time vg view --gfa-in /ctx/projects/Q2380-Pantograph/03_data_processing/10_seqwish/10_yeast/21_PacBio_twelve/joint_yeast_genomes-twelve.gfa --vg > joint_yeast_genomes-twelve.vg
vg index -x joint_yeast_genomes-twelve.xg -t 5 joint_yeast_genomes-twelve.vg
vg chunk -x joint_yeast_genomes-twelve.xg -c 2 -p S288C_chrVII:95084-95584 -T -b chunk -E regions.tsv | vg view -j - > S288C_chrVII:95084-95584.json
vg chunk -x joint_yeast_genomes-twelve.xg -c 2 -p S288C_chrVII:95084-95584 | vg view -j - > S288C_chrVII:95084-95584.json