Open sivico26 opened 10 months ago
@sivico26 Could you please share both graphs? You can drop a mail to simon.heumos@qbic.uni-tuebingen.de.
At first glance, I would try odgi sort -O first, without the PG-SGD step. Then I would do odgi sort -Ygs.
Also, did you try vg convert to obtain a GFAv1 file compatible with ODGI? Or how did you generate the graph?
Hi @subwaystation, thanks for the quick reply.
I am loading the graphs to our filesystem to see if I can send them that way.
The graphs come from using cactus and its progressive algorithm (it is a super-pangenome actually), which generates a .hal file. I then used hal2vg and vg convert to get the first .gfa. I then post-processed that graph with smoothxg and gfaffix. I made the mistake of not turning off the generation of the consensus paths when using smoothxg, so I need to prune those from the graph. I used odgi to remove the paths successfully, but that left the 0-coverage nodes (which used to be crossed by consensus paths but not by any other paths), and now I am trying to remove those too.
I hope that helps.
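To pin down what "0-coverage nodes" means here, the following is a minimal Python sketch, not part of odgi and not how odgi computes coverage. It assumes a plain GFAv1 file in which paths are stored as P lines, and the function name `zero_coverage_segments` is hypothetical. It keeps all node names in memory, so it only illustrates the definition and is not tuned for graphs of the size discussed in this thread.

```python
from typing import Iterable, Set


def zero_coverage_segments(gfa_lines: Iterable[str]) -> Set[str]:
    """Return names of S-line segments that no P-line path visits."""
    segments: Set[str] = set()
    visited: Set[str] = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 1:
            segments.add(fields[1])
        elif fields[0] == "P" and len(fields) > 2:
            for step in fields[2].split(","):
                if step:  # ignore empty steps, e.g. from a trailing comma
                    visited.add(step[:-1])  # drop the trailing +/- orientation
    return segments - visited


# Toy graph: segment 2 is linked to its neighbours but lies on no path.
example = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "P\tsampleA\t1+,3+\t*",
]
print(sorted(zero_coverage_segments(example)))  # ['2']
```

In the toy graph, segment 2 plays the role of a node that was only crossed by a removed consensus path.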
@subwaystation,
In theory, a link to download the graph should be in your mail. Let me know if it works.
odgi sort -O should work (it already did for me). Since I added -p gs too, the difference maker is Y.
I downloaded your graph, need to run your commands next.
@sivico26 Using the most recent master of ODGI, v0.8.4-2-g1e12685c, I was not even able to complete the odgi build step:
/usr/bin/time --verbose odgi build -g og_opt_transfer.gfa -o og_opt_transfer.og -t 28 -P
[odgi::gfa_to_handle] building nodes: 100.00% @ 1.46e+06 bp/s elapsed: 00:00:10:54 remain: 00:00:00:00
[odgi::gfa_to_handle] building edges: 100.00% @ 1.52e+06 bp/s elapsed: 00:00:14:38 remain: 00:00:00:00
[odgi::gfa_to_handle] building paths: 13.64% @ 3.53e-02 bp/s elapsed: 00:00:05:39 remain: 00:00:35:50
[odgi::gfa_to_handle] id parsing failure for path Hbul.Hbul_1_chr6H attempting to parse node id from ''
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoull
Command terminated by signal 6
Command being timed: "odgi build -g og_opt_transfer.gfa -o og_opt_transfer.og -t 28 -P"
User time (seconds): 3582.92
System time (seconds): 643.94
Percent of CPU this job got: 178%
Elapsed (wall clock) time (h:mm:ss or m:ss): 39:31.37
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 364934480
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 127634161
Voluntary context switches: 152059032
Involuntary context switches: 2896258
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Not sure if the file is corrupt, or does not fit the GFA specs. Which version of ODGI were you using?
Hi @subwaystation,
That's strange. odgi prune uses odgi build under the hood when the input is .gfa, right? If that is the case, it worked for me indirectly. The version I am using is v0.8.3-26-gbc7742ed, installed through conda.
Do you think it is related to #549? This is the same pruned graph I am referring to. It indeed deviates from GFA specs.
I was expecting the raw, unpruned graph. But you already sent me the pruned one?
I realized that in your log odgi build failed while parsing the path Hbul.Hbul_1_chr6H. Following the commands described in #549, I can confirm this is the first path affected by the trailing ",". So it is very likely this is the problem.
In that case, running something like:
sed -E "s|,\t\*|\t\*|" og_opt_transfer.gfa > new_og_opt_transfer.gfa
should do the trick.
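As a way to check which paths are affected before rewriting a large file, here is a minimal Python sketch. It is an illustration, not odgi code: the function names `find_empty_steps` and `drop_empty_steps` and the toy path name are hypothetical. It assumes the only damage is an empty step in the segment list of a P line, such as the one left by a trailing comma, which matches the failed attempt to parse a node id from '' in the log above.

```python
from typing import Iterable, Iterator, List, Tuple


def find_empty_steps(gfa_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, path name) for P lines with an empty step."""
    hits = []
    for number, line in enumerate(gfa_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and len(fields) > 2 and "" in fields[2].split(","):
            hits.append((number, fields[1]))
    return hits


def drop_empty_steps(gfa_lines: Iterable[str]) -> Iterator[str]:
    """Yield lines (without trailing newline) with empty P-line steps removed."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and len(fields) > 2:
            fields[2] = ",".join(step for step in fields[2].split(",") if step)
        yield "\t".join(fields)


# Toy input: the first path ends with a trailing comma, the second is fine.
toy = [
    "S\t1\tACGT",
    "S\t2\tGG",
    "P\ttoy_path\t1+,2+,\t*",
    "P\tother_path\t1+,2+\t*",
]
print(find_empty_steps(toy))  # [(3, 'toy_path')]
print(list(drop_empty_steps(toy))[2])
```

Unlike the sed one-liner, this also catches an empty step in the middle of a path, at the cost of being slower on a file of this size.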
This is indeed the pruned graph. Sorry if it was not the desired one. I can send you the one before pruning. Should I proceed?
Please do so! Thanks :)
This should also help @AndreaGuarracino to better understand the odgi prune problem. And we can find out if odgi prune actually is the guilty one here.
> Hi @subwaystation,
> That's strange. odgi prune uses odgi build under the hood when the input is .gfa, right? If that is the case, it worked for me indirectly. The version I am using is v0.8.3-26-gbc7742ed, installed through conda.
> Do you think it is related to #549? This is the same pruned graph I am referring to. It indeed deviates from GFA specs.
While odgi prune does use odgi build before pruning, it seems the graph written after the pruning step is what causes the problems.
Yes, what is strange is odgi prune (or odgi view) writing problematic P lines after the pruning.
Dear odgi team,
Thanks for developing odgi. I am working on a huge graph, so each processing step takes a long time. I was pruning some empty nodes from my graph to later explore it with some of my tools. Anyway, when I was optimizing the node space after the pruning, I met the following error:
Looking at previous issues, I found that #430 was a lengthy, relevant discussion. In the end, I adjusted my command and removed the -Y from odgi sort (everything else equal), and it worked. So I can continue with my analyses.
However, this is somewhat unsatisfactory since I cannot do the PG-SGD sort with my graph. If I understood the discussion, the possible reasons listed do not apply to this case since I pruned the graph without trouble. To be precise, this is the command I used:
Correct me if I am wrong, but this indicates that odgi build does not have any trouble with my graph, which should rule out many of the possible problems (e.g. W lines). Furthermore, my input graph for odgi sort was written by odgi prune. Thus, I wonder what could be causing the assertion error.
My graphs are big (before pruning: .gfa ~118 Gb and .og ~245 Gb; after pruning: .gfa ~112 Gb and .og ~179 Gb), so not so easily shareable, but maybe possible if needed. I can help to check or run commands on them if instructed.
We are missing something around this problem. I wanted to report what I found and continue the discussion.
Let me know what you think.
P.S.: Another minor issue: why does odgi prune require -E for -c to work? That does not make sense to me. If I remove some nodes, it follows that I want to get rid of the associated edges as well. The current behavior is that if you specify only -c, it seems to reason that, since no edges are being removed, the edges cannot be left without their associated nodes, so it does not prune the nodes that match the criteria (thus the output graph is identical to the input). To me, this is not sensible behavior. Why is it like that? I am probably missing something.
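For reference, the behavior expected in the P.S. (dropping a node also drops every edge that touches it) can be sketched on plain GFAv1 text as follows. This is only an illustration of those semantics and says nothing about how odgi prune is implemented. The function name `remove_segments` is hypothetical, and the sketch assumes that no remaining path visits the removed segments, as is the case for 0-coverage nodes.

```python
from typing import Iterable, Iterator, Set


def remove_segments(gfa_lines: Iterable[str], doomed: Set[str]) -> Iterator[str]:
    """Yield GFA lines without the given segments or any link touching them."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 1 and fields[1] in doomed:
            continue  # drop the segment itself
        if fields[0] == "L" and len(fields) > 3:
            # L lines are: L, from, from_orient, to, to_orient, overlap
            if fields[1] in doomed or fields[3] in doomed:
                continue  # drop links that would otherwise dangle
        yield line


# Toy graph: removing segment 2 also removes both links attached to it.
graph = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "P\tsampleA\t1+,3+\t*",
]
for kept in remove_segments(graph, {"2"}):
    print(kept)
```

Note that in the toy output, path sampleA steps from segment 1 to segment 3 without a link between them; whether a real graph keeps such an edge is outside the scope of this sketch.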