subwaystation opened 1 year ago
My takeaway here is that piping odgi extract into odgi sort,
odgi extract -i graph.og -o - -r <CHROM:START-END> -d <INT> | odgi sort -i - -o graph.Y.O.og -O
should be done by default to circumvent the disk space issue described below. Also, -d has to be set in order to retrieve a graph that gives a more global answer.
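The takeaway pipeline can be wrapped in a small Bash function. This is a sketch, not part of odgi: the function name extract_and_sort, the DRY_RUN switch and the file names are hypothetical, and it only uses the odgi flags from the command above. With DRY_RUN=1 it only prints the pipeline it would run.

```shell
#!/usr/bin/env bash
# Sketch: extract a path range and sort it in a single pipe, so the unsorted
# intermediate graph is streamed instead of being written to disk.
# Assumptions: odgi is on PATH; extract_and_sort and DRY_RUN are hypothetical.
set -euo pipefail

extract_and_sort() {
  local graph="$1" range="$2" max_dist="$3" out="$4"
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Dry run: only print the pipeline that would be executed.
    echo "odgi extract -i $graph -o - -r $range -d $max_dist | odgi sort -i - -o $out -O"
    return 0
  fi
  # '-o -' sends the subgraph to stdout; '-i -' lets odgi sort read it from stdin.
  # The range is quoted because path names such as grch38#chr1 contain '#'.
  odgi extract -i "$graph" -o - -r "$range" -d "$max_dist" \
    | odgi sort -i - -o "$out" -O
}

DRY_RUN=1 extract_and_sort chr1.og 'grch38#chr1:13104252-13122521' 100000 chr1.region.og
```

Dropping DRY_RUN=1 runs the pipeline for real; only the final sorted graph is written to disk.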
Hi @AndreaGuarracino,
I was trying to extract grch38#chr1:13104252-13122521 from the Chr1 HPRC pangenome graph. However, it took a lot of trial and error until I got what I wanted. Surprisingly, the odgi extract output sometimes occupied quite a lot of disk space. More details below.
Fetching the data
1st round
How can it be that before optimization the graph occupies ~1GB on disk? Let's take a look at the actual subgraph we got.
This doesn't look too bad, but all paths are scattered, so we miss the big picture. Therefore, I tried a lot of things to understand what's going on.
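To see how much of the on-disk size comes from an unsorted odgi extract output, it is enough to put its file size next to that of the sorted/optimized version. A minimal sketch; the function name report_sizes is hypothetical and it uses only standard tools (wc, tr, printf).

```shell
#!/usr/bin/env bash
# Sketch: print the size in bytes of each given file, one "name<TAB>bytes" per line.
# report_sizes is a hypothetical helper, not an odgi command.
set -euo pipefail

report_sizes() {
  local f bytes
  for f in "$@"; do
    # wc -c counts bytes on both GNU and BSD systems;
    # tr strips the leading padding that BSD wc adds.
    bytes=$(wc -c < "$f" | tr -d ' ')
    printf '%s\t%s\n' "$f" "$bytes"
  done
}
```

For example, report_sizes region.raw.og region.sorted.og (hypothetical file names) shows both sizes side by side.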
2nd round
There are now so many paths in the PNG that one can barely open it.
3rd round
Basically no difference from the 1st round. After close inspection, this makes sense: for example, the distances between all CHM13 subpaths are much larger than 1000 nucleotides. It seems I should set -d 100000 so I can catch all the missing path parts.
4th round
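The -r argument has the form PATH:START-END, so it can help to compare a candidate -d value with the length of the region itself. A small Bash sketch; the function name range_span is hypothetical, it assumes START and END are plain integers, and it splits on the last ':' so that a '#' or ':' inside the path name is kept.

```shell
#!/usr/bin/env bash
# Sketch: split a PATH:START-END range and print "PATH START END LENGTH".
# range_span is a hypothetical helper, not an odgi command.
set -euo pipefail

range_span() {
  local range="$1"
  local path="${range%:*}"     # drop the last ':' and everything after it
  local coords="${range##*:}"  # keep only what follows the last ':'
  local start="${coords%-*}"   # part before the '-'
  local end="${coords#*-}"     # part after the '-'
  echo "$path $start $end $((end - start))"
}

range_span 'grch38#chr1:13104252-13122521'
```

For the region in this issue it prints grch38#chr1 13104252 13122521 18269, so -d 100000 tolerates gaps between subpaths more than five times as long as the extracted region.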
Alright, this finally shows us that there is a lot of new sequence popping up in this GRCh38 reference region. That's why it is so hard to get a subgraph. Then I thought: what about -E? Here we go...
5th round
This didn't work out at all: I see far too much of the pangenome compared to the region I wanted to extract. So maybe I need to run PG-SGD first?
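One reading of "PG-SGD first" is to sort the whole chromosome graph with path-guided SGD (odgi sort -Y) once, and only then extract and optimize the region. This is a sketch under that assumption, not necessarily what was run in the next round; the function name sgd_then_extract, the DRY_RUN switch and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: PG-SGD-sort the full graph first, then extract the region from it.
# Assumptions: odgi is on PATH; sgd_then_extract and DRY_RUN are hypothetical.
set -euo pipefail

sgd_then_extract() {
  local graph="$1" range="$2" max_dist="$3" out="$4"
  local sorted="${graph%.og}.Y.og"  # e.g. chr1.og -> chr1.Y.og
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Dry run: only print the commands that would be executed.
    echo "odgi sort -i $graph -o $sorted -Y"
    echo "odgi extract -i $sorted -o - -r $range -d $max_dist | odgi sort -i - -o $out -O"
    return 0
  fi
  # 1) Path-guided SGD over the whole graph (may take a while, but is done once).
  odgi sort -i "$graph" -o "$sorted" -Y
  # 2) Extract from the PG-SGD-sorted graph and optimize the subgraph in one pipe.
  odgi extract -i "$sorted" -o - -r "$range" -d "$max_dist" \
    | odgi sort -i - -o "$out" -O
}

DRY_RUN=1 sgd_then_extract chr1.og 'grch38#chr1:13104252-13122521' 100000 chr1.region.og
```

The sorted full graph can then be reused for further regions without repeating the expensive PG-SGD step.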
6th round
Now the resulting graph is much smaller on disk, great! Locally, this is as good as it gets. However, we still lack the overall picture.
7th round
I think this is as good as it gets if one is interested in the overall picture and less fragmented paths.