pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

Problem with intervals selection with odgi viz #420

Closed brettChapman closed 1 year ago

brettChapman commented 2 years ago

Hi

I'm having issues getting a particular nucleotide region in my graph to be displayed using odgi viz.

I set the following parameters based on the usage information:

odgi viz -i barley_pangenome_3H.og -r 'Morex_v2_chr3H:471496705-*' -o barley_pangenome_graph_3H.viz_inv_r471496705-end.png -x 1500 -y 500 -a 10 -z

I tried with the ODGI graphs produced by PGGB, and I also tried rebuilding an ODGI graph from scratch from the GFA, but the image still shows the entire length of the genome graph rather than the subset running from position Morex_v2_chr3H:471496705 to the end of the graph. Is there something I'm doing wrong?

Basically, I want to zoom into a region of the graph in more detail, for a figure in the paper I'm writing. Thanks.

AndreaGuarracino commented 2 years ago

Hi @brettChapman,

it is very likely that this 'extraction' functionality of odgi viz is still buggy (I am tempted to disable it for a while).

I would suggest running odgi extract first and then odgi viz. If that doesn't work, I would need your example to understand what is happening and work on it.

Apologies for the buggy behavior. Please let me know.

ekg commented 2 years ago

We have a recursive extraction mode that might really help. Andrea, could you describe how this works?

AndreaGuarracino commented 2 years ago

Yes, this is a recent odgi extract power-up that can help if you get too many subpaths in the extracted subgraphs. You can find a few examples here https://github.com/pangenome/odgi/pull/419.

In short, odgi extract cuts paths where they cross the cut points specified in the extraction. If the resulting subpaths are close enough, they can be merged by specifying the -d/--max-distance-subpaths parameter. The process is performed multiple times (3 by default) because, in each iteration, new nodes are added and new subpaths can become mergeable. Repeating it leads to more stable results.

brettChapman commented 2 years ago

Thanks @AndreaGuarracino, I'll try odgi extract first, based on what you have in your example here:

odgi extract -i chr6.pan.fa.a2fb268.4030258.6a1ecc2.smooth.og -L 100000 -r grch38#chr6:29722775-33143325 -o - -t 16 -d 500000 | odgi sort -t 16 -i - -O -o mhc.og && odgi viz -i mhc.og -o mhc.png

brettChapman commented 2 years ago

@AndreaGuarracino I tried odgi extract but it complained about the coordinates.

I used: odgi extract -t 16 -i barley_pangenome_graph_3H.og -d 500000 -L 100000 -r Morex_v2_chr3H:471496705-* -o barley_pangenome_graph_3H.viz_inv_r471496705-end.og

The 471496705 position is the nucleotide position and * denotes the end of the graph. Do I need to use node IDs instead? I'm trying to slice out the end quarter of the graph, before deciding which exact region to zoom into. Thanks.

AndreaGuarracino commented 2 years ago

Arg, odgi extract doesn't support the * character. As the second coordinate of your interval, you can specify the length of the Morex_v2_chr3H path.
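
One way to get that end coordinate is from a FASTA index of the sequences used to build the graph. The following is a minimal sketch, assuming a samtools-style .fai file (sequence name in column 1, length in column 2) and assuming the path name in the graph matches the sequence name. The helper name `range_to_end` is hypothetical, and the length in the demo entry is made up.

```shell
#!/bin/sh
# Build a "PATH:START-END" range that runs to the end of a path, using a
# FASTA index (.fai: column 1 = sequence name, column 2 = length in bp).
# Assumption: the path name in the graph matches the sequence name in the .fai.
range_to_end() {
    # $1 = .fai file, $2 = path name, $3 = start position
    awk -v name="$2" -v start="$3" '
        $1 == name { printf "%s:%s-%s\n", name, start, $2; found = 1 }
        END { exit !found }   # non-zero exit status if the name was not found
    ' "$1"
}

# Demo with a made-up index entry; the length below is illustrative only.
fai=$(mktemp)
printf 'Morex_v2_chr3H\t600000000\t16\t60\t61\n' > "$fai"
range=$(range_to_end "$fai" Morex_v2_chr3H 471496705)
echo "$range"
rm -f "$fai"
```

The printed range could then be passed to odgi extract with -r in place of the `471496705-*` form.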

ekg commented 2 years ago

We could support this though. Might be a good feature request!

brettChapman commented 2 years ago

Thanks @AndreaGuarracino I've specified the end of the chromosome now and it seems to be working. @ekg yes, adding a feature which automatically goes to the start or end of the pangenome using * would be pretty convenient.

brettChapman commented 2 years ago

Hi @AndreaGuarracino, I've produced a subgraph image based on my coordinates. The image doesn't appear to reflect the region well. I think this may be because of the -L, -e and -d parameters. How can I choose the parameters so that all subpaths are merged into one per haplotype and the subgraph looks more in line with what I see in the full-scale graph image? Is it simply trial and error until I get something that works? Can the parameters I used with PGGB provide insight into what I should use with odgi extract?

I've now increased -L to 1000000, -d to 1000000 and -e to 5.
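
A quick way to check whether a given parameter set merged everything is to count the subpaths left per original path. This is a minimal sketch, assuming subpath names follow the NAME:START-END form described later in this thread; the helper name `count_subpaths` and the sample names are hypothetical. With a real graph, the input could come from listing the path names of the extracted subgraph (for example with `odgi paths -i subgraph.og -L`, assuming that option is available in your odgi version).

```shell
#!/bin/sh
# Count how many subpaths each original path was split into, given path
# names on stdin, one per line, in the NAME:START-END form.
count_subpaths() {
    awk '{
        name = $0
        # Strip a trailing ":START-END" suffix if present.
        if (match(name, /:[0-9]+-[0-9]+$/)) name = substr(name, 1, RSTART - 1)
        count[name]++
    }
    END { for (n in count) printf "%s\t%d\n", n, count[n] }' | sort
}

# Demo on made-up subpath names.
printf '%s\n' \
    'Morex_v2_chr3H:471496705-480000000' \
    'Morex_v2_chr3H:480500000-600000000' \
    'HypotheticalLine_chr3H:470000000-590000000' | count_subpaths
```

A count of 1 for every name would indicate a single contiguous subpath per haplotype.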

brettChapman commented 2 years ago

This is the full graph:

[image: barley_pangenome_3H fasta 5afc036 7715ffd 41ba588 smooth og viz_inv]

And this is the rightmost quarter of the graph, extracted with -L 100000 -e 3 -d 500000:

[image: barley_pangenome_graph_3H viz_inv_r471496705-end]

brettChapman commented 2 years ago

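And this is the rightmost quarter of the graph, extracted with -L 1000000 -e 5 -d 1000000:

[image: barley_pangenome_graph_3H viz_inv_r471565318-end] https://user-images.githubusercontent.com/8529807/174011831-131cabd2-64b9-4d8d-a749-d6d3de598507.png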

ekg commented 2 years ago

I guess we need to push the -e parameter further.

Looking at this, it really seems that we could use a simpler merging approach on the path subranges and then re-extract in one step.

ekg commented 2 years ago

Here's an approach. You could take one of these extracted subgraphs and generate a BED file representing the path subranges as intervals. Then bedtools merge with a wide enough distance should give a single range per assembly. Feeding these intervals back into odgi extract should give a single contiguous path for each.
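
The merge step can be sketched as follows. With bedtools installed, `bedtools merge -d` on a sorted BED file does this directly; the awk function below is only a stand-in that mimics that behavior for illustration, so the logic can be tried without bedtools. The function name `merge_bed` and the sample intervals are hypothetical.

```shell
#!/bin/sh
# Merge BED intervals on the same path when the gap between them is at most
# $1 bp. Stand-in for:
#   sort -k1,1 -k2,2n in.bed | bedtools merge -i - -d "$1"
merge_bed() {
    sort -k1,1 -k2,2n | awk -v d="$1" 'BEGIN { OFS = "\t" }
        # Different path name, or gap larger than d: flush the open interval.
        NR > 1 && ($1 != name || $2 - end > d + 0) {
            print name, start, end
            name = ""
        }
        # No open interval: start a new one from this record.
        name == "" { name = $1; start = $2; end = $3; next }
        # Otherwise extend the open interval if this record reaches further.
        $3 + 0 > end + 0 { end = $3 }
        END { if (NR > 0) print name, start, end }'
}

# Demo on made-up intervals: the first two are 50 bp apart and get merged
# with a distance of 100, the third is too far away and stays separate.
printf 'Morex_v2_chr3H\t100\t200\nMorex_v2_chr3H\t250\t400\nMorex_v2_chr3H\t5000\t6000\n' |
    merge_bed 100
```

The merged intervals could then be given back to odgi extract, either one range at a time with -r or as a BED file, assuming your odgi version accepts one through its -b/--bed-file option.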

brettChapman commented 2 years ago

Thanks @ekg I'll give that approach a shot and see if it works.

brettChapman commented 2 years ago

@ekg to generate the BED file from the OG graph would I use odgi flatten?

brettChapman commented 2 years ago

or odgi procbed?

AndreaGuarracino commented 2 years ago

I suppose he meant to take the names of the subpaths in the extracted graph, which are in the form

NAME:START-END

and split them into three columns to obtain a BED file to be fed to bedtools.
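
The splitting described above can be sketched with awk. This assumes the path names arrive one per line on stdin; the function name `subpaths_to_bed` is hypothetical, and the sample name is made up. Whether the coordinates in the subpath names already follow the BED convention is not stated in this thread, so check that before relying on the output.

```shell
#!/bin/sh
# Turn subpath names of the form NAME:START-END (one per line on stdin)
# into three tab-separated BED columns. Names without that suffix are skipped.
subpaths_to_bed() {
    awk '{
        # Anchor at the end so that a ":" inside NAME itself is left alone.
        if (match($0, /:[0-9]+-[0-9]+$/)) {
            name = substr($0, 1, RSTART - 1)
            split(substr($0, RSTART + 1), se, "-")
            printf "%s\t%s\t%s\n", name, se[1], se[2]
        }
    }'
}

# Demo on a made-up subpath name.
printf '%s\n' 'Morex_v2_chr3H:471496705-480000000' | subpaths_to_bed
```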

brettChapman commented 2 years ago

@AndreaGuarracino Thanks. I can get those by printing out the path names.