pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License
191 stars 39 forks source link

odgi heaps for subsets of samples #505

Closed mb47 closed 1 year ago

mb47 commented 1 year ago

Hi,

I would like to run odgi heaps on various subsets of samples in my graph, as opposed to running it for the full set of samples -- is this possible without having to generate new graphs altogether?

thanks Micha

AndreaGuarracino commented 1 year ago

Hi @mb47, this is possible with the following option:

 -b[FILE], --bed-targets=[FILE]    BED file over path space of the graph,
                                          describing a subset of the graph to
                                          consider.

You can use odgi path -lL ... to get an input BED file and grep only the paths that you want. Something like:

odgi heaps -i graph.og -b <(odgi paths -i graph.og -Ll | grep ... | awk -v OFS='\t' '{print($1,$2-1,$3)}')
mb47 commented 1 year ago

Hi Andrea,

thanks for pointing this out, I never saw this option.

I have tried this and for some reason I am getting an empty output file from odgi heaps (no error messages). I tried this with my own data and also the example data that ships with the odgi distribution. Perhaps a bug? Would be grateful if you could check this.

Here is what I tried with your example data:

  1. Extract five of the paths in the graph to file paths_test.txt (attached) paths_test.txt

  2. Create the BED file:

odgi paths \
-i DRB1-3123.og \
-Ll \
| grep -f paths_test.txt \
| awk -v OFS='\t' '{print($1,$2-1,$3)}' \
| sort \
> test.odgipaths.bed
  1. Run odgi heaps with the BED file passed in:
odgi heaps \
-i DRB1-3123.og \
-n 10 \
-b test.odgipaths.bed \
> test.heaps.txt

The output file (test.heaps.txt) only contains a header.

I am running odgi v0.8.2-0-g8715c55 from bioconda.

many thanks Micha

AndreaGuarracino commented 1 year ago

That's a problem with odgi heaps interface. Try to group paths by sample (-S) or haplotype (-H) and you will get an output. However, I've just pushed a fix (https://github.com/pangenome/odgi/pull/509) to avoid empty output when no grouping is specified.

mb47 commented 1 year ago

Great - works like a charm now! Many thanks.