pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

extracting node path-coverage information #252

Closed SAMtoBAM closed 1 year ago

SAMtoBAM commented 1 year ago

Hi there,

Perhaps there is a simple way I have not found, but I would like to extract a list of all the nodes with their graph-relative position, their length, and their coverage (number of paths). Is this possible?

I am looking for regions that are not present within all the haplotypes used to build the graph.

Thanks

subwaystation commented 1 year ago

Hi @SAMtoBAM,

I don't think such a tool exists yet. Among other things, odgi depth offers:

        -d, --graph-depth-table           Compute the depth and unique depth on
                                          each node in the graph, writing a
                                          table by node: node.id, depth,
                                          depth.uniq.
        -v, --graph-depth-vec             Compute the depth on each node in the
                                          graph, writing a vector by base in one
                                          line.
        -D, --path-depth                  Compute a vector of depth on each base
                                          of each path. Each line consists of a
                                          path name and subsequently the
                                          space-separated depth of each base.
        -a, --self-depth                  Compute the depth of the path versus
                                          itself on each base in each path. Each
                                          line consists of a path name and
                                          subsequently the space-separated depth
                                          of each base.
        -S, --summarize                   Provide a summary of the depth
                                          distribution in the graph, in a
                                          tab-delimited format it prints to
                                          stdout: node.count, graph.length,
                                          step.count, path.length,
                                          mean.node.depth
                                          (step.count/node.count), and
                                          mean.graph.depth
                                          (path.length/graph.length).

Would that suffice for you?

By number of paths, do you mean the number of unique paths crossing the node, or all steps of all paths visiting the node?
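To make the question concrete: given the table written by odgi depth -d (columns node.id, depth, depth.uniq, as described in the help text above), nodes that are not present in every haplotype are those whose depth.uniq is below the number of paths in the graph. The following is only a minimal sketch. It assumes the table is tab-delimited and may start with a header line beginning with `#`; the function name `nodes_missing_from_some_paths` and the example rows are made up for illustration.

```python
import csv
import io


def nodes_missing_from_some_paths(depth_table, n_paths):
    """Return (node_id, depth, depth_uniq) for nodes with depth.uniq < n_paths.

    depth_table: an iterable of text lines (for example an open file).
    n_paths: total number of paths (haplotypes) in the graph.
    """
    rows = []
    for fields in csv.reader(depth_table, delimiter="\t"):
        # Assumption: blank lines and '#'-prefixed header lines can be skipped.
        if not fields or fields[0].startswith("#"):
            continue
        node_id, depth, depth_uniq = fields[0], int(fields[1]), int(fields[2])
        if depth_uniq < n_paths:
            rows.append((node_id, depth, depth_uniq))
    return rows


# Hypothetical table for a graph built from three haplotypes.
example = io.StringIO(
    "#node.id\tdepth\tdepth.uniq\n"
    "1\t3\t3\n"
    "2\t2\t1\n"
    "3\t2\t2\n"
)
partial = nodes_missing_from_some_paths(example, n_paths=3)
print(partial)
```

Here depth counts every step on the node, while depth.uniq counts each path at most once, so a node visited twice by a single path has depth 2 and depth.uniq 1.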

SAMtoBAM commented 1 year ago

Thanks for the response

Nicely, we both converged on odgi depth, and it works rather well with the -d option. I was looking for the unique number of paths, so that provides the required information. I also converted the graph to GFA (using odgi view) and used that to extract the node lengths. I used the GFA to confirm the results of odgi depth as well, so I have some peace of mind!
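For anyone wanting to do the same cross-check, here is a rough sketch of pulling node lengths and per-node path counts out of a GFA v1 file. It is not the exact script used here. It assumes segment sequences are stored inline on the S lines (not `*`) and that paths are written as P lines rather than W lines; the function name `node_stats_from_gfa` and the tiny example graph are hypothetical.

```python
from collections import defaultdict


def node_stats_from_gfa(lines):
    """Return {node_id: (length, step_count, unique_path_count)} from GFA v1 lines."""
    lengths = {}
    steps = defaultdict(int)   # all steps of all paths on the node
    paths = defaultdict(set)   # names of the distinct paths visiting the node
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # S <name> <sequence>; assumes the sequence is stored inline.
            lengths[fields[1]] = len(fields[2])
        elif fields[0] == "P":
            # P <path name> <comma-separated oriented segments> <overlaps>
            path_name = fields[1]
            for step in fields[2].split(","):
                node_id = step[:-1]  # drop the trailing '+' or '-' orientation
                steps[node_id] += 1
                paths[node_id].add(path_name)
    return {n: (lengths[n], steps[n], len(paths[n])) for n in lengths}


# Hypothetical three-node graph with three haplotype paths.
gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTTA",
    "P\thapA\t1+,2+,3+\t*",
    "P\thapB\t1+,3+\t*",
    "P\thapC\t1+,2+,2+,3+\t*",
]
stats = node_stats_from_gfa(gfa)
for node_id, (length, step_count, path_count) in stats.items():
    print(node_id, length, step_count, path_count)
```

The step count and unique path count can then be compared against the depth and depth.uniq columns from odgi depth -d.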

I also used odgi pav to extract similar information (using small windows along each path rather than nodes, as the nodes could be problematic), giving me essentially the path coverage per window (I used it as binary presence/absence). I got all that direction from here, which was extremely helpful. I guess it is path-based instead of node-based, though, which may turn out to be easier for me in the long run.

Thanks again

subwaystation commented 1 year ago

Glad to hear that our tools and docs were able to help you out!