marschall-lab / gaftools

General purpose utility related to GAF files
https://gaftools.readthedocs.io/
MIT License

[Feature Request] Extract chromosome path from graph with the GFA class #17

Closed fawaz-dabbaghieh closed 10 months ago

fawaz-dabbaghieh commented 11 months ago

Arda asked if the GFA class can have a function that retrieves a path that represents a chromosome in the pangenome graph.

The idea is to select the nodes whose SN tag matches the required chromosome and order them by their SO tags. The ordered list should then form a linear path, i.e. there is an edge connecting each node in the list to the next one.
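The steps described above can be sketched in plain Python as follows. This is only an illustration of the idea, not the gaftools implementation: `parse_gfa` and `chromosome_path` are hypothetical helper names, and the sketch assumes rGFA-style S lines carrying `SN:Z:` and `SO:i:` tags and ignores node orientation entirely.

```python
def parse_gfa(lines):
    """Collect segment tags and node adjacency from GFA lines (hypothetical helper)."""
    segments = {}  # node id -> tag dict, e.g. {"SN": "chr1", "SO": 0}
    edges = set()  # unordered pairs of node ids joined by an L line
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            tags = {}
            for tag in fields[3:]:
                name, typ, value = tag.split(":", 2)
                tags[name] = int(value) if typ == "i" else value
            segments[fields[1]] = tags
        elif fields[0] == "L":
            edges.add(frozenset((fields[1], fields[3])))
    return segments, edges


def chromosome_path(segments, edges, chromosome):
    """Return the nodes of `chromosome` sorted by SO, or [] if they are not chained."""
    nodes = [
        node
        for node, tags in segments.items()
        if tags.get("SN") == chromosome and "SO" in tags
    ]
    nodes.sort(key=lambda node: segments[node]["SO"])
    # Every node must be linked to the next one for the list to be a linear path.
    for current, following in zip(nodes, nodes[1:]):
        if frozenset((current, following)) not in edges:
            return []
    return nodes


# Toy graph: s1 -> s2 -> s3 is the chr1 backbone, s4 is a bubble from another sample.
example = [
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts3\tTTT\tSN:Z:chr1\tSO:i:6\tSR:i:0",
    "S\ts2\tGG\tSN:Z:chr1\tSO:i:4\tSR:i:0",
    "S\ts4\tCC\tSN:Z:sample1\tSO:i:100\tSR:i:1",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts2\t+\ts3\t+\t0M",
    "L\ts1\t+\ts4\t+\t0M",
    "L\ts4\t+\ts3\t+\t0M",
]
segments, edges = parse_gfa(example)
chr1_nodes = chromosome_path(segments, edges, "chr1")
```

Returning an empty list when a link is missing mirrors the "empty list means something went wrong" convention mentioned in this thread, but the real function may report problems differently.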

fawaz-dabbaghieh commented 11 months ago

This should be doable now. You can use the internal function graph.get_path(chromosome), which takes the chromosome name as it appears in the SN tag and returns a list of the path nodes. If the list is empty, something went wrong and you should check the warning messages.

Still need to add a test case for this.

asylvz commented 11 months ago

Thanks Fawaz, it seems to be working fine. However, we are missing the orientation of the nodes. After sorting, we might need to check the respective "L" lines to find the edges (unless you already have them in your data structure) and get the orientation, so that we have an output such as s1>s2s19....>s151

fawaz-dabbaghieh commented 11 months ago

@asylvz I just pushed a new function that takes the path returned by graph.get_path(chromosome) and returns a string with the node orientations in the GFA path line format, i.e. node1+,node2-,node3+ and so on. This format is described here. The function is called return_gfa_path and takes as input the list of path nodes that you get from get_path.
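For readers who want to see how orientations can be recovered from the "L" lines, here is a minimal standalone sketch. It is not the code behind return_gfa_path: `link_index`, `gfa_path_string`, and `gfa_to_gaf_path` are hypothetical names, and the sketch assumes at most one link per ordered node pair. The last helper converts the comma-separated GFA form into the `>`/`<` form used in GAF paths.

```python
FLIP = {"+": "-", "-": "+"}


def link_index(lines):
    """Map (from_node, to_node) -> (from_orientation, to_orientation)."""
    index = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "L":
            continue
        src, src_o, dst, dst_o = fields[1:5]
        index[(src, dst)] = (src_o, dst_o)
        # The same link can be walked backwards with both orientations flipped.
        index.setdefault((dst, src), (FLIP[dst_o], FLIP[src_o]))
    return index


def gfa_path_string(nodes, index):
    """Build a GFA path-line style string such as 's1+,s2-,s3+'."""
    if not nodes:
        return ""
    if len(nodes) == 1:
        return nodes[0] + "+"  # no link to consult, so assume forward
    oriented = []
    for current, following in zip(nodes, nodes[1:]):
        current_o, following_o = index[(current, following)]  # KeyError if unlinked
        if not oriented:
            oriented.append(current + current_o)
        elif oriented[-1] != current + current_o:
            raise ValueError(f"inconsistent orientation at node {current}")
        oriented.append(following + following_o)
    return ",".join(oriented)


def gfa_to_gaf_path(path):
    """Convert 's1+,s2-' into the GAF-style '>s1<s2'."""
    if not path:
        return ""
    return "".join(
        (">" if step[-1] == "+" else "<") + step[:-1] for step in path.split(",")
    )


links = [
    "L\ts1\t+\ts2\t-\t0M",
    "L\ts2\t-\ts3\t+\t0M",
]
gfa_path = gfa_path_string(["s1", "s2", "s3"], link_index(links))
gaf_path = gfa_to_gaf_path(gfa_path)
```

The consistency check matters because a node's orientation is implied twice, once by the link entering it and once by the link leaving it; if the two disagree, the sorted node list is not a valid walk through the graph.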