vgl-hub / gfastats

A single fast and exhaustive tool for summary statistics and simultaneous *fa* (fasta, fastq, gfa [.gz]) genome assembly file manipulation.
MIT License

Could we use it to calculate pan-genome? #37

ld9866 closed this issue 1 year ago

ld9866 commented 1 year ago

Hello software developer! Thank you for developing such great software. We would like to compute some statistics for the pan-genome we have built, including the number of nodes, edges, reference nodes, and non-reference nodes; node length; the total length of non-reference nodes; the length of small (< 50 bp) non-reference nodes; and the length of large (> 50 bp) non-reference nodes.

Could we use your software to analyze the gfa files we generate? Merry Christmas!

gf777 commented 1 year ago

hi @ld9866,

Gfastats was designed to evaluate and manipulate GFA files, primarily assembly graphs. We have used it in a few pangenome applications, but not many so far. It already counts nodes and edges, but so far it does not assume that a reference has been selected. I suspect the new metrics you need would be relatively easy to implement, though I wonder whether they would make more sense in a new pangfastats branch. Food for thought for 2023.
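As a rough illustration of why these metrics should be easy to add, here is a minimal Python sketch that computes them directly from a GFA file. It is not part of gfastats; `parse_gfa` and `pangenome_stats` are hypothetical helpers. It assumes GFA 1 input (S, L and P lines only; GFA 1.1 W lines are ignored), treats "reference nodes" as the segments traversed by one P line you name as the reference, and, because the thresholds above leave 50 bp unassigned, counts nodes of exactly 50 bp as large.

```python
import gzip
import sys


def parse_gfa(lines):
    """Collect segment lengths, the link count and path segment lists from GFA 1 lines."""
    seg_len, edges, paths = {}, 0, {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            name, seq = fields[1], fields[2]
            length = len(seq) if seq != "*" else None
            if length is None:
                # Sequence omitted ("*"): fall back to the optional LN:i: tag
                for tag in fields[3:]:
                    if tag.startswith("LN:i:"):
                        length = int(tag[5:])
            seg_len[name] = length if length is not None else 0
        elif fields[0] == "L":
            edges += 1  # each L line is counted as one edge
        elif fields[0] == "P":
            # Segment names in a path carry a trailing + or - orientation
            paths[fields[1]] = [s[:-1] for s in fields[2].split(",")]
    return seg_len, edges, paths


def pangenome_stats(seg_len, edges, ref_path, small_cutoff=50):
    """Assumption: reference nodes are the segments visited by ref_path."""
    ref_nodes = set(ref_path)
    non_ref = [seg_len[n] for n in seg_len if n not in ref_nodes]
    return {
        "nodes": len(seg_len),
        "edges": edges,
        "reference_nodes": len(ref_nodes.intersection(seg_len)),
        "non_reference_nodes": len(non_ref),
        "total_node_length": sum(seg_len.values()),
        "non_reference_length": sum(non_ref),
        "small_non_reference_length": sum(n for n in non_ref if n < small_cutoff),
        # Assumption: nodes of exactly small_cutoff bp are counted as large
        "large_non_reference_length": sum(n for n in non_ref if n >= small_cutoff),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python pangenome_stats.py graph.gfa[.gz] REFERENCE_PATH_NAME
    path, ref_name = sys.argv[1], sys.argv[2]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        seg_len, edges, paths = parse_gfa(handle)
    for key, value in pangenome_stats(seg_len, edges, paths[ref_name]).items():
        print(f"{key}\t{value}")
```

Counting L lines as edges and taking the reference from a single named path keeps the sketch short. A real implementation would also need to decide how to handle walks, multiple reference paths, and duplicate links.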

Happy holidays!

ld9866 commented 1 year ago

Thank you for your patient reply! I hope your software keeps getting better and better. Best wishes, and happy holidays!