jltsiren / gcsa2

BWT-based index for graphs
MIT License

How do I combine many GCSA files into one file? #43

Open fanyangrocks opened 1 year ago

fanyangrocks commented 1 year ago

I have many GCSA files, each built for one specific chromosome. How do I combine them into a single file?

jltsiren commented 1 year ago

Unfortunately, GCSA files cannot be combined, for somewhat technical reasons. The index must be built for the entire graph at once.

GCSA2 uses an order-256 de Bruijn graph to approximate the graph that should be indexed. To save space, it prunes the de Bruijn graph in places where shorter kmers are already sufficient for defining positions in the original graph. If the same pruned kmer is present in the GCSA indexes of two graphs, that kmer is no longer sufficient for defining positions in the union graph, so the pruning decisions made for each graph separately do not carry over to the combined index.
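
To make the last point concrete, here is a toy Python sketch. It is not GCSA2's actual algorithm: it treats each chromosome as a plain string rather than a graph, uses a small `max_k` instead of 256, and the names `occurrences` and `pruned_index` are made up for this illustration. It only shows that a kmer short enough to be unique within each input can stop being unique once the inputs are combined.

```python
def occurrences(texts, kmer):
    """Return every (name, offset) at which kmer occurs in any of the texts."""
    hits = []
    for name, text in texts.items():
        start = text.find(kmer)
        while start != -1:
            hits.append((name, start))
            start = text.find(kmer, start + 1)
    return hits


def pruned_index(texts, max_k):
    """Toy 'pruning': map each position to its shortest unique kmer.

    For every start position, keep the shortest kmer (length <= max_k)
    that occurs exactly once across all texts. Positions with no such
    kmer (e.g. near the end of a text) are simply skipped.
    """
    index = {}
    for name, text in texts.items():
        for offset in range(len(text)):
            for k in range(1, max_k + 1):
                kmer = text[offset:offset + k]
                if len(kmer) < k:
                    break  # ran off the end of the text
                if len(occurrences(texts, kmer)) == 1:
                    index[kmer] = (name, offset)
                    break
    return index


# Two hypothetical "chromosomes", indexed separately and then together.
chr1 = {"chr1": "GATTACA"}
chr2 = {"chr2": "CATTAGA"}
union = {**chr1, **chr2}

chr1_index = pruned_index(chr1, max_k=7)
chr2_index = pruned_index(chr2, max_k=7)
union_index = pruned_index(union, max_k=7)

# Kmers that were kept (as unique) in both per-chromosome indexes.
shared = set(chr1_index) & set(chr2_index)
print("pruned kmers present in both indexes:", sorted(shared))

# Each shared kmer now has more than one occurrence in the union.
for kmer in sorted(shared):
    print(kmer, "->", occurrences(union, kmer))
```

In this toy setting, `AT` identifies a single position in each chromosome on its own, but in the union it matches one position in each, and the index built over the union has to keep a longer kmer for those positions. Concatenating the two per-chromosome tables would therefore give wrong answers, which is the same reason the real index has to be built over the whole graph.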