vgl-hub / gfastats

A single fast and exhaustive tool for summary statistics and simultaneous *fa* (fasta, fastq, gfa [.gz]) genome assembly file manipulation.
MIT License

Possible to add gaps? #44

Closed ValentinaPeona closed 11 months ago

ValentinaPeona commented 1 year ago

Hi!

Is it possible to add a gap within a contig, at coordinates given by the user, so as to split the contig into two sequences that remain part of the same scaffold?

Thanks for the help

Valentina

gf777 commented 1 year ago

Hi @ValentinaPeona, thanks for reaching out. Yes, that makes a lot of sense. We will work on adding this as a SAK instruction. I'll get back to you once it's implemented. Best

gf777 commented 11 months ago

Please take a look at the brand new instruction for masking sequences! https://github.com/vgl-hub/gfastats/tree/main/instructions#mask

The logic for this one is less trivial than you'd think, because gaps typically connect segments rather than sit inside them. I therefore introduced the idea of 'inner gaps': a gap that lies within a segment, so that the GFA segment is literally split in two, before and after the gap, as part of a path. You can check out what this looks like on this test file:

gfastats testFiles/random1.fasta -k testFiles/random1.mask.sak -ogfa2

Please try it out and let me know if it works for your case and does what is expected. All the best
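
For readers who want a concrete picture of the 'inner gap' idea, here is a minimal Python sketch. It is not the gfastats implementation. The function name `split_with_inner_gap`, the `.1`/`.2` segment naming, the returned dictionary layout, and the 0-based half-open coordinates are all assumptions made for illustration only.

```python
def split_with_inner_gap(name, sequence, start, end):
    """Split one segment around the interval [start, end).

    Illustrative only: coordinates are assumed 0-based and half-open,
    and the ".1"/".2" suffixes are hypothetical names, not gfastats output.
    """
    if not 0 < start < end < len(sequence):
        raise ValueError("the gap must lie strictly inside the sequence")

    # The original segment becomes two segments: before and after the gap.
    left = (f"{name}.1", sequence[:start])
    right = (f"{name}.2", sequence[end:])
    gap_length = end - start

    # The path keeps both pieces in order, joined by the gap.
    path = [left[0], ("gap", gap_length), right[0]]
    return {"segments": [left, right], "path": path}


result = split_with_inner_gap("contig1", "ACGTACGTAC", 4, 6)
for segment_name, segment_sequence in result["segments"]:
    print(segment_name, segment_sequence)
print(result["path"])
```

The point of the sketch is only that one segment turns into two segments plus a gap, and that the path records them in their original order so the pieces stay in the same scaffold.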