ekg closed this issue 3 years ago
I pushed a simple GFAv1 graph output to master, although it is not enabled by default (you can call it from anywhere in the code). For now I am only printing reads and unitigs with their lengths and read counts (without their sequences), and the links between them (without CIGAR strings). I have drawn two graphs in Bandage and it seems to work (both are from the same E. coli dataset, at different steps of the assembly). Do you need more information, or will this suffice? Do you want a command-line argument that enables GFA output after each step?
Thanks! That looks cool.
My objective is to obtain a single file that captures the full information and sequences from the assembly. For my use I need a blunt-ended bidirectional string graph. We should have sequences in the nodes. If the graph is formatted as an overlap graph, then the cigars on the links should describe the approximate length of the overlap.
I want to use the graph in vg, which has a more restricted interpretation of sequence graphs: they are not approximate and are meant to precisely encode regular languages that describe the information in the input to the assembly.
I added sequences and cigar strings so now the output looks like:
S [name] [sequence] LN:i:[length] RC:i:[one or number of reads in unitig]
L [source] [source orientation] [destination] [destination orientation] [overlap length]M
For each link there exists a pair, e.g. for link (1+) > (2-) its pair is (2+) > (1-). I hope I understood your requirements.
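To make the record layout and the link pairing concrete, here is a minimal Python sketch. It is not the assembler's actual code: the function names are hypothetical, and it assumes tab-separated fields as in the GFA1 specification.

```python
# Illustrative sketch only; all function names here are hypothetical.

FLIP = {"+": "-", "-": "+"}


def segment_line(name, sequence, read_count):
    """Format an S record with length (LN) and read count (RC) tags."""
    return "\t".join(
        ["S", name, sequence, f"LN:i:{len(sequence)}", f"RC:i:{read_count}"]
    )


def link_line(src, src_orient, dst, dst_orient, overlap_len):
    """Format an L record whose overlap is written as '<length>M'."""
    return "\t".join(["L", src, src_orient, dst, dst_orient, f"{overlap_len}M"])


def complement_link(src, src_orient, dst, dst_orient):
    """Return the paired link: swap the endpoints and flip both orientations."""
    return dst, FLIP[dst_orient], src, FLIP[src_orient]


if __name__ == "__main__":
    print(segment_line("1", "ACGTACGT", 1))
    print(segment_line("2", "TTACGTAC", 1))
    link = ("1", "+", "2", "-")
    print(link_line(*link, 5))
    print(link_line(*complement_link(*link), 5))
```

With the link (1+) > (2-) as input, `complement_link` returns (2+) > (1-), matching the pairing described above.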
If you need any assistance with enabling the GFA output or disabling some features (like heuristic graph cuts or preprocessing), let me know!
Could you easily produce GFAv1?