jts / sga

de novo sequence assembler using string graphs
http://genome.cshlp.org/content/22/3/549

Inconsistent format representation for assembly graph edges #160

Open · anuradhawick opened this issue 4 years ago

anuradhawick commented 4 years ago

In one of my experiments involving the assembly graph, I noticed that ED lines such as `ED read2 read1 0 46 50 3 49 50 0 0` are not fully tab-delimited: only `ED` is followed by a tab, and every other field is separated by spaces. This makes downstream analysis (especially in C/C++) somewhat inefficient. I would appreciate it if this could be corrected to match the format definition, or alternatively documented in the wiki. It took me a good amount of time to figure this out from the format specification alone. Keep up the good work! Cheers :)
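Until the format is changed, a reader can simply split on any whitespace instead of on tabs only. The sketch below assumes the 11-field layout seen in the example line (`ED`, two read IDs, six coordinates, then two trailing integers). Per the reply below, the six coordinates are one SGA object; reading the last two as an orientation flag and a difference count is an assumption, and the `EdgeRecord` type and its field names are hypothetical, not part of SGA.

```python
from dataclasses import dataclass


@dataclass
class EdgeRecord:
    # Hypothetical field names; the layout is inferred from the example ED line.
    id1: str
    id2: str
    start1: int
    end1: int
    len1: int
    start2: int
    end2: int
    len2: int
    is_reverse: int  # assumed: orientation flag
    num_diffs: int   # assumed: number of differences in the overlap


def parse_ed_line(line: str) -> EdgeRecord:
    """Parse an ASQG ED line whose fields may be separated by tabs, spaces, or a mix."""
    fields = line.split()  # no argument: split on any run of whitespace
    if len(fields) != 11 or fields[0] != "ED":
        raise ValueError(f"not an 11-field ED record: {line!r}")
    ints = [int(f) for f in fields[3:]]  # the eight numeric fields
    return EdgeRecord(fields[1], fields[2], *ints)
```

Because `str.split()` with no separator treats tabs and spaces alike, the same function handles both the current mixed output and a fully tab-delimited variant. In C/C++, the equivalent is to tokenize on both `'\t'` and `' '` rather than on `'\t'` alone.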

jts commented 4 years ago

This was intentional, but probably not a good idea. The coordinates part (0 46 50 3 49 50) is an SGA object with its own serialization function, which uses spaces as the delimiter. Fixing this would be too large a change, and this project is largely deprecated anyway. I suggest you use a GFA variant for your experiment; I think there are converters from ASQG to GFA.

Jared
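If you still need to consume ASQG directly, one workaround is a small preprocessing pass that rewrites only the ED records so that every field is tab-separated, leaving header and vertex lines untouched. This is a minimal sketch; `normalize_asqg` is a hypothetical helper, and it assumes read IDs never contain whitespace.

```python
from typing import Iterable, Iterator


def normalize_asqg(lines: Iterable[str]) -> Iterator[str]:
    """Yield ASQG lines with every ED record rewritten to be fully tab-delimited."""
    for line in lines:
        fields = line.split()  # split on tabs and spaces alike
        if fields and fields[0] == "ED":
            # Re-emit the edge record with a single tab between every field.
            yield "\t".join(fields) + "\n"
        else:
            # Header (HT), vertex (VT), and blank lines pass through unchanged.
            yield line
```

For example, `sys.stdout.writelines(normalize_asqg(sys.stdin))` turns this into a filter that sits between SGA's output and a tab-splitting C/C++ reader.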