erinyoung closed this issue 1 year ago.
Hello @erinyoung, thank you very much for your feedback. We are happy to work on this request. I'll give you an update as soon as possible.
Hi @erinyoung,
I've implemented the feature you requested in v1.3.6. Circular sequences are now counted in the summary, and individual results are reported with `--seq-report`. Of note, we now also have a `--discover-terminal-overlaps` option, which detects perfect terminal overlaps (such as those of a string graph) in case they are missing.
Let me know what you think.
Best,
Giulio
That sounds spectacular! Thank you!
Hi, thanks for including this feature. It's quite useful for people working with microbial sequences.
However, I tested this feature on my data and found that the circularity results from flye and gfastats are not the same. I assembled the sequence with flye, and the flye output says that both contigs in the output assembly are circular, which I can also see in the .gfa file generated by flye, but gfastats says that the contigs are not circular. I am attaching all 3 files here: the assembled FASTA file (assembly.fasta.txt, because GitHub didn't allow files with the .fasta extension), the assembly info from flye, and the assembly info from gfastats, generated by running this command: `gfastats assembly.fasta --seq-report -t > assembly_info_gfastats.txt`. If I understood correctly, the last column in 'assembly_info_gfastats.txt' represents the circularity. Let me know if I am misunderstanding anything or missing a parameter when running it.
assembly_info_gfastats.txt assembly_info_flye.txt assembly.fasta.txt
Hi @AmayAgrawal
Thanks for reaching out. Please note that this is not how the circularity check is supposed to work. A FASTA file by definition contains linear sequences. A GFA, instead, can represent circularity as an edge connecting the end of a sequence back to its start in the graph. Feed the GFA to gfastats and it should be able to tell you that the sequences are indeed circular.
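As a rough illustration of how circularity can be read from a GFA, here is a minimal Python sketch (not gfastats' actual code) that flags a GFA1 segment as circular when an `L` line links it to itself in the same orientation (e.g. `L contig_1 + contig_1 + 0M`, with tab-separated fields), i.e. its end joins back onto its own start. The helper `segment_circularity` and the sample segment names are hypothetical. It only detects single-segment self-loops; circles spanning several segments, GFA2 `E` lines, and whatever extra rules gfastats applies are not covered.

```python
from __future__ import annotations


def segment_circularity(gfa_text: str) -> dict[str, bool]:
    """Map each GFA1 segment name to True if it has a self-loop link (hypothetical helper)."""
    circular: dict[str, bool] = {}
    self_loops: set[str] = set()
    for line in gfa_text.splitlines():
        fields = line.split("\t")  # GFA records are tab-separated
        if fields[0] == "S" and len(fields) >= 2:
            # S <name> <sequence> ...: register every segment as linear by default
            circular.setdefault(fields[1], False)
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            from_seg, from_orient, to_seg, to_orient = fields[1:5]
            # Same segment and same orientation: the end joins back to the start.
            # Opposite orientations (e.g. "+" to "-") would be a hairpin, not a circle.
            if from_seg == to_seg and from_orient == to_orient:
                self_loops.add(from_seg)
    return {name: name in self_loops for name in circular}


if __name__ == "__main__":
    example = (
        "H\tVN:Z:1.0\n"
        "S\tcontig_1\tACGTACGT\n"
        "S\tcontig_2\tTTGGCCAA\n"
        "L\tcontig_1\t+\tcontig_1\t+\t0M\n"
    )
    result = segment_circularity(example)
    print(result)  # {'contig_1': True, 'contig_2': False}
    print("circular segments:", sum(result.values()))  # circular segments: 1
```

Summing the `True` values gives a per-assembly total of circular segments, the same kind of count the summary reports. A FASTA file carries no `L` lines at all, which is why the same contigs look linear when only the FASTA is analyzed.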
Note from the discussion above that we did introduce an option that tries to detect perfect overlaps of a certain length in FASTA files, but it has to be enabled explicitly with the `--discover-terminal-overlaps N` option. In the case of your file the overlaps are not perfect, so it won't work (I already tried), but you can see what the result looks like by setting, say, N = 1.
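To make the idea of a perfect terminal overlap concrete, here is a minimal sketch of one way to check a single linear sequence: find the longest proper prefix that is also a suffix, computed with the KMP prefix function, and accept it only if it is at least N bases long. This is an assumption about what such a check could look like, not how `--discover-terminal-overlaps` is implemented; the helpers `prefix_function` and `terminal_overlap` are hypothetical, and reverse-complement overlaps are ignored. It also shows why a small N such as 1 readily produces hits: any sequence whose first and last bases match has an overlap of at least 1.

```python
from __future__ import annotations


def prefix_function(s: str) -> list[int]:
    """KMP prefix function: pi[i] is the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def terminal_overlap(seq: str, min_len: int) -> int:
    """Longest perfect suffix-prefix overlap if it is >= min_len, else 0 (hypothetical helper)."""
    s = seq.upper()  # treat soft-masked (lowercase) bases like uppercase ones
    if not s:
        return 0
    longest = prefix_function(s)[-1]  # a proper border is always shorter than the sequence
    return longest if longest >= max(min_len, 1) else 0


if __name__ == "__main__":
    contig = "ACGTTTGACGT"  # toy sequence that ends with a copy of its first 4 bases
    print(terminal_overlap(contig, 4))  # 4
    print(terminal_overlap(contig, 5))  # 0
```

Because the longest border is checked first, a value below N means no shorter border can qualify either, so the sketch returns 0; the prefix function keeps the check linear in the sequence length. An imperfect overlap, as in the assembly discussed here, yields no border of useful length at all.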
Btw, I noticed that the header for the circularity column in the report was missing (hence your doubt, sorry about that). It's fixed in the latest commit :-)
This is a microbial issue, but it would be handy if the output of `gfastats` indicated whether or not a sequence is circular. I have attached a GFA that I assembled with flye as an example. I had to change the filename to end with .txt so that I could upload it to GitHub, but it is a GFA file. Some tools that I use, such as raven, do not include a summary file. I like `gfastats` because of how useful it is, but it's missing this one key piece of information that would be immensely useful to me, and perhaps to other members of the microbial sequencing community. Here's the corresponding assembly_info.txt produced by flye for this sequence.
assembly_graph.gfa.txt