rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

Graph file from the Flye assembler seems to cause error when used for "pilon_polish.py" #165

Open ilnamkang opened 5 years ago

ilnamkang commented 5 years ago

Hi,

I'm trying to improve the assembly quality of my genomes by applying Unicycler to the output of the Flye assembler (https://github.com/fenderglass/Flye).

I've tried two different ways of using the graph file from the Flye assembler (assembly_graph.gfa):
(1) Using the "--existing_long_read_assembly" option when running Unicycler
(2) Using the graph file as input for the "pilon_polish.py" script

The first approach (using the "--existing_long_read_assembly" option) was successful and produced a slightly improved assembly without any errors.

However, the second approach (using the "pilon_polish.py" script) failed with an error just after the dependency check. The error message is below.


Traceback (most recent call last):
  File "/home/memb-main/Unicycler/scripts/pilon_polish.py", line 224, in <module>
    main()
  File "/home/memb-main/Unicycler/scripts/pilon_polish.py", line 41, in main
    graph = unicycler.assembly_graph.AssemblyGraph(args.input, None)
  File "/usr/local/lib/python3.4/dist-packages/unicycler/assembly_graph.py", line 63, in __init__
    self.load_from_gfa(filename)
  File "/usr/local/lib/python3.4/dist-packages/unicycler/assembly_graph.py", line 116, in load_from_gfa
    num = int(line_parts[1])
ValueError: invalid literal for int() with base 10: 'contig_1'

How can I avoid this error?

Thanks.

ehelegam commented 5 years ago

The Python code that loads GFA files (see below) requires the segment names to be integers. I solved this issue by replacing the original GFA's segment names with integers (e.g. 'edge_1' --> '1'). Good luck.

def load_from_gfa(self, filename):
    """
    Loads a Graph from a GFA file. It does not load any GFA file, but makes some restrictions:
    1) The segment names must be integers.
    2) The depths should be stored in a dp tag.
    3) All link overlaps are the same (equal to the graph overlap value).
    """
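For anyone who wants to script the renaming workaround described above, here is a minimal sketch. It is not part of Unicycler or Flye: the function names `rename_segments_to_integers` and `rename_gfa_file` are hypothetical, and the code assumes a tab-separated GFA 1 file in which segment names appear in S lines, L lines, and (if present) P lines. It only addresses restriction 1 from the docstring; the depth tag and link overlaps are left untouched, so the file may still need to satisfy restrictions 2 and 3.

```python
def rename_segments_to_integers(gfa_lines):
    """Rename every GFA segment to '1', '2', ... in order of appearance.

    Returns (new_lines, mapping), where mapping is old name -> new name.
    Assumes tab-separated GFA 1 records (an assumption, not Unicycler's code).
    """
    lines = [line.rstrip("\n") for line in gfa_lines]

    # First pass: give each segment (S line) the next unused integer.
    mapping = {}
    for line in lines:
        parts = line.split("\t")
        if parts[0] == "S" and len(parts) > 1 and parts[1] not in mapping:
            mapping[parts[1]] = str(len(mapping) + 1)

    # Second pass: rewrite every place a segment name is used.
    new_lines = []
    for line in lines:
        parts = line.split("\t")
        if parts[0] == "S" and len(parts) > 1:
            parts[1] = mapping[parts[1]]
        elif parts[0] == "L" and len(parts) > 3:
            # Link: fields 1 and 3 hold the two segment names.
            parts[1] = mapping[parts[1]]
            parts[3] = mapping[parts[3]]
        elif parts[0] == "P" and len(parts) > 2:
            # Path: steps look like "edge_1+,edge_2-" (name plus orientation).
            steps = [mapping[step[:-1]] + step[-1] for step in parts[2].split(",")]
            parts[2] = ",".join(steps)
        new_lines.append("\t".join(parts))
    return new_lines, mapping


def rename_gfa_file(in_path, out_path):
    """Read a GFA file, rename its segments, and write the result."""
    with open(in_path) as handle:
        new_lines, mapping = rename_segments_to_integers(handle)
    with open(out_path, "w") as handle:
        handle.write("\n".join(new_lines) + "\n")
    return mapping
```

Calling `rename_gfa_file("assembly_graph.gfa", "assembly_graph_renamed.gfa")` (the output file name is just an example) returns the old-to-new name mapping, which is worth keeping so the polished segments can be matched back to the original Flye names. A link or path that refers to a segment with no S line raises a `KeyError`, which seemed a safer choice than silently passing the name through.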