mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

KeyError during breakpoint graph building #6

Closed: stsmall closed this issue 7 years ago

stsmall commented 9 years ago

Hi, I keep getting a "KeyError" while running Ragout.

The command line:

ragout.py wolbachia.rcp --repeats

The .rcp file:

.references = wBmalayi, wOochengi, wOvolvulus
.target = wWbancrofti
wBmalayi.fasta = references/wBmalayi.fasta
wOochengi.fasta = references/wOochengi.fasta
wOvolvulus.fasta = references/wOvolvulus.fasta
wWbancrofti.fasta = wWb.pt22.spades.fasta
*.circular = true
.blocks = small
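One quick way to rule out a recipe typo (a genome listed in .references or .target without a matching <name>.fasta line) is a consistency check. The sketch below is a simplified, hypothetical reader for the "key = value" lines shown above; it is not Ragout's own recipe parser, and it assumes one entry per line with '#' starting a comment.

```python
# Hypothetical helper: a simplified reader for "key = value" recipe lines.
# This is NOT Ragout's recipe parser; it assumes one entry per line and
# treats '#' as the start of a comment.


def parse_recipe(text):
    """Return a dict mapping recipe keys to raw string values."""
    recipe = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("malformed recipe line: %r" % raw)
        recipe[key.strip()] = value.strip()
    return recipe


def missing_fasta(recipe):
    """List genomes named in .references/.target that lack a <name>.fasta key."""
    genomes = [g.strip() for g in recipe.get(".references", "").split(",") if g.strip()]
    genomes.append(recipe.get(".target", "").strip())
    return [g for g in genomes if g and g + ".fasta" not in recipe]


RECIPE = """\
.references = wBmalayi, wOochengi, wOvolvulus
.target = wWbancrofti
wBmalayi.fasta = references/wBmalayi.fasta
wOochengi.fasta = references/wOochengi.fasta
wOvolvulus.fasta = references/wOvolvulus.fasta
wWbancrofti.fasta = wWb.pt22.spades.fasta
*.circular = true
.blocks = small
"""

recipe = parse_recipe(RECIPE)
print(missing_fasta(recipe))  # prints [] for the recipe above
```

For the recipe in this report every genome has a FASTA entry, so a missing path does not appear to be the cause.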

The error message:

[15:38:48] INFO: Cooking Ragout...
[15:38:48] INFO: Running Sibelia with block size 5000
[15:39:05] INFO: Running Sibelia with block size 500
[15:39:22] INFO: Running Sibelia with block size 100
[15:39:38] INFO: Inferring phylogeny from synteny blocks data
[15:39:38] INFO: (((wBmalayi : 14.5, wWbancrofti : 14.5) : 64.5, wOvolvulus : 4.0) : 1.0, wOochengi : 1.0)
[15:39:38] INFO: Running Ragout with the block size 5000
[15:39:38] INFO: Resolving breakpoint graph
Traceback (most recent call last):
  File "/cm/shared/apps/Ragout/1.1/ragout.py", line 32, in <module>
    sys.exit(main())
  File "/cm/shared/apps/Ragout/1.1/ragout/main.py", line 236, in main
    return run_ragout(args)
  File "/cm/shared/apps/Ragout/1.1/ragout/main.py", line 87, in run_ragout
    run_unsafe(args)
  File "/cm/shared/apps/Ragout/1.1/ragout/main.py", line 165, in run_unsafe
    adjacencies = graph.find_adjacencies(phylogeny)
  File "/cm/shared/apps/Ragout/1.1/ragout/breakpoint_graph/breakpoint_graph.py", line 98, in find_adjacencies
    chosen_edges.extend(self._process_component(subgraph, phylogeny))
  File "/cm/shared/apps/Ragout/1.1/ragout/breakpoint_graph/breakpoint_graph.py", line 136, in _process_component
    weighted_graph = self._make_weighted(subgraph, phylogeny)
  File "/cm/shared/apps/Ragout/1.1/ragout/breakpoint_graph/breakpoint_graph.py", line 215, in _make_weighted
    break_weight = phylogeny.estimate_tree(adjacencies)
  File "/cm/shared/apps/Ragout/1.1/ragout/phylogeny/phylogeny.py", line 100, in estimate_tree
    return min(rec_helper(self.tree).values())
  File "/cm/shared/apps/Ragout/1.1/ragout/phylogeny/phylogeny.py", line 85, in rec_helper
    nodes_scores[node] = rec_helper(node)
  File "/cm/shared/apps/Ragout/1.1/ragout/phylogeny/phylogeny.py", line 85, in rec_helper
    nodes_scores[node] = rec_helper(node)
  File "/cm/shared/apps/Ragout/1.1/ragout/phylogeny/phylogeny.py", line 81, in rec_helper
    return {s : leaf_score(s) for s in all_states}
  File "/cm/shared/apps/Ragout/1.1/ragout/phylogeny/phylogeny.py", line 81, in <dictcomp>
    return {s : leaf_score(s) for s in all_states}
  File "/cm/shared/apps/Ragout/1.1/ragout/phylogeny/phylogeny.py", line 79, in <lambda>
    leaf_score = (lambda s: 0.0 if s == leaf_states[root.identifier]
KeyError: 'wOvolvulus'
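The bottom frames of the traceback (estimate_tree calling rec_helper recursively, with each leaf building a score dictionary through leaf_score) look like a Sankoff-style small-parsimony recursion over the phylogeny, where leaf_states is keyed by genome name. The sketch below is a hypothetical reconstruction based only on those frame names, not Ragout's actual code; the Node class, the unit cost model and the state labels are illustrative assumptions. It shows how such a recursion raises KeyError: 'wOvolvulus' when that leaf's name is missing from leaf_states, even though the leaf is present in the tree.

```python
# Hypothetical reconstruction inferred only from the traceback frame names
# (estimate_tree -> rec_helper -> leaf_score). NOT Ragout's code: the Node
# class, the unit cost model and the state labels are illustrative.

INF = float("inf")


class Node:
    def __init__(self, identifier=None, children=()):
        self.identifier = identifier  # genome name for leaves, None for inner nodes
        self.children = list(children)

    def is_leaf(self):
        return not self.children


def estimate_tree(tree, leaf_states):
    """Sankoff-style small parsimony: minimal number of state changes.

    leaf_states maps leaf identifiers to observed states. A leaf whose
    identifier is absent from leaf_states raises KeyError.
    """
    all_states = set(leaf_states.values())

    def rec_helper(root):
        if root.is_leaf():
            # Lambda kept only to mirror the traceback: 0 cost for the
            # observed state, infinite cost for any other state.
            leaf_score = lambda s: 0.0 if s == leaf_states[root.identifier] else INF
            return {s: leaf_score(s) for s in all_states}

        nodes_scores = {}
        for node in root.children:
            nodes_scores[node] = rec_helper(node)

        scores = {}
        for s in all_states:
            total = 0.0
            for child_scores in nodes_scores.values():
                # unit cost for each state change along a branch
                total += min(child_scores[t] + (0.0 if t == s else 1.0)
                             for t in all_states)
            scores[s] = total
        return scores

    return min(rec_helper(tree).values())


# Topology of the tree reported in the log (branch lengths ignored here).
tree = Node(children=[
    Node(children=[
        Node(children=[Node("wBmalayi"), Node("wWbancrofti")]),
        Node("wOvolvulus"),
    ]),
    Node("wOochengi"),
])

states = {"wBmalayi": "a", "wWbancrofti": "a", "wOvolvulus": "b", "wOochengi": "b"}
print(estimate_tree(tree, states))  # prints 1.0

try:
    estimate_tree(tree, {"wBmalayi": "a", "wWbancrofti": "a", "wOochengi": "b"})
except KeyError as err:
    print("KeyError:", err)  # prints KeyError: 'wOvolvulus'
```

In this toy model the error comes from the leaf-state lookup, not from the tree itself, which is consistent with wOvolvulus appearing in the inferred phylogeny.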

mikolmogorov commented 9 years ago

It looks like a bug; I will take a look at it shortly.

mikolmogorov commented 9 years ago

This looks really strange to me. Could you please share the "ragout.log" file generated in the output folder? It provides some extra debugging information.

stsmall commented 9 years ago

Sorry for the delay. The ragout.log is below:

[12:45:56] root: INFO: Cooking Ragout...
[12:45:56] root: INFO: Running Sibelia with block size 5000
[12:46:13] root: INFO: Running Sibelia with block size 500
[12:46:30] root: INFO: Running Sibelia with block size 100
[12:46:46] root: INFO: Inferring phylogeny from synteny blocks data
[12:46:46] root: DEBUG: Reading permutation file
[12:46:46] root: DEBUG: "wOvolvulus" synteny blocks coverage: 99.76%
[12:46:46] root: DEBUG: "wWbancrofti" synteny blocks coverage: 97.05%
[12:46:46] root: DEBUG: "wOochengi" synteny blocks coverage: 99.91%
[12:46:46] root: DEBUG: "wBmalayi" synteny blocks coverage: 97.23%
[12:46:46] root: DEBUG: Read 3 reference sequences
[12:46:46] root: DEBUG: Read 154 target sequences
[12:46:46] root: DEBUG: 100 target sequences left after repeat filtering
[12:46:46] root: DEBUG: 0 contigs were marked as chimeric
[12:46:46] root: DEBUG: Branch lengths: [1.0, 64.5, 14.5, 14.5, 4.0, 1.0], mu = 0.25
[12:46:46] root: INFO: (((wBmalayi : 14.5, wWbancrofti : 14.5) : 64.5, wOvolvulus : 4.0) : 1.0, wOochengi : 1.0)
[12:46:46] root: INFO: Running Ragout with the block size 5000
[12:46:46] root: DEBUG: Reading permutation file
[12:46:46] root: DEBUG: "wOvolvulus" synteny blocks coverage: 99.5%
[12:46:46] root: DEBUG: "wWbancrofti" synteny blocks coverage: 96.74%
[12:46:46] root: DEBUG: "wOochengi" synteny blocks coverage: 99.7%
[12:46:46] root: DEBUG: "wBmalayi" synteny blocks coverage: 86.52%
[12:46:46] root: DEBUG: Read 3 reference sequences
[12:46:46] root: DEBUG: Read 57 target sequences
[12:46:46] root: INFO: Resolving repeats
[12:46:46] root: DEBUG: Resolved 0 unique repeat instances
[12:46:46] root: DEBUG: Added 0 extra contigs
[12:46:46] root: DEBUG: 57 target sequences left after repeat filtering
[12:46:46] root: DEBUG: 0 contigs were marked as chimeric
[12:46:46] root: DEBUG: Building breakpoint graph
[12:46:46] root: DEBUG: Built graph with 132 nodes
[12:46:46] root: INFO: Resolving breakpoint graph
[12:46:46] root: DEBUG: Found 60 connected components
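The INFO line prints the inferred tree in a Newick-like notation. As a hedged sketch (a hypothetical parser written for illustration, not part of Ragout), the code below reads that string, lists the leaf names, and collects branch lengths in pre-order. For this tree the pre-order list happens to match the DEBUG "Branch lengths" line, and it confirms that wOvolvulus is a leaf of the inferred tree, which suggests the missing key in the traceback is not due to the genome being absent from the phylogeny itself.

```python
import re

# Tokens: parentheses, commas, colons, or any run of other non-space chars.
TOKEN_RE = re.compile(r"\s*([(),:]|[^(),:\s]+)")


def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("cannot tokenize at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_tree(text):
    """Return (leaf_names, branch_lengths), branch lengths in pre-order.

    Hypothetical grammar: node := '(' child (',' child)* ')' | NAME
                          child := node ':' NUMBER
    """
    tokens = tokenize(text)
    pos = 0
    leaves, lengths = [], []

    def expect(tok):
        nonlocal pos
        if tokens[pos] != tok:
            raise ValueError("expected %r, got %r" % (tok, tokens[pos]))
        pos += 1

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            parse_child()
            while tokens[pos] == ",":
                pos += 1
                parse_child()
            expect(")")
        else:
            leaves.append(tokens[pos])
            pos += 1

    def parse_child():
        nonlocal pos
        # Reserve this branch's slot before descending so that the final
        # list is in pre-order (parent branch before its children).
        slot = len(lengths)
        lengths.append(None)
        parse_node()
        expect(":")
        lengths[slot] = float(tokens[pos])
        pos += 1

    parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return leaves, lengths


TREE = ("(((wBmalayi : 14.5, wWbancrofti : 14.5) : 64.5, "
        "wOvolvulus : 4.0) : 1.0, wOochengi : 1.0)")
leaves, lengths = parse_tree(TREE)
print(leaves)   # ['wBmalayi', 'wWbancrofti', 'wOvolvulus', 'wOochengi']
print(lengths)  # [1.0, 64.5, 14.5, 14.5, 4.0, 1.0]
```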

mikolmogorov commented 9 years ago

Thanks a lot! Actually, I do not see anything suspicious in the log. Could you share the folder with intermediate data (named "sibelia-workdir")? It contains the synteny block permutations of all genomes, but not their sequences. This would allow me to quickly reproduce the bug and fix it.