maxplanck-ie / HiCAssembler

Software to assemble contigs/scaffolds into chromosomes using Hi-C data

Assemble pipeline crashes on iteration 0, "Graph object has no attribute 'node'" #20

Open ens-LCampbell opened 3 years ago

ens-LCampbell commented 3 years ago

Hi there,

Could anyone shed some light on any potential issues that might result in 'assemble' crashing with the following error stack:

DEBUG:HiCAssembler:iteration: 0 N50: 8,770,000
DEBUG:HiCAssembler:Merging small bins in larger bins of size 100316 bp
INFO:hicexplorer.iterativeCorrection:starting iterative correction
INFO:Scaffolds:Computing stats per distance
DEBUG:HiCAssembler:Confidence score set to 1499.885908354667
INFO:Scaffolds:Entering join_paths_max_span_tree
INFO:Scaffolds:190 hubs were found
Traceback (most recent call last):
  File "./HiCAssembler/bin/assemble", line 312, in <module>
    main(args)
  File "./HiCAssembler/bin/assemble", line 306, in main
    super_contigs = assembl.assemble_contigs()
  File "/homes/lcampbell/hps_nobackup2_lcampbell/Software/miniconda2/lib/python3.7/site-packages/hicassembler/HiCAssembler.py", line 202, in assemble_contigs
    hub_solving_method='remove weakest')
  File "/homes/lcampbell/hps_nobackup2_lcampbell/Software/miniconda2/lib/python3.7/site-packages/hicassembler/Scaffolds.py", line 28, in wrapper
    f_result = func(*args, **kwds)
  File "/homes/lcampbell/hps_nobackup2_lcampbell/Software/miniconda2/lib/python3.7/site-packages/hicassembler/Scaffolds.py", line 1815, in join_paths_max_span_tree
    self._remove_weakest(nxG)
  File "/homes/lcampbell/hps_nobackup2_lcampbell/Software/miniconda2/lib/python3.7/site-packages/hicassembler/Scaffolds.py", line 1966, in _remove_weakest
    node_degree_mst = dict(G.degree(G.node.keys()))
AttributeError: 'Graph' object has no attribute 'node'

HiCAssembler was installed on our cluster by using the following approach:

git checkout -b 2to3 --track origin/2to3

and then editing setup.py to comment out the line:

#package_data={'': '*.txt'},

I figured if I could at least run the example data, that might shed some light. BUT NO....

In addition, how is it possible to run even the test data through assemble? It complains that it has not been provided with a FASTA file, and I cannot locate any FASTA file associated with the example data 'hic_small.h5'.

Proposed command to run example data:

assemble -m Hi_C_matrix_corrected.h5 -o ./assembly_output \
    --min_scaffold_length 100000 --bin_size 5000 --misassembly_zscore_threshold -1.0 \
    --num_iterations 3 --num_processors 16

Running the above gives the error:

assemble: error: the following arguments are required: --fasta/-f

What gives? Is the FASTA file an output file to capture the scaffolded assembly, or the input, i.e. the original assembly TO BE scaffolded? It's not entirely clear in the help documentation.

StefanoLonardi commented 3 years ago

I was able to fix the code. First, the --fasta file is required: it is the input assembly that you are trying to scaffold.

Regarding the code, according to https://networkx.org/documentation/networkx-2.3/release/release_2.0.html G.node has been replaced by G.nodes. The deprecated G.node attribute was removed entirely in NetworkX 2.4, which is why the traceback above ends in an AttributeError.

I had to manually change all G.node to G.nodes in HiCAssembler.py and Scaffolds.py.
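As an illustration, the failing line from the traceback can be rewritten as shown below. This is only a minimal sketch on a toy graph: the graph and its node names (c1 to c4) and the "length" attribute are made up for the example and are not HiCAssembler data.

```python
import networkx as nx

# Toy graph standing in for the scaffold graph (hypothetical node names).
G = nx.Graph()
G.add_edges_from([("c1", "c2"), ("c2", "c3"), ("c2", "c4")])

# Old form, raises AttributeError on NetworkX >= 2.4:
#   node_degree_mst = dict(G.degree(G.node.keys()))
# New form: G.nodes supports the same dict-like access as G.node did.
node_degree_mst = dict(G.degree(G.nodes.keys()))
print(node_degree_mst)

# Node attribute access changes the same way: G.node[n] becomes G.nodes[n].
G.nodes["c1"]["length"] = 1000  # "length" is an illustrative attribute name
print(G.nodes["c1"])
```

The same one-word substitution should apply wherever the code reads or writes node attributes through G.node.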

Finally, the function nx.connected_component_subgraphs is not available anymore (it was also removed in NetworkX 2.4), but it can be replaced by the following function, which needs to be included in both HiCAssembler.py and Scaffolds.py:

import networkx as nx

def connected_component_subgraphs(G):
    # Stand-in for nx.connected_component_subgraphs, which NetworkX 2.4 removed.
    for c in nx.connected_components(G):
        yield G.subgraph(c)
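A quick way to check the helper is to run it on a small graph. The sketch below uses a made-up toy graph (node names a, b, c, x, y, z are purely illustrative). One caveat worth knowing: the removed NetworkX function returned copies by default, whereas G.subgraph(c) returns a read-only view, so a caller that needs to modify a component would have to call .copy() on it first.

```python
import networkx as nx

def connected_component_subgraphs(G):
    # Same helper as above.
    for c in nx.connected_components(G):
        yield G.subgraph(c)

# Toy graph with three components of sizes 3, 2 and 1 (hypothetical nodes).
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("x", "y")])
G.add_node("z")

sizes = sorted(len(sub) for sub in connected_component_subgraphs(G))
print(sizes)

# Subgraph views are read-only; copy a component before modifying it.
first = next(connected_component_subgraphs(G)).copy()
first.add_node("new")
print("new" in first, "new" in G)
```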

After all these changes, the code worked for me. It did not generate a good assembly though.