rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

Segmentation fault caused by building ME tree #77

Closed: georgiesamaha closed this issue 1 month ago

georgiesamaha commented 2 months ago

Hi @rrwick,

We've been a bit cheeky and are implementing Trycycler in a Nextflow pipeline, as it's such a great tool!

We ran into the previously reported issue with some of our bigger samples:

*** caught segfault *** 
address 0x38, cause 'memory not mapped' 

So we attempted to resolve it with your suggestions: reinstalling with an updated version of R, reinstalling ape, and using ape's bionj function instead. When my colleague reinstalled Trycycler and updated ape and phangorn, he got this error:

Error in fastme.bal(distances) :  
    cannot build ME tree with less than 3 observations 

When he replaced fastme.bal with bionj in the create_tree_script function in cluster.py, he got this error:

Error in bionj(distances) :  
    cannot build a BIONJ tree with less than 3 observations 

We reflected on these errors and on the fact that all the necessary output is created and appears to be correct, as far as we can tell. Our workaround at this point is simply to install Trycycler with the build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers) call in cluster.py commented out.

Are we correct in thinking that the build_tree step in cluster.py is only required for visualisation if desired, is not needed to complete the clustering step, and is not used in any subsequent steps?

rrwick commented 2 months ago

That's correct, the tree is only for visualisation, so you could comment that line out. However, I do find it to be a particularly useful visualisation, and when I run Trycycler, I usually look at the tree to assess which clusters I will or won't keep.
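As an alternative to commenting the line out entirely, the tree step could be skipped only when there are too few sequences. The following is a minimal sketch and not Trycycler's actual code: `MIN_TREE_SEQS`, `should_build_tree` and `build_tree_if_possible` are hypothetical names, and the threshold of three is taken from the R error messages quoted earlier in this thread.

```python
# Hypothetical guard, not part of Trycycler itself.
# Both fastme.bal and bionj reported "less than 3 observations" in this thread,
# so three is assumed to be the smallest usable number of sequences.
MIN_TREE_SEQS = 3


def should_build_tree(seq_names, minimum=MIN_TREE_SEQS):
    """Return True when there are enough sequences to attempt a tree."""
    return len(seq_names) >= minimum


def build_tree_if_possible(seq_names, build_tree_fn, minimum=MIN_TREE_SEQS):
    """Run build_tree_fn() only when enough sequences are present.

    build_tree_fn is a zero-argument callable standing in for the real
    build_tree(...) call in cluster.py. Returns True if the tree step ran
    and False if it was skipped.
    """
    if not should_build_tree(seq_names, minimum):
        print(f'Skipping tree: {len(seq_names)} sequence(s), need at least {minimum}')
        return False
    build_tree_fn()
    return True
```

In cluster.py, the existing call could then be wrapped as `build_tree_if_possible(seq_names, lambda: build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers))`, which keeps the visualisation for samples with enough contigs.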

My main question is why you would have fewer than three contigs in your tree. A typical Trycycler run would have ~12 input assemblies, so that would give at least 12 contigs. Do you actually have so few input contigs? How many sequences are in your contigs.phylip file?
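One quick way to check the sequence count is to read the file's header. This is a minimal sketch, assuming contigs.phylip follows the usual PHYLIP convention in which the first non-empty line begins with the number of sequences; `phylip_declared_count` and `count_phylip_sequences` are hypothetical helper names, not Trycycler functions.

```python
from pathlib import Path


def phylip_declared_count(text):
    """Return the sequence count declared in a PHYLIP header.

    Assumes the first non-empty line starts with an integer giving the
    number of sequences (true for both PHYLIP distance matrices and
    PHYLIP alignments).
    """
    for line in text.splitlines():
        fields = line.split()
        if fields:
            return int(fields[0])
    raise ValueError('no PHYLIP header found: input is empty')


def count_phylip_sequences(path):
    """Read a PHYLIP file from disk and return its declared sequence count."""
    return phylip_declared_count(Path(path).read_text())
```

For example, `count_phylip_sequences('contigs.phylip')` returning a value below 3 would explain both R errors above.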

georgiesamaha commented 1 month ago

Good point @rrwick, we're working with too few contigs for some samples. Closing this ticket as we're revising our method.