nickjcroucher / gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins
http://nickjcroucher.github.io/gubbins/
GNU General Public License v2.0

Bipartitions missing when transferring node label transfer #325

Closed annarerra closed 2 years ago

annarerra commented 2 years ago

Hello,

I am running Gubbins with an IQ-TREE starting tree: run_gubbins.py -s alignment.aln.treefile -t iqtree --bootstrap 1000 --transfer-bootstrap --verbose alignment.aln -c 10

The execution is blocked after the 1st iteration and i have this message: Bipartitions missing when transferring node label transfer: ['000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Do you have an idea why?

Thank you in advance :)

nickjcroucher commented 2 years ago

This works with a test alignment and v3.1.4, so it might require an upgrade - else it could be a problem with processing the starting tree (see #320) - does the command work without the starting tree?

annarerra commented 2 years ago

Thank you for the reply.

The labels in the starting tree look like ABC36456, with no # or _ characters. I am now running it without the starting tree; it should be fine, but I will let you know.

nickjcroucher commented 2 years ago

Does this still happen with v3.1.4 with/without the starting tree?

annarerra commented 2 years ago

With version 3.1.3 I got the same error as before, even without the starting tree. I will ask the server administrator to install 3.1.4 and try again. Thanks again :)

annarerra commented 2 years ago

Version 3.1.4 finally installed!

I am regenerating the IQ-TREE tree, so for the moment I don't have news on that.

But I ran Gubbins on the same alignment with a FastTree tree and a RAxML tree. The RAxML run seems to be OK; it is still running. Command: run_gubbins.py -s RAxML_bipartitions.raxmlout -t raxml --bootstrap 100 --transfer-bootstrap --verbose core_gene_alignment.aln -c 10

The FastTree run stopped at the 1st iteration with the same error as for IQ-TREE: "Bipartitions missing when transferring node label transfer". Command: run_gubbins.py -s core_gene_alignment.newick -t fasttree --bootstrap 100 --transfer-bootstrap --verbose core_gene_alignment.aln -c 10

nickjcroucher commented 2 years ago

OK - are you able to share a minimal set that allows this error to be reproduced (e.g. a subset of the alignment)?

annarerra commented 2 years ago

I can email you a subset of the alignment (it looks like this: [image]), or would you prefer the trees?

nickjcroucher commented 2 years ago

If the problem is observed without the trees, then no need to share them - just a subset of your alignment that reproduces the problem is fine, thanks.

annarerra commented 2 years ago

I just sent you an email.

However, I think that this message, "Bipartitions missing when transferring node label transfer", has to do with the trees: the RAxML tree has the bipartitions, so there is no problem, but the FastTree and IQ-TREE trees do not, so I get an error.

nickjcroucher commented 2 years ago

So this command works fine for me: run_gubbins.py -t fasttree --bootstrap 10 -c 1 -i 2 --transfer-bootstrap core_gene_alignment_sub.aln

As did this one: run_gubbins.py -t fasttree --bootstrap 50 -c 3 -i 2 --transfer-bootstrap core_gene_alignment_sub.aln

So it seems to be OK with the multi-threading and use of multiple cores - are you able to check the gubbins version (run_gubbins.py --version) and Fasttree version? What results do you get using the commands above?

annarerra commented 2 years ago

Gubbins version: 3.1.4. I produce the FastTree tree through Roary (version 3.13.0): FastTree -nt -gtr core_gene_alignment.aln > my_tree.newick

I tried the first command and it works for me, so it's probably better not to use a starting tree.

nickjcroucher commented 2 years ago

Important note: you can't use the core alignment from Roary as an input for Gubbins; see the manual. The new versions come with a script for generating a whole genome alignment using SKA, and Snippy has instructions for generating a similar file.

annarerra commented 2 years ago

I am sorry for not replying sooner; I had some time off.

Why can't the alignment from Roary be used?

nickjcroucher commented 2 years ago

Please see the introduction to the manual for the reasons why (they're in the Introduction, no need to slog through the whole thing), and for help on how to generate a suitable whole genome alignment.

nickjcroucher commented 2 years ago

I've added a tutorial to help with the generation of a whole genome alignment: https://github.com/nickjcroucher/gubbins/blob/master/docs/gubbins_tutorial.md.

annarerra commented 2 years ago

Thanks a lot! I will take a look. I have already followed the instructions from Snippy:

% snippy-clean_full_aln core.full.aln > clean.full.aln
% run_gubbins.py -p gubbins clean.full.aln

and I am waiting for the result from Gubbins.

annarerra commented 2 years ago

Hello,

It seems OK now. Here is a screenshot from Phandango; the tree is big and not very legible here. [image]

nickjcroucher commented 2 years ago

Great! Looks like something odd with the long branch in the middle of that tree - do you expect that to be there? If not, I can take a look at a handful of sequences (a couple from either clade) to see if there's anything odd happening with the reconstruction

annarerra commented 2 years ago

Not sure if that should be there or not :/ I am working with in-house K. pneumoniae strains with resistance to an antibiotic.

You can find the log file attached; maybe there is something in it that I missed. log.txt

mesti90 commented 2 years ago

I have the very same problem with another dataset. Did you find any solution for that? I use gubbins 3.2.1 with the following command: run_gubbins.py --model-fitter fasttree --threads 50 --iterations 5 gubbins_input_1000_seq.fasta -v -u

nickjcroucher commented 2 years ago

What is the error message you receive?

mesti90 commented 2 years ago

The error message is: Bipartitions missing when transferring node label transfer: ['...','...'] The error occurs after FastTree finishes the tree building, when the software tries to harmonise the rooted tree and the treefile (I found that the problem occurs on line 821 in common.py)
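For readers unfamiliar with the term, the strings of 0s and 1s in the error message look like bipartition encodings: one character per taxon, marking which side of an internal branch that taxon falls on. The following is a minimal sketch of comparing two trees by their bipartitions. It is not the code in common.py; the nested-tuple tree representation and the names `bipartitions`, `reference`, and `refitted` are assumptions made for illustration.

```python
def bipartitions(tree, taxa):
    """Collect the non-trivial bipartitions of a tree as bit strings.

    `tree` is a leaf name (str) or a tuple of subtrees (hypothetical
    representation); `taxa` fixes the order of the bits.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return {node}
        below = set()
        for child in node:
            below |= visit(child)
        # Skip trivial splits: a single leaf, all taxa, or all but one taxon.
        if 1 < len(below) < len(taxa) - 1:
            bits = "".join("1" if t in below else "0" for t in taxa)
            # Normalise so that a split and its complement compare equal.
            if bits[0] == "1":
                bits = "".join("0" if b == "1" else "1" for b in bits)
            found.add(bits)
        return below

    visit(tree)
    return found


taxa = ["A", "B", "C", "D", "E"]
reference = ((("A", "B"), "C"), ("D", "E"))  # tree before model fitting
refitted = ((("A", "C"), "B"), ("D", "E"))   # tree returned with a changed topology

# Bipartitions present in the reference tree but absent from the refitted one.
missing = bipartitions(reference, taxa) - bipartitions(refitted, taxa)
print(sorted(missing))  # ['00111']
```

If the two topologies were identical, `missing` would be empty; a non-empty set is the kind of mismatch the error message reports.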

nickjcroucher commented 2 years ago

Thanks - have you got a small dataset on which you are able to reproduce the error that you are able to share with me?

nicolettacommins commented 2 years ago

Hi there, chiming in since I see this is active now. I'm getting the same error right now using v3.2.1. I'm using an alignment that I have previously run Gubbins on successfully, but with v2.4.1. I want to use some of the newer features that aren't available in 2.4.1, but I'm getting the "Bipartitions missing when transferring node label transfer" error. I'm able to get it to run on a smaller alignment that has a subset of my larger dataset, but it doesn't like this particular tree for some reason.

mesti90 commented 2 years ago

I can send you an alignment of 500 sequences for which I receive this error, but I can still reduce the sample size, if necessary. (This file is around 0.5GB)

nickjcroucher commented 2 years ago

If there's a smaller number that gives the same error, that would be ideal, else if you can make the alignment of 500 available somewhere, I can work with that.
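One simple way to produce a smaller test file is to keep only the first n records of the FASTA alignment. The sketch below does that with a streaming generator; the function name `first_n_records` and the file names in the comment are hypothetical, and a truncated alignment is not guaranteed to reproduce the error.

```python
def first_n_records(lines, n):
    """Yield, unchanged, the lines belonging to the first n FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        # Ignore anything that appears before the first header line.
        if seen > 0:
            yield line


example = [">a\n", "AC\n", "GT\n", ">b\n", "AA\n", ">c\n", "TT\n"]
subset = list(first_n_records(example, 2))
print("".join(subset), end="")

# With real files (hypothetical names), the same generator streams line by line:
#   with open("input.fasta") as src, open("subset.fasta", "w") as dst:
#       dst.writelines(first_n_records(src, 100))
```

Because the generator never holds more than one line in memory, it should cope with alignments of several hundred megabytes.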

nickjcroucher commented 2 years ago

@nicolettacommins what command did you use?

mesti90 commented 2 years ago

@nickjcroucher I uploaded the file with 500 sequences to Google Drive: https://drive.google.com/file/d/1HJoLWhkG2JceCmenW-cs4GG-xBNz2JRW/view?usp=sharing The command I used is: run_gubbins.py --model-fitter fasttree --threads 80 --iterations 5 gubbins_input_1000_seq_1-500.fasta -v -u

nickjcroucher commented 2 years ago

The problem is caused by model fitting with FastTree2 - this is meant to optimise branch lengths without changing the tree topology, but the FastTree documentation states:

Why does -intree -nome -mllen change the tree topology in rare cases? If your alignment contains identical sequences and the input tree places these identical sequences in different locations in the tree, then the output tree will not match the input tree. The reason for this is that FastTree does not represent the duplicate sequences in its internal representation of the tree, so it has no way of "remembering" that they belong in different places. It should issue a warning in this case, but it does not -- it simply uses one of the locations for all of the identical sequences.

Gubbins expects the tree to come back with an identical topology, hence the error message. I would recommend you use a different model fitting algorithm (RAxML or IQ-TREE) - you can still build the tree with FastTree if that is your preference.

I will add a warning about this in the code.
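In the meantime, an alignment can be checked for identical sequences before choosing FastTree as the model fitter. The following is a minimal sketch of such a check, not the warning added to Gubbins; the function names `read_fasta` and `identical_groups` are hypothetical, and case-insensitive comparison is an assumption.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def identical_groups(records):
    """Return lists of record names that share exactly the same sequence."""
    groups = defaultdict(list)
    for name, seq in records:
        groups[seq.upper()].append(name)  # assumption: ignore case differences
    return [names for names in groups.values() if len(names) > 1]


example = [">s1", "ACGT", ">s2", "ACGA", ">s3", "acgt"]
groups = identical_groups(read_fasta(example))
print(groups)  # [['s1', 's3']]
```

Any group reported here is a candidate for the FastTree behaviour quoted above. For very large alignments, keying the dictionary on a hash of each sequence rather than the sequence itself would reduce memory use.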

nickjcroucher commented 2 years ago

Now fixed in ab18e3d, will be corrected in v3.2.2 - reopen this if there are further issues.

mesti90 commented 2 years ago

Thank you, I'll check the solution