Closed koopkaup closed 5 years ago
Thank you for reporting this issue! While I couldn't reproduce the problem myself, I believe I've fixed it in the dev branch. Could you try to compile that branch and confirm that the issue is resolved?
Thank you, Pierre
Compiling now works. Thanks! But I ran into another problem. Converting fasta to bfast format worked normally, but then I tried to do the phylogenetic placement and this error occurred:
terminate called after throwing an instance of 'std::runtime_error'
what(): Invalid substitution rate specification: {/////}+FU{///}+G4{}
[sfr1:69184] *** Process received signal ***
[sfr1:69184] Signal: Aborted (6)
[sfr1:69184] Signal code: (-6)
[sfr1:69184] [ 0] /usr/lib64/libpthread.so.0(+0xf6d0)[0x2b6d0a0d06d0]
[sfr1:69184] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b6d0a313277]
[sfr1:69184] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b6d0a314968]
[sfr1:69184] [ 3] /storage/software/gcc-5.2.0/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x2b6d09689b6d]
[sfr1:69184] [ 4] /storage/software/gcc-5.2.0/lib64/libstdc++.so.6(+0x8cbb6)[0x2b6d09687bb6]
[sfr1:69184] [ 5] /storage/software/gcc-5.2.0/lib64/libstdc++.so.6(+0x8cc01)[0x2b6d09687c01]
[sfr1:69184] [ 6] /storage/software/gcc-5.2.0/lib64/libstdc++.so.6(+0x8ce18)[0x2b6d09687e18]
[sfr1:69184] [ 7] epa-ng(_ZN5raxml5Model15init_model_optsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERK13mixture_model+0x2f9b)[0x4cc4fb]
[sfr1:69184] [ 8] epa-ng(_ZN5raxml5Model16init_from_stringERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1b2)[0x4cdff2]
[sfr1:69184] [ 9] epa-ng(_ZN5raxml5ModelC1ENS_8DataTypeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x14e)[0x4ce39e]
[sfr1:69184] [10] epa-ng(main+0x3560)[0x488f50]
[sfr1:69184] [11] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b6d0a2ff445]
[sfr1:69184] [12] epa-ng[0x4af7fe]
[sfr1:69184] *** End of error message ***
Any idea what might have gone wrong?
Can you post the exact command line call? It looks like you didn't properly specify the model parameters. Try to get them from your reference tree, or if you don't have them, you can re-infer them using RAxML with the -f e option, or using the new raxml-ng with the --evaluate option.
RAxML will create a .info file, which you can directly pass to epa-ng's --model option.
see also here: https://github.com/Pbdas/epa-ng#setting-the-model-parameters
I used this method to create my reference tree https://www.polarmicrobes.org/phylogenetic-placement-re-re-visited/
And so I used the .info file from building the reference tree. But now I re-created it with the -f e option and the problem is gone. However, my query sequences were not aligned and caused another error. Can you recommend an MSA aligner with multiprocessor support?
If you want to be thorough, you can try our PaPaRa aligner, which was specifically developed with phylogenetic placement in mind: https://sco.h-its.org/exelixis/web/software/papara/index.html
It uses the reference tree as an additional source of information, and hence yields better alignments for this use case (at an increased computational cost...), as shown in https://doi.org/10.1093/bioinformatics/btr320 as well as http://sco.h-its.org/exelixis/pubs/Exelixis-RRDR-2012-5.pdf
Also, I created a simple MPI version of PaPara: https://github.com/lczech/papara_nt
papara works well, I recommend it.
See also here for a full example of using placement: https://github.com/Pbdas/epa-ng/wiki/Full-Stack-Example
Thanks, I will try that. But I have two questions. Papara only accepts the reference alignment in phylip format; however, I cannot convert my reference MSA from fasta to phylip, as my fasta headers are longer than 10 characters and cannot be truncated. Is it possible to use an MSA in fasta format for Papara?
And the other question comes from your tutorial.
Papara, for example, outputs the aligned queries together with the reference MSA, in phylip format. (there will be a convenience function for this shortly)
How do you separate queries after aligning against reference MSA?
Thank you for the help!
Yes, this is a downside of PaPaRa: it has to be phylip for the reference and fasta for the queries, nothing else works. We are currently starting a project that will improve PaPaRa, and one of the first things I'll suggest is to allow for more flexible file formats.
Luckily for you, PaPaRa already supports relaxed Phylip, with headers as long as they need to be, and which are instead separated from the sequences by (at least one) space. See here for a converter: https://github.com/npchar/Phylogenomic/blob/master/fasta2relaxedPhylip.pl
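If Perl is not at hand, the same conversion can be sketched in a few lines of Python. This is only an illustration of the relaxed Phylip layout described above (a header line with the number of sequences and the alignment length, then one line per sequence with the full name, a space, and the sequence); it is not the linked converter. The function names `read_fasta` and `to_relaxed_phylip` are made up for this example, and the sketch assumes that the first whitespace-delimited token of each fasta header is a unique sequence name.

```python
def read_fasta(lines):
    """Parse fasta text lines into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("fasta header without a name")
            # Relaxed Phylip separates name and sequence by whitespace,
            # so only the first token of the header is kept as the name.
            name, chunks = tokens[0], []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_relaxed_phylip(records):
    """Render (name, sequence) pairs as sequential relaxed Phylip text."""
    if not records:
        raise ValueError("no sequences given")
    length = len(records[0][1])
    for name, seq in records:
        # An alignment must have the same number of columns in every row.
        if len(seq) != length:
            raise ValueError(f"{name}: length {len(seq)}, expected {length}")
    header = f"{len(records)} {length}"
    rows = [f"{name} {seq}" for name, seq in records]
    return "\n".join([header] + rows) + "\n"
```

To convert a file, pass an open file handle to `read_fasta` and write the string returned by `to_relaxed_phylip` to the output file. The length check will also flag a reference file that is not actually aligned.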
I had this fatal error in Papara. I assume that there's something fishy with my reference sequences?
papara: papara.cpp:538: papara::references<pvec_t, seq_tag>::references(const char*, const char*, papara::queries<seq_tag>*) [with pvec_t = pvec_pgap; seq_tag = sequence_model::tag_dna]: Assertion 'unmasked.size() == seq.size()' failed.
The convenience function already exists and is called --split. I will update the tutorial accordingly. The basic usage is: epa-ng --split ref_alignment query_alignments+
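For readers who want to see what such a split amounts to, here is a rough Python sketch of the idea: every row of the combined alignment whose name also occurs in the reference alignment goes to the reference part, everything else is a query. This is not how epa-ng implements --split; the function names are invented for the example, and it assumes the combined file is sequential relaxed Phylip with one sequence per line.

```python
def read_relaxed_phylip(lines):
    """Parse sequential relaxed Phylip lines into (name, sequence) pairs."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows or len(rows[0]) < 2:
        raise ValueError("missing Phylip header line")
    n_taxa, n_sites = int(rows[0][0]), int(rows[0][1])
    records = []
    for row in rows[1:]:
        # First token is the name; any further tokens are sequence blocks.
        name, seq = row[0], "".join(row[1:])
        if len(seq) != n_sites:
            raise ValueError(f"{name}: length {len(seq)}, expected {n_sites}")
        records.append((name, seq))
    if len(records) != n_taxa:
        raise ValueError(f"found {len(records)} sequences, expected {n_taxa}")
    return records


def split_alignment(records, reference_names):
    """Separate combined records into (reference, queries) by name."""
    ref_names = set(reference_names)
    reference = [(n, s) for n, s in records if n in ref_names]
    queries = [(n, s) for n, s in records if n not in ref_names]
    return reference, queries


def to_fasta(records):
    """Render (name, sequence) pairs as unwrapped fasta text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The reference names can be taken from the first token of each sequence line of the original reference Phylip file. Both parts keep the column count of the combined alignment, so the reference part may contain more gap columns than the alignment that was originally passed to the aligner.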
I'm closing this issue as the original bug has been fixed, but we would be happy to help you further on the placement google group (where any errors and solutions become much easier for future users to find): https://groups.google.com/forum/#!forum/phylogenetic-placement
Re fatal error: Please post this again to our forum that @Pbdas mentioned, and maybe add the first few lines of your alignment.
please also note #20 since you are using the MPI version. Will fix that shortly.
EDIT: that issue is fixed as well, please pull again!
Without MPI everything works fine, but when I enable the EPA_HYBRID=1 then I get this error:
Loaded modules: openmpi-4.0.0 cmake/3.8.2 gcc-5.2.0