pierrebarbera / epa-ng

Massively parallel phylogenetic placement of genetic sequences
GNU Affero General Public License v3.0

Error when compiling MPI enabled version #19

Closed koopkaup closed 5 years ago

koopkaup commented 5 years ago

Without MPI everything works fine, but when I enable EPA_HYBRID=1 I get this error:

[ 90%] Building CXX object src/CMakeFiles/epa_module.dir/net/epa_mpi_util.cpp.o
In file included from /gpfs/hpchome/a51256/epa/src/net/epa_mpi_util.cpp:1:0:
/gpfs/hpchome/a51256/epa/src/net/epa_mpi_util.hpp: In function ‘void epa_mpi_send(T&, int, MPI_Comm)’:
/gpfs/hpchome/a51256/epa/src/net/epa_mpi_util.hpp:110:58: error: there are no arguments to ‘memcpy’ that depend on a template parameter, so a declaration of ‘memcpy’ must be available [-fpermissive]
   memcpy(buffer, data.c_str(), data.size() * sizeof(char));
                                                          ^
/gpfs/hpchome/a51256/epa/src/net/epa_mpi_util.hpp:110:58: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/gpfs/hpchome/a51256/epa/src/net/epa_mpi_util.hpp: In function ‘void epa_mpi_isend(T&, int, MPI_Comm, request_tuple&, Timer<>&)’:
/gpfs/hpchome/a51256/epa/src/net/epa_mpi_util.hpp:148:58: error: there are no arguments to ‘memcpy’ that depend on a template parameter, so a declaration of ‘memcpy’ must be available [-fpermissive]
   memcpy(buffer, data.c_str(), data.size() * sizeof(char));
                                                          ^
make[3]: *** [src/CMakeFiles/epa_module.dir/net/epa_mpi_util.cpp.o] Error 1

Loaded modules: openmpi-4.0.0 cmake/3.8.2 gcc-5.2.0

pierrebarbera commented 5 years ago

Thank you for reporting this issue! While I couldn't reproduce the problem myself, I believe I've fixed it in the dev branch. Could you try to compile that branch and confirm that the issue is resolved?

Thank you, Pierre
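
For readers hitting the same error: GCC reports it when a template calls memcpy with arguments that do not depend on a template parameter while no declaration of memcpy is visible at the point where the template is defined. The usual remedy is to include <cstring> and call std::memcpy. The sketch below only illustrates that pattern. The helper name pack_for_send is hypothetical, it is not the function from epa_mpi_util.hpp, and the actual change made on the dev branch is not shown in this thread.

```cpp
#include <cassert>
#include <cstring>   // declares std::memcpy; leaving this out can trigger the error above
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from epa-ng): serialize an object to text and
// copy the bytes into a contiguous buffer, as one might before an MPI send.
template <typename T>
std::vector<char> pack_for_send(const T& obj) {
    std::ostringstream oss;
    oss << obj;
    const std::string data = oss.str();      // std::string: a non-dependent type

    std::vector<char> buffer(data.size());   // std::vector<char>: also non-dependent
    if (!data.empty()) {
        // No argument here depends on T, so the compiler looks up memcpy when
        // it parses the template, and a declaration must already be visible.
        std::memcpy(buffer.data(), data.c_str(), data.size() * sizeof(char));
    }
    assert(buffer.size() == data.size());
    return buffer;
}
```

Calling pack_for_send(std::string("ACGT")) returns a four-byte buffer holding the characters A, C, G, T.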

koopkaup commented 5 years ago

Compiling now works. Thanks! But I ran into another problem. Converting fasta to bfast format worked normally, but then I tried to do the phylogenetic placement and this error occurred:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid substitution rate specification: {/////}+FU{///}+G4{}
[sfr1:69184] *** Process received signal ***
[sfr1:69184] Signal: Aborted (6)
[sfr1:69184] Signal code:  (-6)
[sfr1:69184] [ 0] /usr/lib64/libpthread.so.0(+0xf6d0)[0x2b6d0a0d06d0]
[sfr1:69184] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b6d0a313277]
[sfr1:69184] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b6d0a314968]
[sfr1:69184] [ 3] /storage/software/gcc-5.2.0/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x2b6d09689b6d]
[sfr1:69184] [ 4] /storage/software/gcc-5.2.0/lib64/libstdc++.so.6(+0x8cbb6)[0x2b6d09687bb6]
[sfr1:69184] [ 5] /storage/software/gcc-5.2.0/lib64/libstdc++.so.6(+0x8cc01)[0x2b6d09687c01]
[sfr1:69184] [ 6] /storage/software/gcc-5.2.0/lib64/libstdc++.so.6(+0x8ce18)[0x2b6d09687e18]
[sfr1:69184] [ 7] epa-ng(_ZN5raxml5Model15init_model_optsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERK13mixture_model+0x2f9b)[0x4cc4fb]
[sfr1:69184] [ 8] epa-ng(_ZN5raxml5Model16init_from_stringERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1b2)[0x4cdff2]
[sfr1:69184] [ 9] epa-ng(_ZN5raxml5ModelC1ENS_8DataTypeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x14e)[0x4ce39e]
[sfr1:69184] [10] epa-ng(main+0x3560)[0x488f50]
[sfr1:69184] [11] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b6d0a2ff445]
[sfr1:69184] [12] epa-ng[0x4af7fe]
[sfr1:69184] *** End of error message ***

Any idea what might have gone wrong?

pierrebarbera commented 5 years ago

Can you post the exact command line call? It looks like the model parameters were not specified properly. Try to get them from your reference tree, or, if you don't have them, re-infer them using RAxML with the -f e option or the new raxml-ng with the --evaluate option.

RAxML will create a .info file, which you can directly pass to epa-ng's --model option.

pierrebarbera commented 5 years ago

see also here: https://github.com/Pbdas/epa-ng#setting-the-model-parameters

koopkaup commented 5 years ago

I used this method to create my reference tree: https://www.polarmicrobes.org/phylogenetic-placement-re-re-visited/ So I used the .info file from building the reference tree. After re-creating it with the -f e option, the problem is gone. However, my query sequences were not aligned, which caused another error. Can you recommend a multiple sequence aligner with multiprocessor support?

lczech commented 5 years ago

If you want to be thorough, you can try our PaPaRa aligner, which was specifically developed with phylogenetic placement in mind: https://sco.h-its.org/exelixis/web/software/papara/index.html

It uses the reference tree as an additional source of information, and hence yields better alignments for this use case (at an increased computational cost...), as shown in https://doi.org/10.1093/bioinformatics/btr320 as well as http://sco.h-its.org/exelixis/pubs/Exelixis-RRDR-2012-5.pdf

Also, I created a simple MPI version of PaPaRa: https://github.com/lczech/papara_nt

pierrebarbera commented 5 years ago

PaPaRa works well, I recommend it.

See also here for a full example of using placement: https://github.com/Pbdas/epa-ng/wiki/Full-Stack-Example

koopkaup commented 5 years ago

Thanks, I will try that. But two questions. Papara only accepts reference alignment in phylip format, however I cannot convert my reference MSA from fasta to phylip, as my fasta headers are longer than 10 characters and cannot be truncated. Is it possible to use MSA in fasta format for Papara?

And the other question comes from your tutorial, which says: "Papara, for example, outputs the aligned queries together with the reference MSA, in phylip format. (there will be a convenience function for this shortly)". How do you separate the queries after aligning them against the reference MSA?

Thank you for the help!

lczech commented 5 years ago

Yes, this is a downside of PaPaRa: it has to be phylip for the reference and fasta for the queries, nothing else works. We are currently starting a project that will improve PaPaRa, and one of the first things I'll suggest is to allow for more flexible file formats.

Luckily for you, PaPaRa already supports relaxed Phylip, with headers as long as they need to be, and which are instead separated from the sequences by (at least one) space. See here for a converter: https://github.com/npchar/Phylogenomic/blob/master/fasta2relaxedPhylip.pl
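
For readers who prefer not to use the Perl script, the conversion can be sketched in a few lines of C++. This is a minimal sketch, not the linked converter and not part of PaPaRa. The function name fasta_to_relaxed_phylip is hypothetical, and it assumes that the FASTA input is already aligned (all sequences have equal length), that the sequence name is the header up to the first whitespace, and that sequential relaxed Phylip (one record per line, preceded by a line with the number of sequences and sites) is what is wanted.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: read aligned FASTA from `in` and write sequential
// relaxed Phylip to `out` (full-length names, one space before the sequence).
void fasta_to_relaxed_phylip(std::istream& in, std::ostream& out) {
    std::vector<std::pair<std::string, std::string>> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            // Assumption: the name is the header text up to the first whitespace.
            const std::size_t end = line.find_first_of(" \t");
            std::string name = line.substr(
                1, end == std::string::npos ? std::string::npos : end - 1);
            if (name.empty()) throw std::runtime_error("empty FASTA header");
            // Whitespace separates name and sequence, so the name must have none.
            assert(name.find_first_of(" \t") == std::string::npos);
            records.emplace_back(name, std::string());
        } else {
            if (records.empty())
                throw std::runtime_error("sequence data before first header");
            for (char c : line)
                if (c != ' ' && c != '\t') records.back().second += c;
        }
    }
    if (records.empty()) throw std::runtime_error("no sequences found");

    // An alignment requires every sequence to have the same number of sites.
    const std::size_t width = records.front().second.size();
    for (const auto& r : records)
        if (r.second.size() != width)
            throw std::runtime_error("sequence '" + r.first +
                                     "' differs in length; input is not aligned");

    out << records.size() << ' ' << width << '\n';
    for (const auto& r : records) out << r.first << ' ' << r.second << '\n';
}
```

It can be called with any pair of streams, for example a std::ifstream on the reference FASTA and a std::ofstream for the Phylip output. Rejecting unequal sequence lengths up front is a design choice of this sketch, so that a malformed alignment fails here rather than in a downstream tool.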

koopkaup commented 5 years ago

I had this fatal error in Papara. I assume that there's something fishy with my reference sequences?

papara: papara.cpp:538: papara::references<pvec_t, seq_tag>::references(const char*, const char*, papara::queries<seq_tag>*) [with pvec_t = pvec_pgap; seq_tag = sequence_model::tag_dna]: Assertion 'unmasked.size() == seq.size()' failed.

pierrebarbera commented 5 years ago

The convenience function already exists and is called --split. I will update the tutorial accordingly. The basic usage is: epa-ng --split ref_alignment query_alignments+

I'm closing this issue as the original bug has been fixed, but we would be happy to help you further on the placement Google group, where errors and their solutions are much easier for future users to find: https://groups.google.com/forum/#!forum/phylogenetic-placement

lczech commented 5 years ago

Re fatal error: Please post this again to our forum that @Pbdas mentioned, and maybe add the first few lines of your alignment.

pierrebarbera commented 5 years ago

please also note #20 since you are using the MPI version. Will fix that shortly.

EDIT: that issue is fixed as well, please pull again!