soedinglab / metaeuk

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics
GNU General Public License v3.0

compile from source have some problems #48

Open jamdodot opened 2 years ago

jamdodot commented 2 years ago

Current Behavior

  1. Dependencies: at the cmake stage I still get a "shellcheck not found" message, and shellcheck is difficult to install from source. Will it affect the compilation if I don't install it? (image)

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

make -j
  1. Built target mmseqs-framework (image)
  2. Got an error (image)

Your Environment

Include as many relevant details about the environment you experienced the bug in.

If I want to fix make errors, do I start with cmake or do I modify the cpp source code?

milot-mirdita commented 2 years ago

You can ignore the shellcheck message. I would recommend restricting the number of processes that make launches.

Compiling with make -j 4 or so should work.
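A bare `make -j` puts no limit on the number of parallel jobs in GNU make, so on a small machine the compiler processes can exhaust memory. That is a plausible cause here, though the error image is not preserved. A minimal sketch for picking a bounded job count follows; the cap of 4 comes from the suggestion above, and the variable names are illustrative.

```shell
# Count online CPU cores (nproc on Linux, getconf as a fallback, 1 if neither works).
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Cap the job count at 4, the value suggested in this thread.
max_jobs=4
if [ "$cores" -lt "$max_jobs" ]; then
  njobs="$cores"
else
  njobs="$max_jobs"
fi

# Print the command rather than running it, so the sketch is safe anywhere.
echo "make -j $njobs"
```

On a machine with little RAM per core, lowering `max_jobs` further may be necessary.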

jamdodot commented 2 years ago

Yes!!! make -j 4 works, thank you very much! But I still have a dependency problem.

  1. Started testing the binary (image)
  2. A shared library is missing (image)
  3. Searched for this shared library (image)

It seems that the compiler is able to provide this file, but my attempts to import it at the cmake stage failed.

  1. Link error (image)

I'm a little confused: do I need to install other software that provides libc++.so.1, or should I continue testing with the one provided by the compiler?

milot-mirdita commented 2 years ago

Can you please try to comment out lines 167-171 in this file:

https://github.com/soedinglab/metaeuk/blob/1da320a9daa75dce5539442b5674f69951a2fe4f/lib/mmseqs/CMakeLists.txt#L167

And repeat the compilation with a clean cmake-build folder?
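CMake stores earlier configuration results in `CMakeCache.txt` inside the build folder, so an edited `CMakeLists.txt` is best tested from an empty build folder. A minimal sketch of such a clean rebuild is below. It assumes the build folder is named `build` and that the commands are run from the top of the metaeuk checkout; both are assumptions, not details confirmed in this thread.

```shell
# Assumed layout: current directory is the source tree, build folder is "build".
build_dir="build"

if [ -f CMakeLists.txt ]; then
  rm -rf "$build_dir"      # discard CMakeCache.txt and old object files
  mkdir "$build_dir"
  # Configure and compile in a subshell so the caller's directory is unchanged.
  (cd "$build_dir" && cmake .. && make -j 4)
  status="rebuilt"
else
  status="skipped"
  echo "No CMakeLists.txt here; run this from the top of the source tree."
fi
```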

jamdodot commented 2 years ago

It succeeded! Thank you for your enthusiastic response. 👍 I also want to learn why commenting out these lines works. 😃

milot-mirdita commented 2 years ago

Your compiler probably already links its own C++ standard library somehow. I'm not really sure what's going on.

I am quite interested in your use-case of Metaeuk on ARM. How much RAM does your machine have? Metaeuk will probably struggle on a very low RAM machine.

jamdodot commented 2 years ago

I haven't started using Metaeuk yet, and the current server for building metaeuk is 4vCPUs 8GiB aarch64.

I would really like to test the functionality on ARM 😃

jamdodot commented 2 years ago

Sorry I have a problem with openMP 😭

1. problem

I use the following command to test metaeuk

/metaeuk/build/bin/metaeuk easy-predict /test/test_metaeuk/two_contigs/contigs.fna */test/test_metaeuk/two_contigs/proteins.faa ressxp tmpsxp

and it shows

Calling program has OMP_PROC_BIND set in its environment. Please unset OMP_PROC_BIND.

This message can be found at this link

2.

I tried to unset this environment variable

unset OMP_PROC_BIND

But it didn't work.

Do I need to edit the CMakeLists?

milot-mirdita commented 2 years ago

You can disable the check here: https://github.com/soedinglab/metaeuk/blob/1da320a9daa75dce5539442b5674f69951a2fe4f/lib/mmseqs/src/commons/CommandCaller.cpp#L17
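Before patching out the check, it can help to confirm that the variable is really gone from the environment a child process inherits, since a shell profile or job scheduler may export it again. The sketch below simulates an exported `OMP_PROC_BIND` and shows two ways to remove it. It uses `env -u`, which GNU and BSD `env` provide but POSIX does not require.

```shell
# Simulate a shell that inherited the variable (for demonstration only).
export OMP_PROC_BIND=true

# 1. Is it exported in the current environment?
if env | grep -q '^OMP_PROC_BIND='; then
  echo "OMP_PROC_BIND is exported: $OMP_PROC_BIND"
fi

# 2. Strip it for a single command, leaving the calling shell untouched.
one_off=$(env -u OMP_PROC_BIND sh -c 'echo "${OMP_PROC_BIND-unset}"')
echo "with env -u, the child sees: $one_off"

# 3. Remove it from this shell and verify from a child process.
unset OMP_PROC_BIND
child_view=$(sh -c 'echo "${OMP_PROC_BIND-unset}"')
echo "after unset, the child sees: $child_view"
```

If the message persists even though the child process reports the variable as unset, the check may be reacting to something other than the environment variable. That is a guess, not something established in this thread.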

milot-mirdita commented 2 years ago

If you see that the CPU utilization is stuck at a load of 1.0 instead of 4.0, then it's probably some issue with the CPU affinity selection in the OpenMP implementation of your compiler.

jamdodot commented 1 year ago

Description

When I tested MetaEuk, the ARM machine I used had 128 cores, while the x86 machine I compared it with (my laptop) had only 8. The 128-core machine also generated many more files than the x86 machine. I wonder whether the ARM run takes longer because some tasks are executed repeatedly. If the number of cores were the same, would the execution time of the two machines be similar? And why are so many files generated on the ARM server?

Arm

1 - copy 3 (image)

x86

2 - copy 4 (image)

milot-mirdita commented 1 year ago

MetaEuk generates one output file per thread/core for intermediate result files. The final result files (fasta, gtf, etc.) should be merged and not be split.

jamdodot commented 1 year ago

Do you mean that I should execute all the contents of run.sh? In that case, ARM is still much slower than x86. Why is this?

milot-mirdita commented 1 year ago

run.sh is not very representative of MetaEuk's performance and is just meant as a quick sanity check in our continuous integration. It's much too tiny. I suspect ARM is spending most of its time creating and tearing down threads.

jamdodot commented 1 year ago

That makes sense. Does it have something to do with OpenMP? Is there a way to fix this threading problem, or can I use other independent tests to evaluate performance? 🤷🏻‍♂️

milot-mirdita commented 1 year ago

This is not a problem, this is expected behavior. To benchmark MetaEuk's runtime you need to run it on larger query/target sets.

jamdodot commented 1 year ago

Does this software use MPI for multi-process computing, or just OpenMP for multi-threaded computing?

jamdodot commented 1 year ago

When I use MPI to run multiple processes, two processes generate the same result file and the run exits with an error. How can I solve this problem? Can I use MPI for testing?

milot-mirdita commented 1 year ago

If you use MPI, only the search stage will be parallelized. This is the most time consuming step usually.

You have to specify the mpi runner and parameters through the RUNNER env variable: https://github.com/soedinglab/mmseqs2#how-to-run-mmseqs2-on-multiple-servers-using-mpi
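As a rough sketch of what the linked MMseqs2 instructions describe, the MPI launcher and its arguments go into `RUNNER`, and the tool is then called as usual rather than being wrapped in `mpirun` directly. The input file names, output names, and process count below are hypothetical placeholders, and the linked README should be checked for how to build with MPI support.

```shell
# Hypothetical inputs; replace with real files.
contigs="contigs.fna"
proteins="proteins.faa"

# The launcher and its arguments are passed through the RUNNER variable.
export RUNNER="mpirun -np 2"

if command -v metaeuk >/dev/null 2>&1 && command -v mpirun >/dev/null 2>&1 \
   && [ -f "$contigs" ] && [ -f "$proteins" ]; then
  # metaeuk itself is invoked normally; it reads RUNNER from the environment.
  metaeuk easy-predict "$contigs" "$proteins" predsResults tmpFolder
else
  echo "metaeuk, mpirun, or the input files are missing; RUNNER would be: $RUNNER"
fi
```

Starting several independent `metaeuk` processes by hand instead would let them write to the same result files, which may be what caused the error described above.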

This should work, but it is not very well tested anymore, since we now exclusively use high-core-count servers ourselves and don't require MPI anymore.

jamdodot commented 1 year ago

Thank you for your reply. Can I directly modify the number of threads used by MetaEuk? For example, I have 128 threads and only want to use 8, which would make it easier to compare with x86.

milot-mirdita commented 1 year ago

Yes, you can specify the --threads parameter.
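For a like-for-like comparison, the same thread count can be passed on both machines and the wall-clock time recorded. The sketch below assumes hypothetical input and output names. Later comments in this thread report that some sub-commands may not honor `--threads`, so the timing should be read with that in mind.

```shell
threads=8                  # use the same value on the ARM and x86 machines
contigs="contigs.fna"      # hypothetical input files
proteins="proteins.faa"

if command -v metaeuk >/dev/null 2>&1 && [ -f "$contigs" ] && [ -f "$proteins" ]; then
  start=$(date +%s)
  metaeuk easy-predict "$contigs" "$proteins" predsResults tmpFolder --threads "$threads"
  end=$(date +%s)
  echo "wall-clock seconds with $threads threads: $((end - start))"
else
  # Fall back to printing the command when metaeuk or the inputs are absent.
  echo "would run: metaeuk easy-predict $contigs $proteins predsResults tmpFolder --threads $threads"
fi
```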

jamdodot commented 1 year ago

I modified the /tests/test.sh file, but it seems that only some commands honor this parameter; others still use 128 threads.

some 128 threads

image

some 8 threads

image

test.sh

image

milot-mirdita commented 1 year ago

That sounds like a bug from our side. As a workaround, you can also set the MMSEQS_NUM_THREADS environment variable. That should globally restrict the number of threads:

MMSEQS_NUM_THREADS=8 ./tests/test.sh
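The prefix form used above sets the variable only for that one command and the processes it starts, which is why it reaches every sub-command launched by the script. A small demonstration of that shell behavior follows, with a one-line stand-in for `./tests/test.sh`.

```shell
# Make sure the demonstration starts from a clean state.
unset MMSEQS_NUM_THREADS

# Stand-in for ./tests/test.sh: report the limit the child process sees.
report='echo "${MMSEQS_NUM_THREADS-not set}"'

# Prefix form: the variable applies to this command and its children only.
in_child=$(MMSEQS_NUM_THREADS=8 sh -c "$report")
echo "inside the command: $in_child"

# The calling shell is unaffected afterwards.
after=$(sh -c "$report")
echo "in later commands: $after"
```

To apply the limit to every command in a session instead, `export MMSEQS_NUM_THREADS=8` can be used once at the start.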

jamdodot commented 1 year ago

This works, thanks!

jamdodot commented 1 year ago

> This is not a problem, this is expected behavior. To benchmark MetaEuk's runtime you need to run it on larger query/target sets.

Do you have larger query/target sets? I don't know where to find a larger test set or the corresponding commands.