It is a known issue that AppleClang on MacOSX does not work well with OpenMP. As I currently do not have access to a Mac, I cannot test this, but I'll try to provide potential solutions:
Install gappa via conda instead of compiling it yourself (https://anaconda.org/bioconda/gappa). I don't think that this version uses OpenMP, so it will be slower for some commands. Read on if you want or need speed.
Instead of AppleClang, use "proper" clang:
# Install the llvm package, which includes clang.
brew install llvm libomp
# We need to set custom paths here, so that the new Clang is used.
export PATH="$(brew --prefix llvm)/bin:$PATH";
export COMPILER=/usr/local/opt/llvm/bin/clang++
export CFLAGS="-I /usr/local/include -I/usr/local/opt/llvm/include"
export CXXFLAGS="-I /usr/local/include -I/usr/local/opt/llvm/include"
export LDFLAGS="-L /usr/local/lib -L/usr/local/opt/llvm/lib"
export CXX=${COMPILER}
# Bugfix for https://github.com/Homebrew/homebrew-core/issues/52461 (pass -mlinker-version=450).
# Otherwise, we get "ld: unknown option: -platform_version"
# Not needed for all MacOS versions, apparently.
# See if that works for you, and otherwise, leave out these two lines.
export CXXFLAGS="${CXXFLAGS} -mlinker-version=450"
export LDFLAGS="${LDFLAGS} -mlinker-version=450"
# Now go to the gappa main directory, clean up, and build again.
cd wherever/you/have/stored/gappa
make clean
make
I am not entirely sure that the extra `libomp` is needed in the package installation step, but it also doesn't hurt to install it (unless that package is not found, in which case you can try to just `brew install llvm` instead). You might have to run an update on the machine first.
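The hard-coded `/usr/local/opt/llvm` paths above match one common Homebrew layout; on machines where Homebrew uses a different prefix, they would need to be adjusted. As a minimal sketch, the LLVM-specific variables can be derived from a single prefix instead. `print_llvm_env` is a hypothetical helper name (not part of gappa or Homebrew), and it covers only the LLVM paths, not the extra `/usr/local/include` and `/usr/local/lib` entries or the `-mlinker-version` workaround.

```shell
#!/bin/sh
# Hypothetical helper: print export lines for the LLVM-specific variables
# from the steps above, for a given LLVM install prefix, instead of
# hard-coding /usr/local/opt/llvm.
print_llvm_env() {
    llvm_prefix="$1"
    if [ -z "$llvm_prefix" ]; then
        echo "usage: print_llvm_env <llvm-prefix>" >&2
        return 1
    fi
    printf '%s\n' "export PATH=\"${llvm_prefix}/bin:\$PATH\""
    printf '%s\n' "export CXX=\"${llvm_prefix}/bin/clang++\""
    printf '%s\n' "export CXXFLAGS=\"-I${llvm_prefix}/include\""
    printf '%s\n' "export LDFLAGS=\"-L${llvm_prefix}/lib\""
}

# Typical use (assumes Homebrew and its llvm package are installed):
#   eval "$(print_llvm_env "$(brew --prefix llvm)")"
```

Printing the lines first, rather than exporting directly, makes it easy to inspect what would be set before applying it with `eval`.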
Turn off OpenMP. This is not recommended, as this will result in slower execution times. But if your dataset is small enough to be analyzed on a Mac laptop, it is probably also small enough to not need OpenMP anyway.
# We need to invoke CMake directly, telling it to not look for OpenMP.
# Go to the gappa main directory, and execute:
cd wherever/you/have/stored/gappa
make clean
mkdir -p build
cd build
cmake -DGENESIS_USE_OPENMP=OFF ..
make
This will result in a warning, which you can ignore, since we deactivated OpenMP on purpose.
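To confirm that the option actually reached CMake, the value can be read back from `CMakeCache.txt` in the build directory, where CMake records cache entries as `NAME:TYPE=VALUE`. The following is only a sketch; `cache_value` is a hypothetical helper name, not something gappa or genesis provides.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a CMake cache entry.
# CMake stores cache entries in CMakeCache.txt as NAME:TYPE=VALUE.
cache_value() {
    cache_file="$1"
    name="$2"
    # Match the entry regardless of its TYPE and print what follows the first '='.
    sed -n "s/^${name}:[^=]*=//p" "$cache_file"
}

# After the cmake call above, run from the gappa main directory:
#   cache_value build/CMakeCache.txt GENESIS_USE_OPENMP
```

If the option was applied, the printed value should be `OFF`. Empty output would mean that no such entry was written to the cache.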
Hope that helps! Let me know here about your progress!
Thank you very much! The 2nd solution that you posted recommending the use of "proper" clang instead of AppleClang fixed the problem.
Though we have now run into an additional problem that may or may not be related. I am posting the code below in case it is related or in case you have any suggestions that could help. The program paprica is only creating half of the files that it is supposed to (which is an improvement from before when it didn't make any).
Thank you again for your time and help with this, we appreciate it very much!!!
Gabriels-MacBook-Air:paprica gabrielprice$ ./paprica-run.sh test bacteria
# cmalign :: align sequences to a CM
# INFERNAL 1.1.1 (July 2014)
# Copyright (C) 2014 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# CM file: /Users/gabrielprice/paprica/models/bacteria_ssu.cm
# sequence file: /Users/gabrielprice/paprica/test.bacteria.clean.unique.fasta
# CM name: SSU_rRNA_bacteria
# saving alignment to file: /Users/gabrielprice/paprica/test.bacteria.clean.unique.align.sto
# output alignment format specified as: Pfam
# output alignment alphabet: DNA
# number of worker threads: 4
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#                                                                              running time (s)
#                                                                       -------------------------------
# idx  seq name      length  cm from    cm to  trunc    bit sc  avg pp  band calc  alignment      total  mem (Mb)
# ---  ------------  ------  -------  -------  -----  --------  ------  ---------  ---------  ---------  --------
    1  SRR953432.50      84      968     1054     3'     39.93   0.974       0.03       0.04       0.07      2.91
    2  Firmicutes_1     252      524      776  5'&3'    231.71   0.997       0.08       0.02       0.10      5.77
    3  Firmicutes_2     253      524      776  5'&3'    238.32   1.000       0.08       0.02       0.10      5.83
#
# CPU time: 0.41u 0.05s 00:00:00.46 Elapsed: 00:00:00.32
# Saving alignment to file /Users/gabrielprice/paprica/test.bacteria/test.bacteria.combined_16S.bacteria.tax.phylum_reps.clean.unique.align.sto ... done
#
# CPU time: 0.00u 0.01s 00:00:00.01 Elapsed: 00:00:00.02
INFO Selected: Output dir: /Users/gabrielprice/paprica/test.bacteria/
INFO Selected: Query file: /Users/gabrielprice/paprica/test.bacteria/test.bacteria.clean.unique.align.newlength.fasta
INFO Selected: Tree file: /Users/gabrielprice/paprica/ref_genome_database/bacteria/phylum_reps/combined_16S.23S.bacteria.tax.phylum_reps.f. inal.bestTree
INFO Selected: Reference MSA: /Users/gabrielprice/paprica/test.bacteria/combined_16S.bacteria.tax.phylum_reps.clean.align.newlength.fasta
INFO Selected: Automatic switching of use of per rate scalers
INFO Selected: Preserving the root of the input tree
INFO Selected: Specified model file: /Users/gabrielprice/paprica/ref_genome_database/bacteria/phylum_reps/combined_16S.23S.bacteria.tax.phylum_reps.f. inal.bestModel
INFO Rate heterogeneity: GAMMA (4 cats, mean), alpha: 0.469537 (user), weights&rates: (0.25,0.0278902) (0.25,0.231564) (0.25,0.796771) (0.25,2.94377)
Base frequencies (user): 0.220039 0.236043 0.303232 0.240686
Substitution rates (user): 0.893602 2.60232 1.46235 0.917758 3.56951 1
INFO [EPA-ng ASCII-art banner] (v0.3.6)
INFO Output file: /Users/gabrielprice/paprica/test.bacteria/epa_result.jplace
INFO 3 Sequences done!
INFO Time spent placing: 0s
INFO Elapsed Time: 0s
[gappa ASCII-art banner]
v0.6.1 (c) 2017-2020
by Lucas Czech and Pierre Barbera
Invocation: gappa examine edpl --allow-file-overwriting --out-dir /Users/gabrielprice/paprica/test.bacteria/ --file-prefix test.bacteria.phylum_reps.edpl --jplace-path /Users/gabrielprice/paprica/test.bacteria/test.bacteria.phylum_reps.jplace
Command: gappa examine edpl
Input:
--jplace-path /Users/gabrielprice/paprica/test.bacteria/test.bacteria.phylum_reps.jplace
Settings:
--histogram-bins 25
--histogram-max -1
--no-list-file false
Output:
--out-dir /Users/gabrielprice/paprica/test.bacteria/
--file-prefix test.bacteria.phylum_reps.edpl
--file-suffix
Global Options:
--allow-file-overwriting true
--verbose false
--threads 2
--log-file
Run the following command to get the references that need to be cited:
`gappa tools citation Czech2020-genesis-and-gappa Matsen2011-edgepca-and-squash-clustering`
Started 2021-01-28 21:31:54
Found 1 jplace file
Writing output files.
Finished 2021-01-28 21:31:54
The following arguments were not expected: test.bacteria.phylum_reps --tree-file-prefix
Run with --help for more information.
Traceback (most recent call last):
File "/Users/gabrielprice/paprica/paprica-place_it.py", line 1340, in <module>
edpl = pd.read_csv(temp_dir + query + '.' + phylum_ref + '.edpllist.csv', index_col = 1)
File "/Library/Python/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Python/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 457, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/Library/Python/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/Library/Python/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Library/Python/miniconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1906, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 380, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 687, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'/Users/gabrielprice/paprica/test.bacteria/test.bacteria.phylum_reps.edpllist.csv' does not exist: b'/Users/gabrielprice/paprica/test.bacteria/test.bacteria.phylum_reps.edpllist.csv'
Thanks for the feedback, I'm glad that the preferred solution that uses OpenMP worked!
As for paprica: I have never heard of that tool, interesting. As the error seems to be on their end, this is an issue that they need to fix. Please open another issue on their issue page and report it there! You can link to this issue in case that helps them figure out what is going on.
Here are a few hints though that you can forward to them: In your error message above, there is the message
The following arguments were not expected: test.bacteria.phylum_reps --tree-file-prefix
The function that is being called there, `gappa examine heat-tree`, recently changed its command line slightly (see the gappa documentation for the current usage), and went from the option `--tree-file-prefix` in gappa v0.6.1 (which you are using, so it should work) to just `--file-prefix` in gappa v0.7.0. If you or paprica update to that gappa version, this will need to be changed.
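For a wrapper script that has to work with both gappa versions, the option name could be chosen from the version string. This is a rough sketch, not something paprica or gappa provides: `heat_tree_prefix_option` is a hypothetical helper. It assumes that every release before v0.7.0 uses the old option, which this thread only confirms for v0.6.1.

```shell
#!/bin/sh
# Hypothetical helper: pick the prefix option for `gappa examine heat-tree`
# from a gappa version string, following the change described above.
# Assumption: all versions before 0.7.0 use --tree-file-prefix.
heat_tree_prefix_option() {
    version="${1#v}"          # accept "v0.6.1" as well as "0.6.1"
    major="${version%%.*}"    # text before the first dot
    rest="${version#*.}"      # text after the first dot
    minor="${rest%%.*}"       # second version component
    if [ "$major" -gt 0 ] || [ "$minor" -ge 7 ]; then
        echo "--file-prefix"
    else
        echo "--tree-file-prefix"
    fi
}

# Example:
#   heat_tree_prefix_option v0.6.1
#   heat_tree_prefix_option 0.7.0
```

How the version string is obtained is left open here; the gappa banner in the log above prints it.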
However, this does not seem to be connected to the actual error, `FileNotFoundError` (last line in your log). That simply looks like some file paths are wrongly set by paprica. Ask them, and I hope they can help!
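One way to narrow down such a path mismatch before reporting it is to compare the file names a script expects with what is actually present in the output directory. A minimal sketch; `report_missing` is a hypothetical helper name, and the example path is simply the one from the traceback.

```shell
#!/bin/sh
# Hypothetical helper: print every given path that does not exist as a
# regular file, and return non-zero if at least one is missing.
report_missing() {
    missing=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example with the path from the traceback, followed by a listing of
# what is actually in the output directory:
#   report_missing /Users/gabrielprice/paprica/test.bacteria/test.bacteria.phylum_reps.edpllist.csv
#   ls /Users/gabrielprice/paprica/test.bacteria/ | grep edpl
```

Including both outputs in the report to the paprica developers would show them directly which file names differ.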
I'll close this issue for now, but feel free to comment or re-open as needed.
The following issue was sent to me via email. I am posting it here for future users that might run into the same issue.
Compilation fails with AppleClang on MacOSX due to missing `omp.h` although OpenMP is reported as being used: