nextgenusfs closed this issue 5 years ago.
We started something for FGMP, and the problems with Augustus 3.3 on bioconda are a problem there too.
Maybe we can tweet about this; some of the bioconda devs may be able to contribute suggestions.
Jason Stajich, PhD
jasonstajich.phd@gmail.com

On Jul 6, 2018, 12:11 PM -0700, Jon Palmer <notifications@github.com> wrote:
I would like to make a bioconda package for funannotate, as it is still difficult to install. It would be great if a conda expert were willing to help out. So far some dependencies are still missing, and those would need to be done first.
- Evidence Modeler is not in bioconda
- Trinity and PASA are currently Linux only (not available for macOS) -- seems to be a compiler incompatibility?
- Augustus via bioconda is not fully functional on macOS -- I use a slightly modified version of augustus v3.2.1
- funannotate code is currently Python 2 only (it needs to be migrated to be py2/3 compatible). This is perhaps not required for a recipe, but it would help with future compatibility.
Any help/guidance would be appreciated!
Hi @nextgenusfs @hyphaltip,

I'm just discovering `funannotate` now and am impressed; thanks for the work!
I helped commit the OSX `augustus` bioconda build (about 5 months ago) and wasn't aware it was breaking. Do you have an issue to point me to with more of a failure log?
Reading the notes from your custom macOS augustus build, it appears to be a compiler compatibility problem with bamtools that causes the proteinprofile BUSCO search to fail; is that it? Maybe we could add a test directly to the augustus recipe to catch that, if handling both the bamtools and augustus installs with conda doesn't catch it.
As for trinity and pasa, I'm not sure what's going on. Bioconda has been in the process of rebuilding all their recipes for the last 3 weeks to use the newer conda build system, and things are breaking a bit - for example there is a pasa recipe (that appears to support OSX) but it doesn't show up on the bioconda search page. Might be worth focusing on other development for a few weeks to let the dust settle on this one, and then swing back around to it.
I'm happy to help out, as a widespread euk annotation pipeline is definitely needed. Thanks!
PS - As I'm literally brand new to `funannotate`, I'm currently stepping through the tutorial and will use the bioconda bamtools and augustus installs; most likely I'll find the error on my own. :) But just in case I don't, send a link if you have it. Thanks,
Hi @kastman, thanks for the help. Yes, I could only ever get the proteinprofile to work on OSX with v3.2.1; more recent versions seem to compile but fail at runtime. This means it will fail during BUSCO runs (as that is what uses `--proteinprofile`). The other problem you highlighted was getting Bamtools properly linked when compiling `filterBam` and `bam2hints`, which are both used by `BRAKER` for training Augustus. It seems the bioconda version is correctly compiled/linked to bamtools, as the BRAKER check passes. The check that `funannotate` runs right now is here:
```python
# Python 2 code, as funannotate was py2-only at the time.
# which(), log and parentdir are assumed to be defined at module level
# elsewhere in funannotate's library code.
import os
import subprocess
import sys


def checkAugustusFunc(base):
    '''
    function to try to test Augustus installation is working, note segmentation fault still results in a pass
    '''
    brakerpass = 0
    buscopass = 0
    version = subprocess.Popen(['augustus', '--version'], stderr=subprocess.STDOUT, stdout=subprocess.PIPE).communicate()[0].rstrip()
    # keep only the text before ' is ' in the version banner
    version = version.split(' is ')[0]
    # BRAKER needs the bamtools-linked helpers bam2hints and filterBam
    bam2hints = which(os.path.join(base, 'bin', 'bam2hints'))
    filterBam = which(os.path.join(base, 'bin', 'filterBam'))
    if bam2hints and filterBam:
        brakerpass = 1
    # BUSCO relies on --proteinprofile; run it on the small bundled test case
    model = os.path.join(parentdir, 'lib', 'EOG092C0B3U.prfl')
    if not os.path.isfile(model):
        log.error("Testing Augustus Error: installation seems wrong, can't find prfl model")
        sys.exit(1)
    profile = '--proteinprofile='+model
    proteinprofile = subprocess.Popen(['augustus', '--species=anidulans', profile, os.path.join(parentdir, 'lib', 'busco_test.fa')], stderr=subprocess.STDOUT, stdout=subprocess.PIPE).communicate()[0].rstrip()
    proteinprofile = proteinprofile.strip()  # strip() returns a new string; keep the result
    if proteinprofile == '':
        buscopass = 0
    elif 'augustus: ERROR' not in proteinprofile:
        buscopass = 1
    return (version, brakerpass, buscopass)
```
So here is what happens with bioconda augustus v3.3
```console
$ augustus --species=anidulans --proteinprofile=../lib/EOG092C0B3U.prfl ../lib/busco_test.fa
# This output was generated with AUGUSTUS (version 3.3).
# AUGUSTUS is a gene prediction tool written by M. Stanke (mario.stanke@uni-greifswald.de),
# O. Keller, S. König, L. Gerischer and L. Romoth.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# No extrinsic information on sequences given.
# Initialising the parameters using config directory /Users/jon/miniconda2/config/ ...
augustus: ERROR
PP::Profile: Error parsing pattern file"../lib/EOG092C0B3U.prfl", line 8.
```
This is what should happen if it is compiled correctly:
```console
$ augustus --species=anidulans --proteinprofile=../lib/EOG092C0B3U.prfl ../lib/busco_test.fa
# This output was generated with AUGUSTUS (version 3.2.1).
# AUGUSTUS is a gene prediction tool written by M. Stanke (mario.stanke@uni-greifswald.de),
# O. Keller, S. König, L. Gerischer and L. Romoth.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# No extrinsic information on sequences given.
# Initialising the parameters using config directory /Users/jon/software/augustus/config/ ...
Warning: Block unknown_E is not significant enough, removed from profile.
Warning: Block unknown_F is not significant enough, removed from profile.
Warning: Block unknown_H is not significant enough, removed from profile.
Warning: Block unknown_AC is not significant enough, removed from profile.
# Using protein profile unknown
# --[0..117]--> unknown_A (9) <--[2..25]--> unknown_B (27) <--[1..16]--> unknown_C (8) <--[0..1]--> unknown_D (15) <--[18..100]--> unknown_G (19) <--[8..25]--> unknown_I (32) <--[0..1]--> unknown_J (33) <--[1..16]--> unknown_K (38) <--[1..3]--> unknown_L (14) <--[0..5]--> unknown_M (59) <--[0..19]--> unknown_N (23) <--[0..145]--> unknown_O (23) <--[3..18]--> unknown_P (27) <--[1..44]--> unknown_Q (12) <--[10..82]--> unknown_R (13) <--[10..106]--> unknown_S (18) <--[1..11]--> unknown_T (32) <--[2..5]--> unknown_U (12) <--[0..1]--> unknown_V (32) <--[7..18]--> unknown_W (13) <--[3..8]--> unknown_X (87) <--[0..1]--> unknown_Y (12) <--[2..33]--> unknown_Z (40) <--[0..11]--> unknown_AA (16) <--[3..30]--> unknown_AB (19) <--[8..47]--> unknown_AD (23) <--[0..1]--> unknown_AE (13) <--[0..38]--
# anidulans version. Using default transition matrix.
# Looks like ../lib/busco_test.fa is in fasta format.
# We have hints for 0 sequences and for 0 of the sequences in the input set.
#
# ----- prediction on sequence number 1 (length = 3801, name = example) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 1 on both strands
# start gene g1
example AUGUSTUS gene 788 3077 0.81 + . g1
example AUGUSTUS transcript 788 3077 0.81 + . g1.t1
example AUGUSTUS start_codon 788 790 . + 0 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS CDS 788 996 1 + 0 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS CDS 1049 3077 0.81 + 1 transcript_id "g1.t1"; gene_id "g1";
example AUGUSTUS stop_codon 3075 3077 . + 0 transcript_id "g1.t1"; gene_id "g1";
# protein sequence = [MDISDLIEPPQKRLKTEDISSADEVVLPAGGITPQTDNEIDEQLSKEIEVGITEFVSADNEGFAGILKKRYTDFLVNE
# ILPSGKVLHLTNTTAPNTNDEATPVQADKKPAEDKPKEPETPAEKLPAPVEFQLAEEDEALLDTLFGTQNTKKIVALHKKALANPKTKPSDLGRLNTV
# VVNDRDQRIKMHQAIRRIFNSQIESSTDSEGMMVISVAANRNKKNPQGGGGGRERPRVNWDELGGQYLHFTIYKENKDTMEVISFIARQLKMNPKSFQ
# FAGTKDRRGVTVQRACAYRLQADRLAKLNRTLRNAVVGDFEYQPHGLELGDLYGNEFVVTLRECEVPGINIQDPASAVAKTKELVNTSLKNLYQRGYF
# NYYGLQRFGSFATRTDTVGVKILQDDFKGACDAILDYSPHILAAAQAELGQGEGEGATPTNISSEDKARALAIHIFRTTDRVTDALEKMPRKFSAESN
# IIRHLGRSKNDYLGALQTIPRNLRLMYVHAYQSLVWNLAVGERWRLYGDRVVEGDLVLIHEHRDKDGNSSYTTPAPGAGASGETTTIDADGEIIIVPQ
# EHDSAFAVEDTFTRARALTAAEANSGLYSIFDIVLPLPGFDVLYPPNKMTDFYKEFMGSSRGGGLDPFNMRRKWKDASLSGSYRKVLSRMGRDYSVDV
# VLYSRDEEQFVRTDLENLTLKTRDGGDVDLEKKEGKSEGDKLAVVLKFQLGSSQYATMALRELMRGKVKAYKPDFGGGR]
# end gene g1
###
# command line:
# augustus --species=anidulans --proteinprofile=../lib/EOG092C0B3U.prfl ../lib/busco_test.fa
```
Those test files are located in the funannotate distribution here: https://github.com/nextgenusfs/funannotate/tree/master/lib
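Since funannotate still needs to become py2/3 compatible, here is a rough Python 3 sketch of the same `--proteinprofile` check. It keeps the rule used by `checkAugustusFunc` (empty output or an `augustus: ERROR` line means failure) and, as an assumption, also treats a non-zero exit status as a failure, which would catch the segmentation-fault case the docstring mentions (a process killed by a signal gets a negative return code on POSIX). Both function names are hypothetical, not part of funannotate.

```python
import subprocess


def classify_proteinprofile_output(output, returncode=0):
    """Return 1 if augustus --proteinprofile output looks like a pass, else 0.

    Same rule as checkAugustusFunc (empty output or 'augustus: ERROR' fails),
    plus an assumed extra rule: any non-zero exit status fails.
    """
    if returncode != 0:
        return 0
    output = output.strip()
    if not output:
        return 0
    if "augustus: ERROR" in output:
        return 0
    return 1


def run_proteinprofile_test(model, fasta, species="anidulans"):
    """Run augustus with a protein profile and classify the result (Python 3)."""
    result = subprocess.run(
        ["augustus", "--species=" + species, "--proteinprofile=" + model, fasta],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode to str so the substring test works on py3
    )
    return classify_proteinprofile_output(result.stdout, result.returncode)
```

Keeping the classification separate from the subprocess call means the pass/fail rule can be unit-tested without Augustus installed, for example by feeding it the two outputs shown above.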
I verified your error with bioconda augustus `3.2.3` and `3.3`, and read through what you've tried already in #3. Do you know what it is about the newer gcc that fixes it? If we knew what changed, we could possibly add a patch that works with the older gcc. Compiling with different versions of gcc is one of the few limitations of conda-forge/bioconda, which is quite strict (to make sure everything is compatible), so just bumping the gcc version isn't easy or even possible, though I think the recent rebuild/overhaul is doing just that. I somehow doubt that adding 3.2.1 to bioconda would help if it's still compiled with the older gcc.
I know you haven't had much luck getting feedback, but maybe @mariostanke would be able to weigh in (not sure if he's getting github notifications though)?
Sorry to overtake this ticket - maybe we should move this conversation back to #3?
Yes, it would certainly be best to get this fixed upstream. It's possible that adding `3.2.1` might actually help, as the `--proteinprofile` part of Augustus seems to compile properly on OSX using GCC (I think I've tried gcc-5 through gcc-7; I don't remember whether gcc-4.8 had errors or not). The changes I made in https://github.com/nextgenusfs/augustus were just in the Makefiles, largely fixing the bamtools compilation problems with `filterbam` and `bam2hints`; quite likely there is/was a different or better way to fix those errors. But if there were a functional version on bioconda, it seems it would be simple enough for tools that require Augustus to pin that specific version on macOS.
Is there a way to test a local bioconda build of `3.2.1` on OSX using the existing patches/recipes for `3.2.3` and `3.3`? I also saw there is a new release at http://bioinf.uni-greifswald.de/augustus/binaries/, which I have not tested yet.
Augustus was on the bioconda blacklist because the author moved the tarball into an "old" directory and broke the link -- I'm fixing that and will test 3.2.1 as well as update for the recent 3.3.1 release.
You can test a local build by forking bioconda/bioconda-recipes, updating the version and sha256 hash in `recipes/augustus`, and running `circleci build`. I'm going to give that a shot and let you know. I suspect it's a compiler problem (still running 4.8), so this likely won't work, but it's worth a shot.
So after you build with `circleci build`, can you then test those packages? I've got another software package, `AMPtk`, onto bioconda, but I'm still very much a noob when it comes to the details of building and testing. Either way, updating this thread with the results of bioconda builds for `3.2.1` and `3.3.1` would be helpful.
If you want to test locally, I usually build using `conda build` directly and then `conda install --use-local`, which uses the new package you just built (`circleci` actually builds in an image which is then tossed).
I'm sure the new compiler changes in conda-forge/bioconda will be useful, but there are still some kinks: now it seems there's a problem finding some of the perl modules (yaml). I'll keep you posted, but I'm still working on getting it off the blacklist.
I also noticed the specialized 3.2.1 PR from @camillescott - pulling from Jon's fixed tarball isn't the way bioconda is intended to be used, but may not be the worst idea? Ideally I'd pull in your specific patches and apply them to the upstream tarball.
Regardless, we have to clear it off the blacklist first. I'll let you know how that goes or you can follow along with the PR.
Thanks!
(also adding @lizlandis to this conversation as well so she can see the progress).
How is this moving along?
I think it will be difficult to have a bioconda recipe for funannotate and all its dependencies: when I tried to install as many dependencies as possible from bioconda, there were some packages which conflicted with each other. For example, Augustus installed, but then did not run due to wrong Boost version. Depending on the combination of packages, a different Numpy would be installed, and then loading Numpy segfaulted. And some other similar small problems.
Maybe a better solution would be to create a funannotate Anaconda channel, where developers / contributors would have more control over versions installed, and over applied patches.
I just started learning how to build conda recipes, and I don't have access to Macs, only Linux. That said, I could help creating / testing a conda recipe.
Difficult = yes. However, that is the whole reason to get a conda recipe together: to avoid the dependency nightmare. I don't have much experience with conda and no experience with setting up "channels". Shouldn't we be okay if we specify package version numbers?
I know that since bioconda's recent update, a lot of packages have problems and need to be rebuilt.
We could forge ahead with a Linux-only recipe; Augustus on Mac is still not solved (still compilation issues, I think). I wrote an EvidenceModeler recipe here https://github.com/bioconda/bioconda-recipes/pull/10389, but it hasn't been merged yet.
Sorry to have dropped this - I didn't make any progress with augustus myself, but it was taken off the blacklist and fixed in August in this PR. Looking now to see if a backported 3.2.1 will fix the compilation.
Certainly a linux-only recipe is better than nothing and the mac problems shouldn't hold it up, though that's where most of my time is spent and where I'm most motivated to keep up the good fight. :)
Hi @kastman, I was just running Augustus v3.3 on Linux and noticed that the bioconda install "breaks" `funannotate`. In `funannotate`, I use `$AUGUSTUS_CONFIG_PATH` to get the augustus "base directory", which I then use to get the location of the `scripts` folder, so it calls the necessary accessory scripts by full path; this spares the user from also having to put the scripts folder in their PATH. I noticed that the bioconda install copies these scripts over to `/bin` so they are in `$PATH`, but then they are not found by `funannotate` and the pipeline crashes. The workaround I used after the bioconda `augustus` install is to symlink a `/scripts/` directory to `/bin/`, so that the `augustus` directory tree is intact.

I totally understand why the scripts would be copied over to `bin`; however, that isn't what most manual `augustus` installs look like. For other packages that also use a lot of "accessory" scripts, e.g. `pasa` and `trinity`, the bioconda recipes put the entire install folder into `/conda/opt`, which keeps the directory tree intact; that would be preferable to me. If I change the way `funannotate` looks for these scripts, it will likely break pipelines where `augustus` is installed manually.
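The lookup described above, sketched in Python with a `$PATH` fallback so it would cope with both a manual install and the current bioconda layout. This is an illustration under stated assumptions, not funannotate's actual code: `find_augustus_script` is a hypothetical name, and it assumes `$AUGUSTUS_CONFIG_PATH` is `<base>/config` with the accessory scripts in `<base>/scripts`.

```python
import os
import shutil


def find_augustus_script(name, config_path=None):
    """Locate an Augustus accessory script by name.

    Assumes a manual-install layout first: AUGUSTUS_CONFIG_PATH is <base>/config
    and scripts live in <base>/scripts. Falls back to searching $PATH, which is
    where the bioconda recipe currently puts them. Returns None if not found.
    """
    if config_path is None:
        config_path = os.environ.get("AUGUSTUS_CONFIG_PATH")
    if config_path:
        # normpath drops a trailing slash, so dirname() yields <base>
        base = os.path.dirname(os.path.normpath(config_path))
        candidate = os.path.join(base, "scripts", name)
        if os.path.isfile(candidate):
            return candidate
    return shutil.which(name)
```

For example, `find_augustus_script("gff2gbSmallDNA.pl")` returns the full path under `<base>/scripts` for a manual install, or whatever `shutil.which` finds on `$PATH` otherwise.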
Hi @nextgenusfs, yep, copying the install to `/conda/opt` sounds like a reasonable step, and it sounds like there's precedent; I'll take a deeper look when I get a second. Is it possible to adjust `$AUGUSTUS_CONFIG_PATH` to point to the tree correctly too?
Over the weekend I tried to rebuild 3.2.1 now that augustus is off the blacklist, but the patch didn't apply properly and I ran out of time. However, I noticed that the big conda update / rebuild is using GCC7, which might fix the compilation problem that 3.2.3 had if it gets rebuilt.
I'll let you know as I figure out more, but that certainly sounds reasonable.
Yes, I think we can set the environment variables the way the pasa/trinity recipes do, i.e. in `build.sh`, as in https://github.com/bioconda/bioconda-recipes/blob/master/recipes/pasa/build.sh:
```bash
readonly PASAHOME=${PREFIX}/opt/${PKG_NAME}-${PKG_VERSION}
mkdir -p ${PASAHOME}
cp -Rp bin Launch_PASA_pipeline.pl misc_utilities pasa_conf PasaWeb PasaWeb.conf PerlLib PyLib run_PasaWeb.pl SAMPLE_HOOKS schema scripts ${PASAHOME}
mkdir -p ${PREFIX}/etc/conda/activate.d/
echo "export PASAHOME=${PASAHOME}" > ${PREFIX}/etc/conda/activate.d/${PKG_NAME}-${PKG_VERSION}.sh
mkdir -p ${PREFIX}/etc/conda/deactivate.d/
echo "unset PASAHOME" > ${PREFIX}/etc/conda/deactivate.d/${PKG_NAME}-${PKG_VERSION}.sh
```
So this basically copies the PASA directory tree over to `/opt/pasa-2.3.3`. For `augustus` it would be similar, I think: do the same install as before and copy the necessary files over to `/opt`, then just link `augustus` into `/bin`. The entire folder structure would then be the same as a manual install, with the executable symlinked into the conda `$PATH`.
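To make the proposed layout concrete, here is a rough Python sketch of what the install step would produce (in the real recipe this belongs in `build.sh`). The function name, the default subdirectory list and the hook file name `augustus-<version>.sh` are assumptions modeled on the pasa example above; relative symlinks are used so the links stay valid if the environment is relocated.

```python
import os
import shutil


def install_augustus_tree(src_dir, prefix, version, subdirs=("bin", "config", "scripts")):
    """Copy an Augustus build tree into <prefix>/opt/augustus-<version>, symlink its
    executables into <prefix>/bin and write conda activate/deactivate hooks."""
    aug_home = os.path.join(prefix, "opt", "augustus-" + version)
    os.makedirs(aug_home, exist_ok=True)
    # keep the directory tree intact, as the pasa recipe does under opt/
    for sub in subdirs:
        src = os.path.join(src_dir, sub)
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(aug_home, sub), symlinks=True)
    # expose the executables on $PATH through relative symlinks
    prefix_bin = os.path.join(prefix, "bin")
    os.makedirs(prefix_bin, exist_ok=True)
    aug_bin = os.path.join(aug_home, "bin")
    if os.path.isdir(aug_bin):
        for exe in sorted(os.listdir(aug_bin)):
            link = os.path.join(prefix_bin, exe)
            if not os.path.lexists(link):
                target = os.path.join(aug_bin, exe)
                os.symlink(os.path.relpath(target, prefix_bin), link)
    # hooks so AUGUSTUS_CONFIG_PATH points into the copied tree while the env is active
    hooks = {
        "activate.d": "export AUGUSTUS_CONFIG_PATH=%s/config/\n" % aug_home,
        "deactivate.d": "unset AUGUSTUS_CONFIG_PATH\n",
    }
    for hook_dir_name, line in hooks.items():
        hook_dir = os.path.join(prefix, "etc", "conda", hook_dir_name)
        os.makedirs(hook_dir, exist_ok=True)
        with open(os.path.join(hook_dir, "augustus-%s.sh" % version), "w") as fh:
            fh.write(line)
    return aug_home
```

With this layout, `$AUGUSTUS_CONFIG_PATH/..` is the Augustus base directory again, so a lookup of `<base>/scripts` keeps working while `augustus` itself is still on `$PATH`.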
Here's a quick stab at it (untested):
```bash
#!/bin/bash
set -x -e
export INCLUDE_PATH="${PREFIX}/include"
export LIBRARY_PATH="${PREFIX}/lib"
export LD_LIBRARY_PATH="${PREFIX}/lib"
export BOOST_INCLUDE_DIR=${PREFIX}/include
export BOOST_LIBRARY_DIR=${PREFIX}/lib
#export CXXFLAGS=" -std=c++11 -stdlib=libstdc++ -stdlib=libc++ -DUSE_BOOST -I${BOOST_INCLUDE_DIR} -L${BOOST_LIBRARY_DIR}"
export CXXFLAGS=" -std=c++11 -DUSE_BOOST -I${BOOST_INCLUDE_DIR} -L${BOOST_LIBRARY_DIR}"
export LDFLAGS="-L${BOOST_LIBRARY_DIR}"
#setup directories
AUG_HOME=$PREFIX/opt/augustus-$PKG_VERSION
mkdir -p $PREFIX/bin
mkdir -p $AUG_HOME
## Make the software
sed -i.bak -e 's/^CC *=/CXX=/' -e 's/\$(CC)/$(CXX)/g' auxprogs/homGeneMapping/src/Makefile
sed -i.bak -e 's/^CC *=/CXX=/' -e 's/\$(CC)/$(CXX)/g' auxprogs/joingenes/Makefile
# TODO: don't set CC/CXX here when switching to newer compilers
CC=gcc
CXX=g++
if [ "$(uname)" == Darwin ] ; then
    # SQLITE disabled due to compile issue, see: https://svn.boost.org/trac10/ticket/13501
    make CC="${CC}" CXX="${CXX}" COMPGENPRED=true
else
    make CC="${CC}" CXX="${CXX}" COMPGENPRED=true SQLITE=true
fi
## Build Perl
mkdir perl-build
# copy (not move) the .pl files so the scripts/ tree copied to ${AUG_HOME} below stays complete
find scripts -name "*.pl" | xargs -I {} cp {} perl-build
cd perl-build
cp ${RECIPE_DIR}/Build.PL ./
perl ./Build.PL
perl ./Build manifest
perl ./Build install --installdirs site
cd ..
## End build perl
# keep the augustus directory tree intact under opt/
cp -Rp scripts config ${AUG_HOME}
mv bin/* $PREFIX/bin/
#Add some options to activate
mkdir -p $PREFIX/etc/conda/activate.d/
echo "export AUGUSTUS_CONFIG_PATH=${AUG_HOME}/config/" > $PREFIX/etc/conda/activate.d/augustus-confdir.sh
chmod a+x $PREFIX/etc/conda/activate.d/augustus-confdir.sh
mkdir -p $PREFIX/etc/conda/deactivate.d/
echo "unset AUGUSTUS_CONFIG_PATH" > $PREFIX/etc/conda/deactivate.d/augustus-confdir.sh
chmod a+x $PREFIX/etc/conda/deactivate.d/augustus-confdir.sh
chmod u+rwx $PREFIX/bin/*
```