Open ppericard opened 4 years ago
Is it me, or is something broken in the latest hmmer bioconda package. When building with circleci.com I get errors like
07:55:52 BIOCONDA INFO (OUT) SafetyError: The package for hmmer located at /opt/conda/pkgs/hmmer-3.2.1-he1b5a44_2
07:55:52 BIOCONDA INFO (OUT) appears to be corrupted. The path 'bin/hmmalign'
07:55:52 BIOCONDA INFO (OUT) has an incorrect size.
07:55:52 BIOCONDA INFO (OUT) reported size: 454688 bytes
07:55:52 BIOCONDA INFO (OUT) actual size: 1327156 bytes
I will further investigate, but maybe the hmmer dependency shouldn't be too relaxed.
I've been using the latest version of HMMER on bioconda without any pb, but I'm definitely not using all features. Still, in your travis job from the pull request #77, it seems you're still using the same version of hmmer (3.1b2) but the build fails (https://travis-ci.org/smirarab/sepp/jobs/623557489):
$ hmmsearch -h
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.1b2 (February 2015); http://hmmer.org/
# Copyright (C) 2015 Howard Hughes Medical Institute.
HMMER sometimes changes how the output is formatted and thus, we need to stay on this version I believe. If we were to move, we need to do careful tests to make sure SEPP results are not impacted.
Also, SEPP bundles a version of HMMER inside itself. Is it not possible to just use that one instead of using the global HMMER installed by QIIME2?
On Wed, Dec 11, 2019 at 1:39 AM Pierre Pericard notifications@github.com wrote:
I've been using the latest version of HMMER on bioconda without any pb, but I'm definitely not using all features. Still, in your travis job from the pull request #77 https://github.com/smirarab/sepp/pull/77, it seems you're still using the same version of hmmer (3.1b2) but the build fails ( https://travis-ci.org/smirarab/sepp/jobs/623557489):
$ hmmsearch -h
hmmsearch :: search profile(s) against a sequence database
HMMER 3.1b2 (February 2015); http://hmmer.org/
Copyright (C) 2015 Howard Hughes Medical Institute.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/smirarab/sepp/issues/76?email_source=notifications&email_token=AAGJXOGOLXKWKFAMTVXEEGTQYCYOLA5CNFSM4JZAPMI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGSPQFY#issuecomment-564459543, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGJXOFJZRY4LAR5EPAJXYLQYCYOLANCNFSM4JZAPMIQ .
-- Siavash Mirarab
HMMER is not a requirement of the QIIME2 conda recipe, but it is a requirement of the SEPP bioconda recipe. Therefore, by making HMMER 3.1b2 a requirement of the SEPP recipe it makes it a requirement for all the other tools and plugins of QIIME2 (that need to be installed in the same conda environment).
I understand that you might need to set a specific version of a dependency for your tool to work, but the general policy of bioconda recipe is to try to be as relaxed as possible with the dependencies in order to prevent this type of incompatibility. And in this particular case, I'm not sure the output formating of HMMER would change much between the v3.1b2 and the v3.1.
However, if SEPP really bundles a version of HMMER inside itself, then the SEPP bioconda recipe shouldn't even need to specify HMMER as a dependency which would solve the problem.
Thank you anyway for trying to solve the issue.
In order to have a lean package for bioconda, I decided to ignore the bundled binaries of HMMer and preferred to add them as dependencies.
@ppericard: As a quick workaround, you could create two qiime2-2019.10 environments. In one, you remove the q2-fragment-insertion and thus SEPP package and thus get rid of the hmmer dependency. Then, you should be able to install your ITSxpress plugin. The other (original) environment should give you all functionality of SEPP. This procedure might give us enough time to thoroughly test if we can relax the HMMer dependency for future SEPP releases.
Stefan,
How about changing sepp plugin to use bundled binaries? Would that be a bad solution?
Thanks Siavash
On Fri, Dec 13, 2019 at 12:24 AM Stefan Janssen notifications@github.com wrote:
In order to have a lean package for bioconda, I decided to ignore the bundled binaries of HMMer and preferred to add them as dependencies.
@ppericard https://github.com/ppericard: As a quick workaround, you could create two qiime2-2019.10 environments. In one, you remove the q2-fragment-insertion and thus SEPP package and thus get rid of the hmmer dependency. Then, you should be able to install your ITSxpress plugin. The other (original) environment should give you all functionality of SEPP. This procedure might give us enough time to thoroughly test if we can relax the HMMer dependency for future SEPP releases.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/smirarab/sepp/issues/76?email_source=notifications&email_token=AAGJXOG63D2MEBG4LIXA5K3QYNBDFA5CNFSM4JZAPMI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGZINPA#issuecomment-565348028, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGJXOA4EYUQ56G3UM3UR23QYNBDFANCNFSM4JZAPMIQ .
-- Siavash Mirarab
Revisiting this issue. Stefan, should we simply test HMMER v3.1? It seems that Pierre is simply asking to change from v3.1b2 to v3.1. That may be completely harmless.
I tried that some time ago (cf. #77), but it didn't work out of the box and I could not find the time to debug.
Stefan and Pierre,
I can try to see if hmmer version could be changed. But I am a bit confused. On hmmer bioconda package, I see the following versions. The default (bundled version) is 3.1b2. I don't see any other 3.1 version that is more recent. Which one of the following should I try to make SEPP compatible with? Pierre said ' I'm not sure the output formating of HMMER would change much between the v3.1b2 and the v3.1.' But since there is no v3.1 on bioconda, I would like to understand which other version among those available on bioconda is being discussed here.
3.3.2-0, 3.3.1-0, 3.3-1, 3.3-0, 3.2.1-2, 3.2.1-1, 3.2.1-0, 3.2-3, 3.2-2, 3.2-0, 3.1b2-3, 3.1b2-2, 3.1b2-1, 3.1b2-0, 3.0-0, 2.3.2-4, 2.3.2-3, 2.3.2-2, 2.3.2-1, 2.3.2-0
On Wed, Mar 10, 2021 at 11:01 AM Stefan Janssen @.***> wrote:
I tried that some time ago (cf. #77 https://github.com/smirarab/sepp/pull/77), but it didn't work out of the box and I could not find the time to debug.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/smirarab/sepp/issues/76#issuecomment-795917913, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGJXOG7OBEHJFZORIPXVTDTC66ZPANCNFSM4JZAPMIQ .
-- Siavash Mirarab
One more thing. I just tested SEPP with HMMER 3.3.2 and it seemed to work perfectly fine in the one test case I ran (locally not on conda). If this is continuing to pose a problem, I'd be happy to test this on more data to make sure the change of version doesn't change results; we can then relax the requirement.
I think what we should aim for is a relaxed dependency like - hmmer >=3
. Unfortunately, I don't find a single file on the hmmer page listing the version changes. Not sure if / when they made changes to the output format (not to mention changes to the output content) that might break SEPP.
Thus, I fear we need to iterate through all versions with sufficient test data and check if our tests pass.
I am also not 100% sure if conda understands the somehow inconsistent version number schema of hmmer: b
and rc
infixes :-/
Hello @sjanssen2 and @smirarab. I would like to push for the dependency relaxation of HMMER. Please correct me if my wrong but it seems to me that the commands that sepp borrows from HMMER are hmmbuild
, hmmsearch
, and hmmalign
. The inputs/outputs to these are the following:
hmmalign [options] hmmfile seqfile -> alignment
hmmbuild [options] hmmfile msafile -> hmmfile
hmmsearch [options] hmmfile seqdb -> ranked_lists
We are concerned that either of these input/output formats has changed in a newer version of hmmer and therefore would break something in sepp if we update it. Let's walk through the formats underlying each of these inputs/outputs.
hmmfile
: The format of hmm files has changed over time but they also update the hmmer parser such that it's backward compatible, i.e. new hmmer can take old hmm files as input. Page 210 HMMER documentationranked_lists
: The format of this output is controllable with the options of the command, but its generally a tabular format separated by spaces. No mention of change in the release notes since the 3.1b2 release. alignment
: The default is stockholm alignment format but other formats can be chosen using the options (a2m, afa, psiblast, clustal, phylip). No mention of change in the release notes since the 3.1b2 release. seqfile
, seqdb
and msafile
: Input files for HMMER include unaligned sequence files and multiple sequence alignment files. HMMER’s preferred alignment file format is Stockholm format. HMMER can read several other sequence and alignment file formats. By default, it autodetects what format an input file is in. Accepted unaligned sequence file formats include fasta, uniprot, genbank, ddbj, and embl. Accepted multiple alignment file formats include stockholm, afa (i.e. aligned FASTA), clustal, clustallike (MUSCLE, etc.), a2m, phylip (interleaved), phylips (sequential), psiblast, and selex. Page 30 HMMER documentation. No mention of change in the release notes since the 3.1b2 release. So it seems to me that it would be safe to use a newer hmmer version. To make sure I will run a couple of tests on these commands, once with v3.1b2 and another with v3.4. I'll parse the outputs to make sure they are the same. If it all goes smoothly I'll run sepp with hmmer 3.4 and check if behavior is as expected. Then we can proceed with bioconda/bioconda-recipes#48294 @sjanssen2?
Many thanks @Sann5 for researching all these details. Let's cross fingers that your regression test does not find incompatibilities. Would be amazing to lift the pinning!
Update ⬆️
So it the output format of all 3 programs is the same. However the profiles that are generated with version 3.1b2 are different from the profiles generated by the 3.4 version in terms of the values of the estimated parameters. This might be due to changes in the sampling algorithm that hmmer uses to estimate these parameters. However I don't see how this would be a problem in sepp.
I'm still to test sep with both versions. I'll let you know how it goes.
So... 🥁 🥁 🥁
I made two conda environments and installed sepp via conda. Then in one I removed hmmer and installed manually the 3.4 version, compiling it from the source and placing the executables in the expected path. Then I proceed to carry out the sepp tutorial, specifically the first 3 commands (1, 2, and 3).
For both environments, all commands were completed without errors 🥳 ✅ . Then I proceeded to compare the outputs using the command line tool diff
. As expected since hmmer 3.1b2 and 3.4 generate different profiles, the results of sepp are also different. However, they both appear to be correctly formatted.
If you wish I can share the code and the outputs. Is this enough evidence to relax the hmmer dependency @sjanssen2 @smirarab?
@sjanssen2 @smirarab?
Hi all, Any updates on this? We're in a jam in the QIIME 2 ecosystem as we have some plugin developers who would like to use the most recent version of HMMER (3.4) for plugins that are installed in the same QIIME 2 distribution as q2-fragment-insertion/SEPP. This very specific requirement is preventing that.
Yes there is thanks to @Sann5 :-) See: https://github.com/bioconda/bioconda-recipes/pull/48294 However, we are waiting for @smirarab response since approx. a month
Hi @smirarab, Do you happen to have an ETA on this and the Py 3.10 updates (#136)? I don't mean to pester you - we're just hoping to upgrade the HMMER and Python dependency versions for QIIME 2 2024.10 (scheduled for 2 October 2024), and we'd need some time for testing and updating things on our end.
@Sann5 @sjanssen2 Based on Stefan's work, it seems fine to live with the changed results, as long as we bump the major version.
The solutions I adopted is as follows.
These changes are done in 6719eea38957d34254cd3c49f4b63c80506555ab
@sjanssen2 let me know when you are done with the bioconda side and I can close the issue.
Hi @smirarab could you please take a look at https://github.com/smirarab/pasta/pull/70
Done.
Hi, SEPP is now a dependency for installing the latest version of QIIME2 via conda (v2019.10, https://data.qiime2.org/distro/core/qiime2-2019.10-py36-linux-conda.yml).
The bioconda recipe for SEPP 4.3.10 (https://github.com/bioconda/bioconda-recipes/blob/master/recipes/sepp/meta.yaml) calls for a very specific dependency of HMMER (v3.1b2). This, in turn, makes incompatible some QIIME2 external plugins like ITSxpress which bioconda recipe calls for
hmmer>=3.1
. This makes ITSxpress incompatible with the latest version of QIIME2.I already opened an issue on the ITSxpress repository to ask if they could lower a bit their requirements for HMMER (https://github.com/USDA-ARS-GBRU/itsxpress/issues/16). But another way to deal with the problem would be to relax the requirements in the SEPP recipe to something like
hmmer>=3.1b2
. Thus my question. Is there a reason why SEPP would need such a specific version of HMMER or is it worth considering relaxing those constraints in the bioconda recipe?Maybe @sjanssen2 could shed some light on this?
Thanks in advance, Pierre