smirarab / sepp

Ensemble of HMM methods (SEPP, TIPP, UPP)
GNU General Public License v3.0
87 stars 38 forks source link

Is there a reason SEPP would need HMMER 3.1b2 specifically ? #76

Open ppericard opened 4 years ago

ppericard commented 4 years ago

Hi, SEPP is now a dependency for installing the latest version of QIIME2 via conda (v2019.10, https://data.qiime2.org/distro/core/qiime2-2019.10-py36-linux-conda.yml).

The bioconda recipe for SEPP 4.3.10 (https://github.com/bioconda/bioconda-recipes/blob/master/recipes/sepp/meta.yaml) calls for a very specific dependency of HMMER (v3.1b2). This, in turn, makes incompatible some QIIME2 external plugins like ITSxpress which bioconda recipe calls for hmmer>=3.1. This makes ITSxpress incompatible with the latest version of QIIME2.

I already opened an issue on the ITSxpress repository to ask if they could lower a bit their requirements for HMMER (https://github.com/USDA-ARS-GBRU/itsxpress/issues/16). But another way to deal with the problem would be to relax the requirements in the SEPP recipe to something like hmmer>=3.1b2. Thus my question. Is there a reason why SEPP would need such a specific version of HMMER or is it worth considering relaxing those constraints in the bioconda recipe?

Maybe @sjanssen2 could shed some light on this?

Thanks in advance, Pierre

sjanssen2 commented 4 years ago

Is it me, or is something broken in the latest hmmer bioconda package. When building with circleci.com I get errors like

07:55:52 BIOCONDA INFO (OUT) SafetyError: The package for hmmer located at /opt/conda/pkgs/hmmer-3.2.1-he1b5a44_2
07:55:52 BIOCONDA INFO (OUT) appears to be corrupted. The path 'bin/hmmalign'
07:55:52 BIOCONDA INFO (OUT) has an incorrect size.
07:55:52 BIOCONDA INFO (OUT)   reported size: 454688 bytes
07:55:52 BIOCONDA INFO (OUT)   actual size: 1327156 bytes

I will further investigate, but maybe the hmmer dependency shouldn't be too relaxed.

ppericard commented 4 years ago

I've been using the latest version of HMMER on bioconda without any pb, but I'm definitely not using all features. Still, in your travis job from the pull request #77, it seems you're still using the same version of hmmer (3.1b2) but the build fails (https://travis-ci.org/smirarab/sepp/jobs/623557489):

$ hmmsearch -h

# hmmsearch :: search profile(s) against a sequence database

# HMMER 3.1b2 (February 2015); http://hmmer.org/

# Copyright (C) 2015 Howard Hughes Medical Institute.
smirarab commented 4 years ago

HMMER sometimes changes how the output is formatted and thus, we need to stay on this version I believe. If we were to move, we need to do careful tests to make sure SEPP results are not impacted.

Also, SEPP bundles a version of HMMER inside itself. Is it not possible to just use that one instead of using the global HMMER installed by QIIME2?

On Wed, Dec 11, 2019 at 1:39 AM Pierre Pericard notifications@github.com wrote:

I've been using the latest version of HMMER on bioconda without any pb, but I'm definitely not using all features. Still, in your travis job from the pull request #77 https://github.com/smirarab/sepp/pull/77, it seems you're still using the same version of hmmer (3.1b2) but the build fails ( https://travis-ci.org/smirarab/sepp/jobs/623557489):

$ hmmsearch -h

hmmsearch :: search profile(s) against a sequence database

HMMER 3.1b2 (February 2015); http://hmmer.org/

Copyright (C) 2015 Howard Hughes Medical Institute.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/smirarab/sepp/issues/76?email_source=notifications&email_token=AAGJXOGOLXKWKFAMTVXEEGTQYCYOLA5CNFSM4JZAPMI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGSPQFY#issuecomment-564459543, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGJXOFJZRY4LAR5EPAJXYLQYCYOLANCNFSM4JZAPMIQ .

-- Siavash Mirarab

ppericard commented 4 years ago

HMMER is not a requirement of the QIIME2 conda recipe, but it is a requirement of the SEPP bioconda recipe. Therefore, by making HMMER 3.1b2 a requirement of the SEPP recipe it makes it a requirement for all the other tools and plugins of QIIME2 (that need to be installed in the same conda environment).

I understand that you might need to set a specific version of a dependency for your tool to work, but the general policy of bioconda recipe is to try to be as relaxed as possible with the dependencies in order to prevent this type of incompatibility. And in this particular case, I'm not sure the output formating of HMMER would change much between the v3.1b2 and the v3.1.

However, if SEPP really bundles a version of HMMER inside itself, then the SEPP bioconda recipe shouldn't even need to specify HMMER as a dependency which would solve the problem.

Thank you anyway for trying to solve the issue.

sjanssen2 commented 4 years ago

In order to have a lean package for bioconda, I decided to ignore the bundled binaries of HMMer and preferred to add them as dependencies.

@ppericard: As a quick workaround, you could create two qiime2-2019.10 environments. In one, you remove the q2-fragment-insertion and thus SEPP package and thus get rid of the hmmer dependency. Then, you should be able to install your ITSxpress plugin. The other (original) environment should give you all functionality of SEPP. This procedure might give us enough time to thoroughly test if we can relax the HMMer dependency for future SEPP releases.

smirarab commented 4 years ago

Stefan,

How about changing sepp plugin to use bundled binaries? Would that be a bad solution?

Thanks Siavash

On Fri, Dec 13, 2019 at 12:24 AM Stefan Janssen notifications@github.com wrote:

In order to have a lean package for bioconda, I decided to ignore the bundled binaries of HMMer and preferred to add them as dependencies.

@ppericard https://github.com/ppericard: As a quick workaround, you could create two qiime2-2019.10 environments. In one, you remove the q2-fragment-insertion and thus SEPP package and thus get rid of the hmmer dependency. Then, you should be able to install your ITSxpress plugin. The other (original) environment should give you all functionality of SEPP. This procedure might give us enough time to thoroughly test if we can relax the HMMer dependency for future SEPP releases.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/smirarab/sepp/issues/76?email_source=notifications&email_token=AAGJXOG63D2MEBG4LIXA5K3QYNBDFA5CNFSM4JZAPMI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGZINPA#issuecomment-565348028, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGJXOA4EYUQ56G3UM3UR23QYNBDFANCNFSM4JZAPMIQ .

-- Siavash Mirarab

smirarab commented 3 years ago

Revisiting this issue. Stefan, should we simply test HMMER v3.1? It seems that Pierre is simply asking to change from v3.1b2 to v3.1. That may be completely harmless.

sjanssen2 commented 3 years ago

I tried that some time ago (cf. #77), but it didn't work out of the box and I could not find the time to debug.

smirarab commented 3 years ago

Stefan and Pierre,

I can try to see if hmmer version could be changed. But I am a bit confused. On hmmer bioconda package, I see the following versions. The default (bundled version) is 3.1b2. I don't see any other 3.1 version that is more recent. Which one of the following should I try to make SEPP compatible with? Pierre said ' I'm not sure the output formating of HMMER would change much between the v3.1b2 and the v3.1.' But since there is no v3.1 on bioconda, I would like to understand which other version among those available on bioconda is being discussed here.

3.3.2-0, 3.3.1-0, 3.3-1, 3.3-0, 3.2.1-2, 3.2.1-1, 3.2.1-0, 3.2-3, 3.2-2, 3.2-0, 3.1b2-3, 3.1b2-2, 3.1b2-1, 3.1b2-0, 3.0-0, 2.3.2-4, 2.3.2-3, 2.3.2-2, 2.3.2-1, 2.3.2-0

On Wed, Mar 10, 2021 at 11:01 AM Stefan Janssen @.***> wrote:

I tried that some time ago (cf. #77 https://github.com/smirarab/sepp/pull/77), but it didn't work out of the box and I could not find the time to debug.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/smirarab/sepp/issues/76#issuecomment-795917913, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGJXOG7OBEHJFZORIPXVTDTC66ZPANCNFSM4JZAPMIQ .

-- Siavash Mirarab

smirarab commented 3 years ago

One more thing. I just tested SEPP with HMMER 3.3.2 and it seemed to work perfectly fine in the one test case I ran (locally not on conda). If this is continuing to pose a problem, I'd be happy to test this on more data to make sure the change of version doesn't change results; we can then relax the requirement.

sjanssen2 commented 3 years ago

I think what we should aim for is a relaxed dependency like - hmmer >=3. Unfortunately, I don't find a single file on the hmmer page listing the version changes. Not sure if / when they made changes to the output format (not to mention changes to the output content) that might break SEPP. Thus, I fear we need to iterate through all versions with sufficient test data and check if our tests pass. I am also not 100% sure if conda understands the somehow inconsistent version number schema of hmmer: b and rc infixes :-/

Sann5 commented 5 months ago

Hello @sjanssen2 and @smirarab. I would like to push for the dependency relaxation of HMMER. Please correct me if my wrong but it seems to me that the commands that sepp borrows from HMMER are hmmbuild, hmmsearch, and hmmalign. The inputs/outputs to these are the following:

We are concerned that either of these input/output formats has changed in a newer version of hmmer and therefore would break something in sepp if we update it. Let's walk through the formats underlying each of these inputs/outputs.

So it seems to me that it would be safe to use a newer hmmer version. To make sure I will run a couple of tests on these commands, once with v3.1b2 and another with v3.4. I'll parse the outputs to make sure they are the same. If it all goes smoothly I'll run sepp with hmmer 3.4 and check if behavior is as expected. Then we can proceed with bioconda/bioconda-recipes#48294 @sjanssen2?

sjanssen2 commented 5 months ago

Many thanks @Sann5 for researching all these details. Let's cross fingers that your regression test does not find incompatibilities. Would be amazing to lift the pinning!

Sann5 commented 5 months ago

Update ⬆️

So it the output format of all 3 programs is the same. However the profiles that are generated with version 3.1b2 are different from the profiles generated by the 3.4 version in terms of the values of the estimated parameters. This might be due to changes in the sampling algorithm that hmmer uses to estimate these parameters. However I don't see how this would be a problem in sepp.

I'm still to test sep with both versions. I'll let you know how it goes.

Sann5 commented 5 months ago

So... 🥁 🥁 🥁

I made two conda environments and installed sepp via conda. Then in one I removed hmmer and installed manually the 3.4 version, compiling it from the source and placing the executables in the expected path. Then I proceed to carry out the sepp tutorial, specifically the first 3 commands (1, 2, and 3).

For both environments, all commands were completed without errors 🥳 ✅ . Then I proceeded to compare the outputs using the command line tool diff. As expected since hmmer 3.1b2 and 3.4 generate different profiles, the results of sepp are also different. However, they both appear to be correctly formatted.

If you wish I can share the code and the outputs. Is this enough evidence to relax the hmmer dependency @sjanssen2 @smirarab?

Sann5 commented 4 months ago

@sjanssen2 @smirarab?

gregcaporaso commented 4 months ago

Hi all, Any updates on this? We're in a jam in the QIIME 2 ecosystem as we have some plugin developers who would like to use the most recent version of HMMER (3.4) for plugins that are installed in the same QIIME 2 distribution as q2-fragment-insertion/SEPP. This very specific requirement is preventing that.

sjanssen2 commented 4 months ago

Yes there is thanks to @Sann5 :-) See: https://github.com/bioconda/bioconda-recipes/pull/48294 However, we are waiting for @smirarab response since approx. a month

gregcaporaso commented 4 months ago

Hi @smirarab, Do you happen to have an ETA on this and the Py 3.10 updates (#136)? I don't mean to pester you - we're just hoping to upgrade the HMMER and Python dependency versions for QIIME 2 2024.10 (scheduled for 2 October 2024), and we'd need some time for testing and updating things on our end.

smirarab commented 4 months ago

@Sann5 @sjanssen2 Based on Stefan's work, it seems fine to live with the changed results, as long as we bump the major version.

smirarab commented 3 months ago

The solutions I adopted is as follows.

These changes are done in 6719eea38957d34254cd3c49f4b63c80506555ab

@sjanssen2 let me know when you are done with the bioconda side and I can close the issue.

sjanssen2 commented 3 months ago

Hi @smirarab could you please take a look at https://github.com/smirarab/pasta/pull/70

smirarab commented 3 months ago

Done.