qiime2 / q2-vsearch

vsearch plugin for QIIME 2
BSD 3-Clause "New" or "Revised" License
6 stars 20 forks source link

Please support vsearch 2.13 or newer (Debian Packaging) #75

Closed shayandoust closed 1 year ago

shayandoust commented 4 years ago

Hello,

I am currently packaging q2-vsearch for the Debian-Med packaging team1. I have realised that q2-vsearch2 is very strict with the version of vsearch that is required (2.7.0). This became apparent as the tests failed with the following recurring messages (exemplar):

self = <q2_vsearch.tests.test_join_pairs.MergePairsTests testMethod=test_join_pairs_alt_qminout>

    def test_join_pairs_alt_qminout(self):
        with redirected_stdio(stderr=os.devnull):
>           cmd, obs = _join_pairs_w_command_output(
                self.input_seqs, qminout=-1)

q2_vsearch/tests/test_join_pairs.py:276: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
q2_vsearch/_join_pairs.py:146: in _join_pairs_w_command_output
    run_command(cmd)
q2_vsearch/_cluster_features.py:33: in run_command
    subprocess.run(cmd, check=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['vsearch', '--fastq_mergepairs', '/<<PKGBUILDDIR>>/.pybuild/cpython3_3.8/build/q2_v....dev0/.pybuild/cpython3_3.8/build/q2_vsearch/tests/data/demux-1/BAQ2687.1_3_L001_R2_001.fastq.gz', '--fastqout', ...],)
kwargs = {}, process = <subprocess.Popen object at 0x7ffb3af2a9a0>
stdout = None, stderr = None, retcode = 1

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['vsearch', '--fastq_mergepairs', '/<<PKGBUILDDIR>>/.pybuild/cpython3_3.8/build/q2_vsearch/tests/data/demux-1/BAQ2687.1_3_L001_R1_001.fastq.gz', '--reverse', '/<<PKGBUILDDIR>>/.pybuild/cpython3_3.8/build/q2_vsearch/tests/data/demux-1/BAQ2687.1_3_L001_R2_001.fastq.gz', '--fastqout', '/tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-uv9jfa6y/BAQ2687.1_0_L001_R1_001.fastq', '--fastq_ascii', '33', '--fastq_minlen', '1', '--fastq_minovlen', '10', '--fastq_maxdiffs', '10', '--fastq_qmin', '0', '--fastq_qminout', '-1', '--fastq_qmax', '41', '--fastq_qmaxout', '41', '--minseqlength', '1', '--fasta_width', '0', '--threads', '1']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:512: CalledProcessError
----------------------------- Captured stdout call -----------------------------
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --fastq_mergepairs /<<PKGBUILDDIR>>/.pybuild/cpython3_3.8/build/q2_vsearch/tests/data/demux-1/BAQ2687.1_3_L001_R1_001.fastq.gz --reverse /<<PKGBUILDDIR>>/.pybuild/cpython3_3.8/build/q2_vsearch/tests/data/demux-1/BAQ2687.1_3_L001_R2_001.fastq.gz --fastqout /tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-uv9jfa6y/BAQ2687.1_0_L001_R1_001.fastq --fastq_ascii 33 --fastq_minlen 1 --fastq_minovlen 10 --fastq_maxdiffs 10 --fastq_qmin 0 --fastq_qminout -1 --fastq_qmax 41 --fastq_qmaxout 41 --minseqlength 1 --fasta_width 0 --threads 1

----------------------------- Captured stderr call -----------------------------
Fatal error: Invalid options to command fastq_mergepairs
Invalid option(s): --fastq_qminout --minseqlength
The valid options for the fastq_mergepairs command are: --bzip2_decompress --eeout --eetabbedout --fasta_width --fastaout --fastaout_notmerged_fwd --fastaout_notmerged_rev --fastq_allowmergestagger --fastq_ascii --fastq_eeout --fastq_maxdiffpct --fastq_maxdiffs --fastq_maxee --fastq_maxlen --fastq_maxmergelen --fastq_maxns --fastq_minlen --fastq_minmergelen --fastq_minovlen --fastq_nostagger --fastq_qmax --fastq_qmaxout --fastq_qmin --fastq_truncqual --fastqout --fastqout_notmerged_fwd --fastqout_notmerged_rev --gzip_decompress --label_suffix --log --no_progress --quiet --relabel --relabel_keep --relabel_md5 --relabel_self --relabel_sha1 --reverse --sizein --sizeout --threads --xee --xsize

Most notably, Invalid option(s): --fastq_qminout --minseqlength. It has become apparent that these options are not available in the latest version of vsearch (they were removed).

It would be much appreciated if this could be fixed.

shayandoust commented 4 years ago

Actually, it seems like these commands could exist in 2.13? So I am not sure what is going on here...

thermokarst commented 4 years ago

Hi @shayandoust!

A newer version of vsearch is on our radar, however there are a few blockers here:

So with that in mind, I just want to point out a concern I have regarding repackaging QIIME 2 - we put in a significant amount of energy building, testing, and verifying conda packages (https://busywork.qiime2.org). We are unable to provide support for packages that are built outside of our ecosystem (we have no way to verify that they work correctly). As well, the notion of the "core distribution" of QIIME 2 is quickly disappearing, as we work on building out https://library.qiime2.org, a distribution center for QIIME 2 -related conda packages. I am unsure how your approach to repackaging will scale with that shift.

Is there any chance that the Debian Med packaging team can distribute this while maintaining the conda packages? You might've noticed we don't distribute any python packages - that is because our tools aren't explicitly python packages - conda lets us work with many other runtimes, etc. We have long-term plans of developing out QIIME 2 runtimes in R, as well - hopefully that helps illustrate our view of conda.

Keep us posted!


PS - it looks like you're running python 3.8 in the log above - again, we haven't tested or verified that QIIME 2 works with Python 3.8 - this information is all embedded in the conda recipes and conda packages used to distribute QIIME 2.

colinbrislawn commented 2 years ago

@shayandoust, vsearch is now unpinned and Qiime2 2022.8 is shipping with up-to-date vsearch v2.21.1 directly from Bioconda!

Let us know if this fixes it! (It does not address questions about how best to build, test, and ship Qiime2, but that's over my pay grade.)