qiime2 / q2-demux

BSD 3-Clause "New" or "Revised" License
0 stars 20 forks source link

demux summarize --p-n conflicts with parsl argument #164

Closed colinbrislawn closed 2 months ago

colinbrislawn commented 2 months ago

Improvement Description

Current Behavior qiime demux summarize --p-n 5000 ...

In this plugin, --p-n means

The number of sequences that should be selected at random for quality score plots.

which is the same flag that parsl uses.

Proposed Behavior Rename this flag and update docs

Questions

  1. Could demux summarize be run in parallel, needing both a sampling depth and requested threads?

If this is already tracked, feel free to move or close!

colinbrislawn commented 2 months ago

I've found --p-n in a number of other plugins including new ones after Parsl support was added...

I feel like I'm missing something about how this all fits together... 🤔

gregcaporaso commented 2 months ago

Hey @colinbrislawn, There is no parsl-related argument associated with a n parameter. Here's an example of help test for a parallel Pipeline from q2-dwq2:

$ qiime dwq2 search-and-summarize --help
Usage: qiime dwq2 search-and-summarize [OPTIONS]

  Perform local alignment search of a query sequence against reference
  sequences and report the results in a table.

Inputs:
  --i-query-seqs ARTIFACT SingleDNASequence | FeatureData[Sequence]
                          Sequence(s) to query against the reference
                          sequences.                                [required]
  --i-reference-seqs ARTIFACT FeatureData[Sequence]
                          The reference sequences.                  [required]
Parameters:
  --m-reference-metadata-file METADATA...
    (multiple arguments   Reference metadata to be integrated in output.
     will be merged)                                                [optional]
  --p-n INTEGER           Maximum number of top-scoring hits to return. Pass
    Range(0, None)        zero to retain all hits.                [default: 5]
  --p-split-size INTEGER  The number of sequences to include in each split.
    Range(1, None)        (The last split may have fewer than split-size
                          sequences).                             [default: 5]
  --p-title TEXT          Title to use inside visualization.
                                   [default: 'Local Alignment Search Results']
  --p-gap-open-penalty NUMBER Range(0, None, inclusive_start=False)
                          The penalty incurred for opening a new gap. By
                          convention this is a positive number.   [default: 5]
  --p-gap-extend-penalty NUMBER Range(0, None, inclusive_start=False)
                          The penalty incurred for extending an existing gap.
                          By convention this is a positive number.
                                                                  [default: 2]
  --p-match-score NUMBER Range(0, None, inclusive_start=False)
                          The score for matching characters at an alignment
                          position. By convention, this is a positive number.
                                                                  [default: 2]
  --p-mismatch-score NUMBER Range(None, 0, inclusive_end=True)
                          The score for mismatching characters at an
                          alignment position. By convention, this is a
                          negative number.                       [default: -3]
Outputs:
  --o-hits ARTIFACT LocalAlignmentSearchResults
                          Table of search results, sorted so that the best
                          alignments are first.                     [required]
  --o-hits-table VISUALIZATION
                          Visual report of the search results.      [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --recycle-pool TEXT     Use a cache pool for pipeline resumption. QIIME 2
                          will cache your results in this pool for reuse by
                          future invocations. These pool are retained until
                          deleted by the user. If not provided, QIIME 2 will
                          create a pool which is automatically reused by
                          invocations of the same action and removed if the
                          action is successful. Note: these pools are local to
                          the cache you are using.
  --no-recycle            Do not recycle results from a previous failed
                          pipeline run or save the results from this run for
                          future recycling.
  --parallel              Execute your action in parallel. This flag will use
                          your default parallel config.
  --parallel-config FILE  Execute your action in parallel using a config at
                          the indicated path.
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this action. If not provided, the default
                          cache under $TMP/qiime2/<uname> will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --help                  Show this message and exit.

I suspect this was a misunderstanding so I'm going to close this issue, but follow-up if I'm misunderstanding.

gregcaporaso commented 2 months ago

@colinbrislawn, I think I get your confusion now.

QIIME 2's formal parallel computing support uses Parsl, and is only accessible through qiime2.Pipeline actions. This is discussed a little bit here. Action parameters can't conflict with parallel-related parameters because they're handled differently - i.e., if an Action defined a parallel parameter, q2cli would make it accessible through --p-parallel, which wouldn't conflict with the Pipeline's --parallel flag.

Some other actions (e.g., qiime dada2 denoise-*) are wrapping multi-threaded applications and may define a parameter (like --p-n) that gives the user control over that. It wouldn't be possible to have conflicting parameters in this case.

qiime demux summarize doesn't have parallel support right now (thought it'd be great to add!). To add parallel support to it, we'd need to convert it to a qiime2.Pipeline.

colinbrislawn commented 2 months ago

Thank you Greg! Yes, I was confused by how Actions and Pipelines interact with the new Parsl backend.