grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

Why does order of mandatory vs optional flags matter? #96

Closed olgabot closed 5 years ago

olgabot commented 5 years ago

Hello, I'm working on a Python package to help with Reflow adoption at Biohub, in particular a check-batch command that calls reflow run on the first line of a samples.csv, using a config.json. The issue is that, from the samples.csv alone, I don't know which columns correspond to mandatory flags and which to optional ones. So when I build a command to run with Python's subprocess, I can't order the flags the way the Reflow file seems to need. In the command below, ALL of the flags are present, but since the second flag is -ignore_abundance (an optional one), the rest don't get read and the program fails.

Is there a way to ignore the order of flags and instead read them all?

Warmest,
Olga

  ~/code/cupcakes/2018/olgas_bioinformagician_tricks/lung_cancer/search_protein_databases (master)
  $ aguamenti check-batch
['reflow', 'run', '../../reflow/sourmash_search.rf', '-signature', 's3://olgabot-maca/lung_cancer/sourmash_v4/C14_B003528_S62.signature', '-ignore_abundance', 'True', '-database_name', 'tabula-muris-k21-protein', '-output', 's3://czbiohub-cupcakes/2018-10-09/olgabot/tabula-muris-k21-protein/C14_B003528_S62.csv', '-ksize', '21', '-sequence_to_compare', 'protein', '-database', 's3://olgabot-maca/facs/sourmash_index_all/tabula-muris-k21-protein/tabula-muris-k21-protein/']
Running 'reflow run ../../reflow/sourmash_search.rf -signature s3://olgabot-maca/lung_cancer/sourmash_v4/C14_B003528_S62.signature -ignore_abundance True -database_name tabula-muris-k21-protein -output s3://czbiohub-cupcakes/2018-10-09/olgabot/tabula-muris-k21-protein/C14_B003528_S62.csv -ksize 21 -sequence_to_compare protein -database s3://olgabot-maca/facs/sourmash_index_all/tabula-muris-k21-protein/tabula-muris-k21-protein/'
flag errors:
    missing mandatory flag -database
    missing mandatory flag -database_name
    missing mandatory flag -output
usage of ../../reflow/sourmash_search.rf:
  -database string
        S3 full path to the sourmash database folder containing the database folder e.g. s3://olgabot-maca/facs/sourmash_index_all/tabula-muris-k21-protein/tabula-muris-k21-protein/ Note: this folder contains tabula-muris-k21-protein.sbt.json and a bunch of hidden files (required)
  -database_name string
        Name of the database e.g.: tabula-muris-k21-protein (required)
  -ignore_abundance
        Whether or not to include the abundance of kmers in the comparison

  -ksize uint
        Size of kmer to use (can only use one for index)
         (default 21)
  -output string
        CSV file to write with search results e.g s3://olgabot-maca/facs/sourmash_search/A1-B000610-3_56_F-1-1_tabula-muris-k21-protein.csv (required)
  -sequence_to_compare string
        What to compare, could be either "protein" or "dna"
         (default "dna")
  -signature string
        S3 path to single signature file e.g. s3://olgabot-maca/facs/sourmash_compute_all/A1-B000610-3_56_F-1-1.sig (required)
jcharum commented 5 years ago

Hi @olgabot! I think this is happening because of -ignore_abundance True. To set a boolean flag to true, it should just be -ignore_abundance. My guess, with low but nonzero confidence, is that the flag parser treats True as a positional argument (not a meaningful one here, just a consequence of the parser Reflow uses) and concludes that it is done parsing "options", so every flag after it is ignored.

Let me know if that helps.
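
To make that guess concrete, here is a small Python model of the rule, assuming the parser behaves like Go's standard flag package: a boolean flag never consumes the next argument, and parsing stops at the first argument that is not a flag. The function parse_go_style and the shortened argument values are made up for illustration; this is a simplified sketch, not Reflow's code.

```python
def parse_go_style(args, bool_flags):
    """Simplified model of Go-style flag parsing (hypothetical helper).

    Returns (parsed_flags, remaining_args).
    """
    parsed = {}
    i = 0
    while i < len(args):
        arg = args[i]
        # The first argument that is not a flag ends flag parsing.
        if not arg.startswith("-") or arg == "-":
            break
        i += 1
        if arg == "--":
            break  # explicit terminator is consumed, then parsing stops
        name = arg.lstrip("-")
        if "=" in name:
            # -flag=value works for every flag type, booleans included.
            name, value = name.split("=", 1)
            parsed[name] = value
        elif name in bool_flags:
            # A bare boolean flag means true and does NOT take the next argument.
            parsed[name] = "true"
        else:
            if i >= len(args):
                raise ValueError(f"flag needs an argument: -{name}")
            parsed[name] = args[i]
            i += 1
    return parsed, args[i:]


# Shortened stand-ins for the arguments in the failing command above.
args = ["-signature", "x.sig", "-ignore_abundance", "True",
        "-database_name", "db", "-output", "out.csv"]
parsed, rest = parse_go_style(args, bool_flags={"ignore_abundance"})
required = ["database_name", "output", "signature"]
missing = [name for name in required if name not in parsed]
```

In this model, -signature is read, -ignore_abundance is set, and True ends parsing, so everything after it lands in rest. That is consistent with the error output above, which reports -database, -database_name, and -output as missing but not -signature.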

mariusae commented 5 years ago

That is correct. This is a quirk of Go's flag package (which we use to parse Reflow flags). Boolean flags can take the form -flag, -flag=true, or -flag=false, but cannot (unlike non-boolean flags) take the form -flag value.
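
Given that, one way to build the command without knowing which columns are boolean is to emit every flag as -name=value, since that form is accepted for all flag types. A minimal sketch, assuming Reflow hands these arguments to Go's flag package unchanged; build_reflow_command and the sample row are hypothetical and not part of aguamenti or Reflow. Go parses boolean values with strconv.ParseBool, which accepts the spellings True and False, so values read from a CSV should not need converting.

```python
def build_reflow_command(rf_path, params):
    """Build an argv list for `reflow run` using the -name=value form.

    Hypothetical helper: `params` maps flag names to values, e.g. one
    row of samples.csv. Empty values are skipped so defaults apply.
    """
    cmd = ["reflow", "run", rf_path]
    for name, value in params.items():
        if value is None or value == "":
            continue
        # -name=value is valid for boolean and non-boolean flags alike,
        # so the order and type of the columns no longer matter.
        cmd.append(f"-{name}={value}")
    return cmd


# Made-up row standing in for one line of samples.csv.
row = {
    "signature": "s3://bucket/sample.sig",
    "ignore_abundance": "True",
    "ksize": 21,
    "sequence_to_compare": "",
}
cmd = build_reflow_command("sourmash_search.rf", row)
```

The resulting list can be passed to subprocess.run as is. Because each flag and its value travel in a single argument, nothing is left over for the parser to mistake for a positional argument.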