trevorld / r-argparse

command-line optional and positional argument parser
GNU General Public License v2.0

Support custom function 'type' in 'add_argument()' #37

Open ellendejong opened 2 years ago

ellendejong commented 2 years ago

Hi,

Is it possible to pass a custom function as the type argument of add_argument()? For example, something similar to:

library(argparse)

non_empty_existing_file <- function(inputfile) {
  if (!file.exists(inputfile)) {
    stop("Non-existing input path: ", inputfile)
  } else if (file.info(inputfile)$size == 0) {  # check empty
    stop("Input file is empty: ", inputfile)
  }
  return(inputfile)
}
parser = ArgumentParser(prog='PROG')
# pass the function itself (not a call to it) as `type`
parser$add_argument('input_file', type=non_empty_existing_file, help="Input file.")
args <- parser$parse_args(commandArgs(TRUE))

Currently, this example does not work, because the input_file value is not passed to non_empty_existing_file. I haven't seen any examples using custom types, and it would be really useful. If it is already supported, could you add some examples to the documentation?

Thanks in advance.

Ellen

trevorld commented 2 years ago

Currently add_argument() only supports the following types: 'logical', 'integer', 'double' or 'character'. Because the current implementation passes data back and forth between R and Python, supporting arbitrary R functions would be a bit tricky.

For your particular example you could validate that the file exists and is non-empty after parsing your arguments:

parser = argparse::ArgumentParser(prog='PROG')
parser$add_argument('input_file', help="Input file.")
args <- parser$parse_args()

stopifnot(file.exists(args$input_file))
stopifnot(file.info(args$input_file)$size > 0)
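
The original non_empty_existing_file() can also be kept as plain R and simply called on the parsed value, which gives friendlier messages than bare stopifnot(). A minimal sketch; a named list stands in for the result of parse_args() (which returns a named list) so the snippet runs without Python, and the tweaks (call. = FALSE, invisible return) are illustrative choices:

```r
# Ellen's non_empty_existing_file(), called after parsing instead of via `type`:
# it stops with a readable message or returns the path invisibly.
non_empty_existing_file <- function(inputfile) {
  if (!file.exists(inputfile)) {
    stop("Non-existing input path: ", inputfile, call. = FALSE)
  } else if (file.info(inputfile)$size == 0) {  # check empty
    stop("Input file is empty: ", inputfile, call. = FALSE)
  }
  invisible(inputfile)
}

# Stand-in for `args <- parser$parse_args()`, which returns a named list.
tmp <- tempfile(fileext = ".txt")
writeLines("some content", tmp)
args <- list(input_file = tmp)

# A non-empty existing file passes silently.
non_empty_existing_file(args$input_file)

# An empty file is rejected with the custom message.
empty <- tempfile()
invisible(file.create(empty))
err <- tryCatch(non_empty_existing_file(empty), error = conditionMessage)
print(err)
```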

Lain-inrae commented 2 years ago

In fact, it may be quite easy to provide a function written in Python:


non_empty_existing_file <- "(lambda path:exec('import os;assert os.path.getsize(path)>0'))"

parser = ArgumentParser(prog='PROG')
parser$add_argument('input_file', type=non_empty_existing_file, help="Input file.")
args <- parser$parse_args(commandArgs(TRUE))

It only requires add_argument to accept any string in "type" and pass it as-is into the Python code.

EDIT: To make things cleaner, a solution would be to provide "environment preparation code" to ArgumentParser. For the example above, it would be:

That way, the function would be usable as the "type" function (Python side). Again, if the environment preparation code were inserted as-is at the top of the generated Python argparse code, this would provide an easy way to define functions to use in add_argument.

It would be quite useful for logical values, since all of these examples:

produce "TRUE" as a result instead of FALSE. With custom types, it would be very easy to provide a Python function that produces the correct output for these inputs.

trevorld commented 2 years ago

It only requires add_argument to accept any string in "type" and pass it as-is into the Python code.

Technically we already do this with the formatter_class argument, so we could indeed do this here, but the lambda functions would have to be fairly long (e.g. I think your example would also need a ; return path in there?) or we'd also need to provide "environment preparation code" where users pass arbitrary Python code to define their desired functions. It doesn't really seem, though, that this would be simpler or cleaner than validating arguments afterwards, especially since users would have to start writing and maintaining code in an additional language instead of just R...

It would be quite useful for logical values...

For logical values you should probably be using action="store_true" and/or action="store_false" instead of trying to store a boolean (you could even have multiple arguments pointing to the same dest like --quiet and --loud). I should probably be even more aggressive about throwing a WARNING when users try to store booleans...
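
A minimal sketch of the --quiet/--loud idea, assuming the argparse R package and its Python backend are installed; the option names and the dest verbose are illustrative. Giving both actions the same default avoids relying on which action's default Python argparse applies first:

```r
library(argparse)

parser <- ArgumentParser(prog = "PROG")
# Both flags write to the same destination `verbose`.
parser$add_argument("--loud", dest = "verbose", action = "store_true",
                    default = TRUE, help = "Print progress messages (default).")
parser$add_argument("--quiet", dest = "verbose", action = "store_false",
                    default = TRUE, help = "Suppress progress messages.")

print(parser$parse_args(character(0))$verbose)            # TRUE
print(parser$parse_args(c("--quiet"))$verbose)            # FALSE
print(parser$parse_args(c("--quiet", "--loud"))$verbose)  # TRUE (last flag wins)
```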

Lain-inrae commented 2 years ago

but the lambda functions would have to be fairly long

That's why I suggested adding "env preparation code": instead of writing a lambda, we could then write one or more named functions (with the "def" keyword) and not be limited to a single expression (which makes for complicated code in longer tasks such as type checking).

since users would have to start writing/maintaining code in an additional language instead of just R...

I totally agree with you. But the optional burden of maintaining multiple languages remains the user's choice. Since this library is meant to be a wrapper around python's argparse, I'd expect it to have the same flexibility. Passing functions to the "type" parameter is the intended behaviour in Python's argparse.

you should probably be using action="store_true" and/or action="store_false"

Not in my case, where I need to store multiple boolean values with "append". With a function-passing feature, it's easy to implement that behaviour:

interprete_booleans <- "(lambda x:x.lower() in ('true', 't', '1'))"

parser <- ArgumentParser(prog='PROG')
parser$add_argument('-p', type=interprete_booleans, action="append", help="a list of booleans")
args <- parser$parse_args()
print(args)
## ~ $ Rscript myscript.R -p true -p 1 -p 0 -p false -p badValueInterpretedAsFalse
## $p
## [1] TRUE TRUE FALSE FALSE FALSE
parser <- ArgumentParser(prog='PROG')
parser$add_argument('-p', type="logical", action="append", help="a list of booleans")
args <- parser$parse_args()
print(args)
## ~ $ Rscript myscript.R -p true -p 1 -p 0 -p false -p badValueInterpretedAsWeWant
## $p
## [1] TRUE TRUE TRUE TRUE TRUE
parser <- ArgumentParser(prog='PROG')
parser$add_argument('-p', action="store_true", help="a list of booleans")
args <- parser$parse_args()
print(args)
## ~ $ Rscript myscript.R -p -p -p -p -p -p
## $p
## [1] TRUE
parser <- ArgumentParser(prog='PROG')
parser$add_argument('-p', action="store_true", help="a list of booleans")
args <- parser$parse_args()
print(args)
## ~ $ Rscript myscript.R -p -p -\!p  ## the -!p is seen in some tools to say "not p"
## PROG: error: unrecognized arguments: -!p
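
Without custom types, the same list of booleans can be sketched entirely in R: collect the raw strings with action="append" (leaving type as character) and convert them after parsing. The helper name as_flag() is hypothetical and the accepted spellings mirror the interprete_booleans lambda above; a named list stands in for the parse_args() result so the snippet runs without Python:

```r
# Hypothetical converter mirroring interprete_booleans: "true", "t" and "1"
# (case-insensitive) map to TRUE; every other string maps to FALSE.
as_flag <- function(x) {
  tolower(x) %in% c("true", "t", "1")
}

# Stand-in for parser$parse_args() after
# parser$add_argument('-p', action = "append"), which yields a character vector.
args <- list(p = c("true", "1", "0", "false", "badValueInterpretedAsFalse"))
args$p <- as_flag(args$p)
print(args$p)
# [1]  TRUE  TRUE FALSE FALSE FALSE
```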
trevorld commented 2 years ago

Since this library is meant to be a wrapper around python's argparse

Technically this library is just "meant" to "mimic" Python's argparse API. Currently it does wrap Python's argparse, since that was the easiest way to get a minimum viable product, but an eventual pure-R implementation of the argparse API isn't necessarily out of this project's scope.

But the optional burden of maintaining multiple languages remains the user's choice.

However there are alternatives that wouldn't require this burden:

1) Save the R functions to a temporary .RData file and somehow use them within Python via something like the rpy2 module.
2) Cast the arguments in R after getting back the (character) values from Python, perhaps by building a list of dest names and associated processing functions and then calling those functions on the dest values. This wouldn't necessarily work for the edge case of multiple arguments sharing the same dest with different types...
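
Option 2 could look roughly like the sketch below. The helper apply_types(), the dest names and the converters are all hypothetical; a named list stands in for parse_args() output with every argument left as character. As noted above, one converter per dest cannot express several arguments sharing a dest with different types:

```r
# Hypothetical post-parse "type" table: each name is a dest value and each
# element is an R function that converts/validates the raw character value.
apply_types <- function(args, types) {
  for (dest in intersect(names(types), names(args))) {
    # Optional arguments that were not given come back as NULL; skip them.
    if (!is.null(args[[dest]])) {
      args[[dest]] <- types[[dest]](args[[dest]])
    }
  }
  args
}

types <- list(
  n_threads = function(x) {
    n <- suppressWarnings(as.integer(x))
    if (is.na(n) || n < 1L) stop("n_threads must be a positive integer", call. = FALSE)
    n
  },
  ratio = function(x) {
    r <- suppressWarnings(as.numeric(x))
    if (is.na(r) || r < 0 || r > 1) stop("ratio must lie in [0, 1]", call. = FALSE)
    r
  }
)

# Stand-in for parser$parse_args(); `output` was not supplied.
args <- list(n_threads = "4", ratio = "0.25", output = NULL)
args <- apply_types(args, types)
str(args)
```

Converters should never return NULL, since assigning NULL with [[<- would drop the element from the list.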