ropensci / unconf17

Website for 2017 rOpenSci Unconf
http://unconf17.ropensci.org

External API: Querying and accessing external software (via `system()`) #95

Open HenrikBengtsson opened 7 years ago

HenrikBengtsson commented 7 years ago

In my research field, computational genomics / bioinformatics, it is becoming more and more common to run analytical pipelines that call various standalone external software tools. This is often done via Unix shell scripts, but also from R itself. When it is done from R, people quite often write their own one-off implementation that is "good enough" for what needs to be done.

R provides Sys.which() for locating external software, and system() and system2() for calling it. If one goes through the source code of R itself, one can see a few different flavors of how these are used. Some functions, for instance, also locate external software via an environment variable and / or an R option. But other than that, there is no real standard for how this is done.
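To make the "different flavors" concrete, here is a minimal sketch of one such lookup: R option first, then environment variable, then the PATH via Sys.which(). The function name `find_external()` and the naming scheme for the option (`external.<name>`) and the environment variable (`R_EXTERNAL_<NAME>`) are hypothetical, made up for illustration; they are not an existing convention in R or in any package.

```r
# Hypothetical helper (not part of base R): locate an external executable.
# Lookup order: R option, then environment variable, then the PATH.
# The option / variable naming scheme is an assumption for illustration only.
find_external <- function(name) {
  opt  <- getOption(paste0("external.", name), default = "")
  env  <- Sys.getenv(paste0("R_EXTERNAL_", toupper(name)), unset = "")
  path <- unname(Sys.which(name))  # "" when nothing is found on the PATH

  candidates <- c(opt, env, path)
  # Keep only non-empty candidates that point to an existing file
  found <- candidates[nzchar(candidates) & file.exists(candidates)]
  if (length(found) == 0L) return(NA_character_)
  found[[1L]]
}

# Usage: locate Rscript and query its version via system2()
rscript <- find_external("Rscript")
if (!is.na(rscript)) {
  out <- system2(rscript, "--version", stdout = TRUE, stderr = TRUE)
  status <- attr(out, "status")  # NULL when the exit code is 0
}
```

Returning `NA_character_` rather than throwing an error leaves it to the caller to decide whether a missing tool is fatal; that is a design choice of this sketch, not something the thread prescribes.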

Some quick thoughts of an API

Locating external software / executables

Information and attributes

Calling

Contracts of input & output

That's all I have had time to scribble down for now. I'm sure there are some packages out there that may target parts of the above.

cboettig commented 7 years ago

I'd definitely like to see some collected wisdom on this. Beyond direct calls to system and system2, I think I've seen clever stuff by @richfitz that refers to an internal function from @gaborcsardi's callr for this (https://github.com/richfitz/drat.builder/blob/master/R/utils.R#L3), which I can't seem to find in callr.

I would definitely be interested to see an implementation along the lines you sketch out above.

gaborcsardi commented 7 years ago

callr uses processx (https://github.com/r-pkgs/processx) now, which has a lot of nice features (e.g. timeouts) for external processes, especially background processes.
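As a rough illustration of the timeout feature, the sketch below calls an external program with `processx::run()`. It assumes a reasonably recent processx release is installed, and it uses the Rscript binary that ships with R as a stand-in for an arbitrary external tool; the 30-second limit is an arbitrary value chosen for the example.

```r
library(processx)

# Stand-in for an external tool: the Rscript binary of the running R
rscript <- file.path(R.home("bin"), "Rscript")

# Run the command without a shell and give up after 30 seconds.
# error_on_status = FALSE returns the exit status instead of signalling an error.
res <- run(
  rscript,
  args = c("-e", "cat('hello')"),
  timeout = 30,
  error_on_status = FALSE
)

res$status   # exit status of the process
res$stdout   # captured standard output
res$timeout  # TRUE if the process was killed because of the timeout
```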