jbenet / transformer

transformer - multiformat data conversion
transform.datadex.io
130 stars 7 forks

Conversions with multiple inputs and outputs #14

Open jbenet opened 10 years ago

jbenet commented 10 years ago

Currently, conversion functions take only one input and one output. It will be critical to allow multiple of each. The implementation is easy; the API less so (there's a lot to consider: how the conversions, the js transformer, and the cli should each handle this). I'll use this issue to propose a few API changes and then select one.

jbenet commented 10 years ago

Btw, real use case from @bmpvieira:

what if I need to transform something from two sources into one (into dat), and vice versa? For example, sequence data (fasta) and base qualities (qual) into the combined format (fastq). Some bioinformatics software prefers fasta+qual, other software prefers fastq, so you end up writing a custom Python, Perl, or other tool to convert, usually using libraries like Biopython.

So the cli should definitely support this. Something like:

transform <transformer-pipeline> --inputs fileA fileB fileC --outputs fileOutD fileOutE
transform fileA fileB fileC -- <transformer-pipeline> -- fileOutD fileOutE

So generalizing:

# --flags have to come after pipeline because of ambiguity.
transform <pipeline> [--inputs <inputs>] [--outputs <outputs>]

# or add --pipeline flag for whatever order
transform [--pipeline <pipeline>] [--inputs <inputs>] [--outputs <outputs>]

# or use -- delimiters
transform [<inputs> --] <pipeline> [-- <outputs>]
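To make the third form concrete, here is a minimal sketch of how the `--`-delimited arguments could be split. This is purely illustrative (`parseArgs` is not part of transformer), and it assumes that a single `--` means the inputs come first — the one-delimiter case is otherwise ambiguous in the grammar above:

```javascript
// Hypothetical parser for: transform [<inputs> --] <pipeline> [-- <outputs>]
// Assumption: a lone "--" is read as <inputs> -- <pipeline>.
function parseArgs(argv) {
  var first = argv.indexOf('--');
  var last = argv.lastIndexOf('--');
  var inputs = [], pipeline, outputs = [];
  if (first === -1) {
    // no delimiters: everything is the pipeline
    pipeline = argv;
  } else if (first === last) {
    // one delimiter: inputs -- pipeline (by assumption)
    inputs = argv.slice(0, first);
    pipeline = argv.slice(first + 1);
  } else {
    // two delimiters: inputs -- pipeline -- outputs
    inputs = argv.slice(0, first);
    pipeline = argv.slice(first + 1, last);
    outputs = argv.slice(last + 1);
  }
  return { inputs: inputs, pipeline: pipeline, outputs: outputs };
}
```

One nice property of this form is that it needs no flags at all, at the cost of the single-`--` ambiguity noted above.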

Does anyone have good examples of CLIs that take a variable number of input and output files?


The JS side of these examples is much simpler:

// sync
var convert = transformer('type1', 'type2', ...); // pipeline
var outputs = convert([input1, input2, input3]);
// outputs = [output1, output2]

// async
var convert = transformer.async('type1', 'type2', ...); // pipeline
convert([input1, input2, input3], function(err, outputs) {
    // outputs = [output1, output2]
});
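For reference, the async form could be a thin wrapper over the sync one — this is just a sketch of that idea (not transformer's actual implementation; the `makeAsync` name is made up):

```javascript
// Hypothetical: wrap a sync multi-input convert fn in a node-style callback.
function makeAsync(convert) {
  return function (inputs, cb) {
    process.nextTick(function () {
      var outputs;
      try {
        outputs = convert(inputs);
      } catch (err) {
        return cb(err);
      }
      cb(null, outputs);
    });
  };
}
```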

inputs and outputs can be anything: strings, streams, whatever :)
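To ground @bmpvieira's use case, here is what one such multi-input conversion could look like as a plain function — two inputs (fasta text, qual text), one output (fastq text). Everything here is a hypothetical sketch, not transformer's API; the parsing is deliberately naive:

```javascript
// Hypothetical fasta + qual -> fastq conversion (illustrative only).
// Parse a fasta-style file into { id: [line, line, ...] }.
function parseFasta(text) {
  var records = {};
  var id = null;
  text.split('\n').forEach(function (line) {
    line = line.trim();
    if (!line) return;
    if (line[0] === '>') {
      id = line.slice(1);
      records[id] = [];
    } else if (id) {
      records[id].push(line);
    }
  });
  return records;
}

function fastaQualToFastq(fastaText, qualText) {
  var seqs = parseFasta(fastaText);
  var quals = parseFasta(qualText);
  return Object.keys(seqs).map(function (id) {
    var seq = seqs[id].join('');
    // qual files hold space-separated Phred scores; fastq wants them
    // ASCII-encoded (Phred+33 here).
    var q = quals[id].join(' ').split(/\s+/).map(function (n) {
      return String.fromCharCode(33 + parseInt(n, 10));
    }).join('');
    return '@' + id + '\n' + seq + '\n+\n' + q + '\n';
  }).join('');
}
```

The interesting part API-wise is that the conversion is genuinely 2-in / 1-out, so it can't be expressed as a chain of 1-in / 1-out steps — which is exactly why the pipeline API needs to change.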