shawnbot / tito

Tables In, Tables Out
Creative Commons Zero v1.0 Universal

A better API proposal #11

Open shawnbot opened 8 years ago

shawnbot commented 8 years ago

JS API

Open and parse a file in a known format

tito.createReadStream(filename [, format [, options]])

This is the equivalent of:

fs.createReadStream(filename)
  .pipe(tito.createParseStream(format, options))
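A minimal sketch of the read-side composition, assuming newline-delimited JSON as the format. `createNdjsonParseStream` and `openForReading` are hypothetical names used only for illustration; they are not tito's API. The parser buffers partial lines across chunks, and the file stream is connected with Node's `stream.pipeline`.

```javascript
const fs = require('fs');
const { pipeline, Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical stand-in for a parse stream such as the proposed
// createParseStream('ndjson'): NDJSON text in, one object out per line.
function createNdjsonParseStream() {
  const decoder = new StringDecoder('utf8'); // handles characters split across chunks
  let buffered = '';

  function pushLines(stream, lines) {
    for (const line of lines) {
      if (line.trim()) stream.push(JSON.parse(line)); // skip blank lines
    }
  }

  return new Transform({
    readableObjectMode: true,
    transform(chunk, encoding, callback) {
      buffered += decoder.write(chunk);
      const lines = buffered.split('\n');
      buffered = lines.pop(); // keep the trailing partial line for the next chunk
      try {
        pushLines(this, lines);
        callback();
      } catch (err) {
        callback(err); // malformed JSON becomes an 'error' event
      }
    },
    flush(callback) {
      buffered += decoder.end();
      try {
        pushLines(this, [buffered]); // last line may lack a trailing newline
        callback();
      } catch (err) {
        callback(err);
      }
    }
  });
}

// Hypothetical helper mirroring fs.createReadStream(filename).pipe(parser):
// it returns the parser, so callers read objects from it. pipeline() is used
// instead of .pipe() so that a failure anywhere in the chain (such as a
// missing file) tears the whole chain down instead of leaving the parser hanging.
function openForReading(filename, parser) {
  pipeline(fs.createReadStream(filename), parser, () => {});
  return parser;
}
```

For example, `openForReading('data.ndjson', createNdjsonParseStream()).on('data', console.log)` would print one object per non-blank line.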

Open and write to a file in a known format

tito.createWriteStream(filename [, format [, options]])

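i.e. a format stream whose text output is piped into the file (the format stream is what you write objects to):

var formatter = tito.createFormatStream(format, options);
formatter.pipe(fs.createWriteStream(filename));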
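The write side runs in the opposite direction: objects go into a format stream, and its text flows into the file. The sketch below assumes NDJSON output; `createNdjsonFormatStream` and `openForWriting` are hypothetical names, not part of tito. The design point it illustrates is that the returned stream's own `'finish'` event may fire before the file has been fully written, so completion is reported through the `pipeline` callback instead.

```javascript
const fs = require('fs');
const { pipeline, Transform } = require('stream');

// Hypothetical stand-in for a format stream such as the proposed
// createFormatStream('ndjson'): one object in, one line of JSON text out.
function createNdjsonFormatStream() {
  return new Transform({
    writableObjectMode: true,
    transform(row, encoding, callback) {
      callback(null, JSON.stringify(row) + '\n');
    }
  });
}

// Hypothetical helper: returns the formatter so callers write objects to it,
// while its output is piped into `filename`. `done` is called by pipeline()
// once the file stream has finished, or with the first error in the chain.
function openForWriting(filename, formatter, done) {
  pipeline(formatter, fs.createWriteStream(filename), done || (() => {}));
  return formatter;
}
```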

Format inference

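If no format is provided, it should be inferred from the filename's extension (e.g. filename.split('.').pop()), and an error should be thrown if there is no extension. Note that split('.').pop() returns the whole filename when it contains no dot, so the missing-extension case needs an explicit check.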
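One way to make the "no extension" check explicit is Node's `path.extname`, which returns an empty string when there is no extension and ignores dots in directory names. `inferFormat` is a hypothetical helper name; the error message wording is an assumption.

```javascript
const path = require('path');

// Hypothetical helper: infer a format name from a filename's extension,
// e.g. "data.csv" -> "csv". Throws if no extension is found.
function inferFormat(filename) {
  // extname() gives "" for "README" or ".bashrc", "." for "data.",
  // and ignores dots in directories such as "/some.dir/file".
  const ext = path.extname(filename).slice(1);
  if (!ext) {
    throw new Error('Cannot infer a format from "' + filename + '": no extension found');
  }
  return ext;
}
```

With this approach `archive.tar.gz` resolves to `gz`, which matches the "last extension wins" behaviour of the split-based expression.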

File-agnostic streams

Under the hood, the read and write stream functions would use the following to create parse and format streams:

tito.createParseStream(format [, options])

to parse strings into objects (e.g. CSV to JSON), and

tito.createFormatStream(format [, options])

to go the other way (JSON to CSV).
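To make the "other way" concrete, here is a sketch of what a CSV format stream could look like, assuming an options object with a `delimiter` key. `createCsvFormatStream` and `csvField` are hypothetical names, not tito's API. Quoting follows the common RFC 4180 convention, and the header row is taken from the first object's keys, so keys that only appear in later objects are dropped (a simplifying assumption).

```javascript
const { Transform } = require('stream');

// Quote a field only when needed: values containing the delimiter, a double
// quote, or a line break are wrapped in quotes, with embedded quotes doubled.
function csvField(value, delimiter) {
  const text = value == null ? '' : String(value);
  const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
  return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical sketch of createFormatStream('csv', { delimiter }):
// objects in, CSV text out, with a header row from the first object's keys.
function createCsvFormatStream(options = {}) {
  const delimiter = options.delimiter || ',';
  let columns = null;
  return new Transform({
    writableObjectMode: true,
    transform(row, encoding, callback) {
      let out = '';
      if (!columns) {
        columns = Object.keys(row);
        out += columns.map((c) => csvField(c, delimiter)).join(delimiter) + '\n';
      }
      out += columns.map((c) => csvField(row[c], delimiter)).join(delimiter) + '\n';
      callback(null, out);
    }
  });
}
```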

Format resolution

If the parse or format stream functions can't resolve the format argument to a built-in format, they will look for a module of the same name that exports two top-level functions: format.createReadStream([options]) (used by the parse stream function) and format.createWriteStream([options]) (used by the format stream function). This allows "pluggable" formats and makes wrapping existing APIs much simpler. (There should also be a simple way to specify per-format option name mappings, so that you can use, say, -d as shorthand for --delimiter.)
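A sketch of that lookup order and of per-format option aliases. `resolveFormat`, `expandAliases`, the `builtins` table shape, and the injectable `load` function are all hypothetical; `load` defaults to `require` but can be swapped out for testing. Because the fallback calls `require()` on a name that may come from the command line, a real implementation might want to restrict which module names it will load.

```javascript
// Hypothetical lookup: use a built-in factory if one exists, otherwise load a
// module named after the format and use its createReadStream (for parsing)
// or createWriteStream (for formatting). `kind` is 'parse' or 'format'.
function resolveFormat(name, kind, builtins, load = require) {
  if (builtins[name] && builtins[name][kind]) {
    return builtins[name][kind];
  }
  const mod = load(name); // throws if no such module can be found
  const method = kind === 'parse' ? 'createReadStream' : 'createWriteStream';
  if (typeof mod[method] !== 'function') {
    throw new Error('Module "' + name + '" does not export ' + method + '()');
  }
  return mod[method].bind(mod);
}

// Hypothetical per-format alias expansion, e.g. { d: '|' } -> { delimiter: '|' }.
function expandAliases(options, aliases) {
  const expanded = {};
  for (const [key, value] of Object.entries(options)) {
    expanded[aliases[key] || key] = value;
  }
  return expanded;
}
```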

CLI

The tito command line interface is wonky. Here is a better way to do it.

It should infer formats from positional arguments in the form:

tito input.format output.format

(If no output filename is provided, use stdout; if no input filename is provided, use stdin. The value - can stand in for either.)
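A sketch of the stdin/stdout rule, assuming the positional arguments arrive as plain strings. `openInput` and `openOutput` are hypothetical helper names. This variant hands back `process.stdin`/`process.stdout` directly instead of opening `/dev/stdin` and `/dev/stdout` by path, which also works on platforms that lack those device files, such as Windows.

```javascript
const fs = require('fs');

// Hypothetical helpers: a missing argument or "-" means stdin/stdout;
// anything else is treated as a filename.
function openInput(filename) {
  return !filename || filename === '-' ? process.stdin : fs.createReadStream(filename);
}

function openOutput(filename) {
  return !filename || filename === '-' ? process.stdout : fs.createWriteStream(filename);
}
```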

It should use subarg or similar to parse options for the read and/or write formats:

# e.g. for reading from stdin
tito --read [ format --some option ]
# or provide options for an inferred format
tito --read [ --some option ] input.format

Examples

# convert a JSON file to CSV
tito --read [ --path 'results.*' ] results.json data.csv

# read CSV from stdin and write to newline-delimited JSON (the default i/o format)
cat data.csv | tito --read csv > data.ndjson

# convert pipe-delimited values on stdin to tab-separated values
some-program | tito --read [ csv --delimiter '|' ] - data.tsv
shawnbot commented 8 years ago

This might be all that the tito CLI really needs to do:

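var path = require('path');
var subarg = require('subarg');
var tito = require('tito');

// subarg parses `--read [ csv --delimiter '|' ]` into an object such as
// { _: ['csv'], delimiter: '|' }, but a bare `--read csv` into the string 'csv'
var argv = subarg(process.argv.slice(2));

var input = qualify(argv._[0], '/dev/stdin', argv.read);
var output = qualify(argv._[1], '/dev/stdout', argv.write);

var read = tito.createReadStream(input.filename, input.format, input.options);
var write = tito.createWriteStream(output.filename, output.format, output.options);

read.pipe(write);

// Resolve a positional argument (missing or "-" means stdin/stdout) and its
// --read/--write options into a filename, a format, and format options.
function qualify(filename, fallback, options) {
  // normalize `--read csv` (a string) and an absent flag to subarg's object form
  if (typeof options === 'string') {
    options = {_: [options]};
  } else if (!options) {
    options = {_: []};
  }

  var format = 'ndjson'; // the default i/o format
  if (!filename || filename === '-') {
    filename = fallback;
  } else {
    // path.extname() ignores dots in directory names such as "./data"
    format = path.extname(filename).slice(1) || format;
  }

  // an explicit format, e.g. `--read [ csv ... ]`, overrides the extension
  if (options._ && options._[0]) {
    format = options._[0];
  }

  return {
    filename: filename,
    format: format,
    options: options
  };
}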
shawnbot commented 8 years ago

Also, look into using coffee to test the CLI.