joshkatz / r-script

A simple little module for passing data from NodeJS to R (and back again).
MIT License
161 stars 67 forks source link

callback in async call called to often #12

Open WegDamit opened 7 years ago

WegDamit commented 7 years ago

It seems the callback R(RscriptName) .data({foo: bar} ) .call(callbackRscript) then callbackRscript seems to be called for each line of R output. I'd expected the callback is called once R is done and then with one parameter with all the lines from R output.

amcereijo commented 6 years ago

@joshkatz

Hi, I was trying to run a R script using call(..) and it doesn't work well because of its implementation.

R.prototype.call = function(_opts, _callback) {
  var callback = _callback || _opts;
  var opts = _.isFunction(_opts) ? {} : _opts;
  this.options.env.input = JSON.stringify([this.d, this.path, opts]);
  var child = child_process.spawn("Rscript", this.args, this.options);
  child.stderr.on("data", callback);
  child.stdout.on("data", function(d) {
    callback(null, JSON.parse(d));
  });
};

The spawn method runs asynchronously and should listen for events data and close. With data you get any output produced by the running script and with close, you know the execution is over.

In the implementation, the events data on stderr and stdout are calling to the callback in the very first output produced by the running script instead of cache the data and return it when the execution is over.

Many R script write "debug" or "warning" data during its execution. In my case, for example I can see things like

Attaching package: 'lubridate'

The following object is masked from 'package:base':

    date

Attaching package: 'data.table'

The following objects are masked from 'package:lubridate':

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

So the implementation should change to wait for the close event. A suggested changed implementation could be:

function getReturnData(d) {
  try{
    return JSON.parse(d);
  } catch(err) {
    return d;
  }
}

R.prototype.call = function(_opts, _callback) {
  var callback = _callback || _opts;
  var opts = _.isFunction(_opts) ? {} : _opts;
  this.options.env.input = JSON.stringify([this.d, this.path, opts]);
  var child = child_process.spawn("Rscript", this.args, this.options);
  var errData = '';
  var outData = '';

  child.stderr.on("data", function(d) {
    errData += d;
  });

  child.stdout.on("data", function(d) {
    outData += d;
  });

  child.on('close', (code) => {
    if(errData) {
      callback(getReturnData(errData));
    } else {
      callback(null, getReturnData(outData));
    }
  });
};

I was trying that implementation and the R scripts run async with no problems.

I could make an Pull Request if you prefer.

Regards,