serialport / node-serialport

Access serial ports with JavaScript. Linux, OSX and Windows. Welcome your robotic JavaScript overlords. Better yet, program them!
https://serialport.io
MIT License

How to handle operation queuing and error handling/recovery robustly #1537

Open SirUppyPancakes opened 6 years ago

SirUppyPancakes commented 6 years ago

Summary of Problem

I am wrapping the API with a Promise-oriented approach in order to fit in with the rest of my code. I wrap each method I need in a Promise, hooking into the callback and using the presence of the error parameter to determine whether I call resolve or reject. This works rather well in most cases, but starts to fall apart when fatal errors and operation queuing occur.
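
A minimal sketch of what such a wrapper looks like for write() (writeAsync is just an illustrative name, not part of the library):

// Promisify a callback-based method like write(): resolve on success, reject on error
function writeAsync(port, data) {
    return new Promise((resolve, reject) => {
        port.write(data, error => {
            if (error) reject(error)
            else resolve()
        })
    })
}

// usage: await writeAsync(port, "test\n")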

For instance, I attempted to flush() before every single write() call, which on Windows causes errors during the write() (https://github.com/node-serialport/node-serialport/issues/1409). I will be commenting this line out for now, but it makes me nervous about relying on the callbacks, since it causes unexpected queuing. For instance, consider this hypothetical calling code of my Promise-style wrapper API:

await flush();
await write("test\n");
await drain();

The bug with flush() will cause an error in write() which will cause the port to be closed. When write() returns and drain() is called, it is simply queued up forever until the port is re-opened. While I can understand that queuing is desired behavior when the user closes the port manually via close() or before open() actually occurs, this is very undesirable behavior when the port is closed by some underlying error.

Ideally, there would be some setting on the object that would allow me to turn off this behavior altogether. I would very much prefer that if the port is closed, any operation would fail fast and trigger the error callback saying so (like write() does, for instance). For now, my two ideas to work around this are as follows:

1) Listen for close events and always try to re-open:

port.on("close", () => port.open());
// then, to really close, I would use EventEmitter.removeAllListeners("close")
// before actually closing in my wrapper close() method

2) Use a timeout for all operations. I haven't tried this one yet, but I imagine I would use Promise.race and a Promise-wrapped setTimeout to have any operation time out if its callback isn't called quickly enough.

Ultimately, neither of these solutions seems very robust to me, and not knowing whether the callbacks will be called in all cases makes me nervous about relying on the library, so I would like the following clarifications:

1) What are all of the cases that the callbacks may not be called? What would be a good way to get around that so that no operation is left in a forever-pending state?

2) Of the above two fixes, which would make more sense in the context of how the library was designed? And should I expect to see a flag like fail-fast or similar in the future?

3) As a side note, as long as I pass a callback to every method call, can I safely ignore the error event? Or should I register a handler to cache extra error events and pass them to the next method that is called via its callback?

As a side note, I am overall very happy with the library. Thanks for all the hard work! :)

reconbot commented 6 years ago

Sorry for the late reply. I think you bring up a bunch of good points about our API design. I'm adding a docs label to this, and we should try to pull actionable changes or documentation out of it.

Streams are annoying and their state is hard to manage. In the near future I hope to have a few other options available that will provide more robust APIs.

SirUppyPancakes commented 6 years ago

No worries on the late reply. I have been using the first workaround I mentioned above with success up to this point:

port.on("close", () => port.open());
// then, to really close, I would use EventEmitter.removeAllListeners("close")
// before actually closing in my wrapper close() method

This should work just fine for now, but here would be my idea to add the fail-fast flag:

1) Add a member to openOptions called failFast or similar (having it default to whichever behavior makes the most sense)
2) Track down all instances of if (!this.isOpen) or similar lines that then queue up work if the port is closed
3) Add a bit of code inside each check like:

if (!this.isOpen) {
    if (this.failFast) {
        // call the error callback with a message indicating that the port is closed
    } else {
        // current queuing behavior
    }
}

Assuming that fatal errors always end up in a closed port, I think this would suffice for fail-fast behavior.

Though I'm not super familiar with the source code, I could probably try to make these changes locally and do some testing to see how it works out, and then potentially submit a pull-request.

reconbot commented 5 years ago

This API is a little different; what do you think?

https://github.com/node-serialport/node-serialport/pull/1850

SirUppyPancakes commented 5 years ago

Looking good! Promises from the get-go will definitely be a welcome change. I personally feel that Async-Iterables are a bit too "bleeding edge" (and prefer Observables for a "streaming" experience), but that is probably because I haven't played around with them very much to understand their benefit vs other options.

What are your plans for error handling with a Promise-based API? I personally would like to see all Promise operations fail-fast when the port is in some bad state (closed or not configured), and not have operations get queued (at least not by default). However, there are a lot of people who would also like some sort of centralized way to consume errors from the port (via some port.Errors stream/event or similar). I think also replicating errors to such a stream (in addition to the Promise) would probably work well.

Something like:

const { open, list } = require('@serialport/async-iterator')
const ports = await list()
const arduinoPort = ports.find(info => (info.manufacturer || '').includes('Arduino'))
const port = await open(arduinoPort)

// Log all errors
port.on("error", error => console.log("Serial port error: " + error.toString()));

try
{
    console.log("The first byte was: " + await port.next(1));
}
catch (error)
{
    // deal with specific error
}

This sort of functionality (at least in my own experience) is typically accomplished by:

1) Caching any errors from the underlying native objects (in this case the OS serial port or connection), since they are often observed asynchronously.
2) Whenever a user-level API function is called on the port, check the error cache and fail-fast if there is an error. Otherwise, try to accomplish the operation (and fail if an error occurs, which could also involve checking the error cache again).
3) If there are any handlers to the catch-all error event, then also call those handlers with the error that is produced on any given Promise for some API function call.

The idea is that the error event is only really for things like logging, and the primary way to handle specific failures should be at the call-site via the Promise itself.
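
A minimal sketch of steps 1 and 2 above (FailFastPort and cachedError are just illustrative names, not anything in node-serialport):

// Hypothetical wrapper around an already-open SerialPort instance
class FailFastPort {
    constructor(port) {
        this.port = port
        this.cachedError = null
        // 1) cache errors observed asynchronously from the underlying port
        port.on("error", error => { if (!this.cachedError) this.cachedError = error })
        port.on("close", () => { if (!this.cachedError) this.cachedError = new Error("port closed") })
    }

    // 2) check the cache before every operation and fail fast
    write(data) {
        if (this.cachedError) return Promise.reject(this.cachedError)
        return new Promise((resolve, reject) => {
            this.port.write(data, error => (error ? reject(error) : resolve()))
        })
    }
}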

For operation queuing (if enabled), the above should still work, but instead of failing fast, the Promise would keep retrying the operation until it succeeded, though this would probably end up needing some sort of cancellation mechanism to avoid Promises that stay pending for a very long time.
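
For the queuing case, a rough sketch of retry-until-success with a simple deadline standing in for real cancellation (retryUntil is a hypothetical helper, not a library feature):

// Hypothetical helper: retry an async operation until it succeeds or a deadline passes
async function retryUntil(operation, { timeoutMS = 5000, intervalMS = 100 } = {}) {
    const deadline = Date.now() + timeoutMS
    for (;;) {
        try {
            return await operation()
        } catch (error) {
            if (Date.now() >= deadline) throw error
            await new Promise(resolve => setTimeout(resolve, intervalMS))
        }
    }
}

// e.g. keep retrying a wrapped write for up to 5 seconds:
// await retryUntil(() => failFastPort.write("test\n"), { timeoutMS: 5000 })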

reconbot commented 5 years ago

Observables have no back pressure, but for UX I do see the appeal. Async iterators (specifically for await...of) are in all major browsers and Node.js. ES streams are another approach, but I find them complex.

I would consider a global close event as errors will close the port. Unsure about a global error event as specific functions will always reject their errors.

I haven’t thought about a queuing system. Async iterators will queue and “fail fast” reads. Queuing writes should probably do the same. Other operations can fail fast on closed ports but may have undefined behavior for operations in flight.

thehans commented 1 year ago

Hello, did anything ever come of this? I'm converting a web page into an electron app and I am looking to have some error handling in my app, in the event that serial comms go down. But I am still very confused about how to do that.

Is parser-inter-byte-timeout the only included code which deals with the issue of timeouts? I'm using a simple text line-based protocol where all communications are request/response pairs (using parser-readline).
One initial idea was to pipe parser-inter-byte-timeout into parser-readline, but I'm not even sure that makes sense. When it times out, would it have to inject end-of-line characters for parser-readline to trigger?
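
For reference, here is the basic standalone usage of that parser as I understand it from its docs (the port path and 100 ms interval are arbitrary); it seems to emit whatever bytes have arrived once the line goes quiet for the configured interval, rather than producing lines:

const { SerialPort } = require('serialport')
const { InterByteTimeoutParser } = require('@serialport/parser-inter-byte-timeout')

const port = new SerialPort({ path: '/dev/ttyACM0', baudRate: 115200 })

// emits whatever bytes have been received once 100 ms pass with no new data
const parser = port.pipe(new InterByteTimeoutParser({ interval: 100 }))
parser.on('data', chunk => console.log('received chunk:', chunk.toString()))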

@SirUppyPancakes did you end up getting anything to work using your "Promise.race and a Promise-wrapped setTimeout" idea? I'm still pretty new to Promises, and having a hard time picturing how to fit such a thing together. A demo like that would be invaluable to me.

Just getting an ajax-like request/response paradigm working in electron over IPC was quite a struggle for me, and the result still feels like a terrible kludge. I have no idea how to shoehorn a Promise timeout race in there.

// in renderer.js
function parse_status(response) {  
  /* update presentation in renderer based on response */  
}
function update_status() {
  window.serial.Tx("status", parse_status);
}
document.addEventListener('DOMContentLoaded', () => {
  // ...
  // Initialize serial connection(s)
  window.serial.connect().then(() => {
    setInterval(update_status, 1000); // Set up recurring status check
  });
  // ...
});

// src/preload.js
const { contextBridge, ipcRenderer } = require('electron')

function noop() { }

contextBridge.exposeInMainWorld('serial', {
  connect: () => ipcRenderer.invoke('serial-connect'),
  Tx: (command, callback, device) => {
    // set up an individual receive handler for every transmitted message
    ipcRenderer.removeAllListeners('serial-Rx');
    ipcRenderer.once('serial-Rx', callback ? (event, response) => { callback(response) } : noop);
    ipcRenderer.send('serial-Tx', command, device);
  },
})

// src/main/main.js
const { app, process, screen, ipcMain, BrowserWindow } = require('electron')
const serial = require('./serial.js')

app.whenReady().then(() => {
  // ...
  // serial.Tx needs webContents (from event.sender) in order for parser-readline to call an appropriate handler on data.
  ipcMain.handle('serial-connect', () => serial.connect() );
  ipcMain.on('serial-Tx', (event, command) => { serial.Tx(command, event.sender); });
  // ...
})

// src/main/serial.js
// Serial handling code, running on the Main process
const { SerialPort } = require('serialport')
const { ReadlineParser } = require('@serialport/parser-readline')
const Logger = require('./logging.js')
let connection = null;

// Problem: only one message/request should be allowed "in transit" at a time
//   transmit message ... receive response, handle response, repeat
async function connect() {
  if (connection && connection.serial && connection.serial.isOpen) {
    return; // port already set up, do nothing
  }
  await SerialPort.list().then(ports => {
    for(const port of ports) {
      if (port.path.startsWith("/dev/ttyACM")) {
        console.log(`Initialize serial connection: ${port.path}`);
        const serial = new SerialPort({ path: port.path, baudRate: 115200 });
        const parser = serial.pipe(new ReadlineParser({ delimiter: '\r\n' }));
        connection = {
          path: port.path,
          serial,
          parser,
          command: null,
          webContents: null,
        };
        parser.on('data', Rx);
      }
    }
  })
}

function Tx(command, webContents) {
  console.log(`serial.Tx('${command}')`);
  const port = connection.serial;
  const txStart = Date.now();
  port.write(Buffer.from(` ${command} \r\n`));
  port.drain(); // note: drain() is asynchronous and takes a callback, so this doesn't actually wait until sent
  Object.assign(connection, { command, webContents, txStart, txEnd: Date.now() });
}

function Rx(data) {
  console.log(`serial.Rx('${data}')`);
  // Have renderer update the UI from response data.
  if (connection && connection.webContents)
    connection.webContents.send("serial-Rx", data);
}

module.exports = {
  "connect": connect,
  "Tx": Tx
};

thehans commented 1 year ago

For what it's worth, my previous implementation was a web page making ajax calls to an apache-hosted python cgi script that used pySerial. pySerial has a number of timeout options that are configured when initializing/opening the port:

timeout (float) – Set a read timeout value in seconds.
write_timeout (float) – Set a write timeout value in seconds.
inter_byte_timeout (float) – Inter-character timeout, None to disable (default).

Is there any possibility that node-serialport could configure its port in a similar way?

SirUppyPancakes commented 1 year ago

@thehans Here is what the Promise.race-based timeout that I described would look like:

// Wrap setTimeout in a promise
Promise.delay = timeMS => new Promise(resolve => setTimeout(resolve, timeMS))

// Wait asynchronously for 5 seconds:
await Promise.delay(5000)

// Add a function to all Promise instances to wrap them with a timeout
Promise.prototype.orTimeout = function(timeMS) {
    return Promise.race([
        this,
        Promise.delay(timeMS).then(() => {
            throw new Error("operation timeout") // could replace this with your own Error type if you want more specific errors
        })
    ])
}

// Run and asynchronously wait for someLongAsyncOperation() like normal, but if 5 seconds pass, throw an Error
await someLongAsyncOperation().orTimeout(5000)

The only thing you need to be mindful of with this approach, which was probably the reason why I didn't go this route before, is that the actual underlying Promise will still run and isn't magically cancelled or anything like that. For example:

async function reallyLongOperation() {
    console.log("operation started")
    await Promise.delay(100 * 1000) // 100 seconds
    console.log("operation finished")
}

// Use the timeout to only wait 5 seconds instead of 100:
await reallyLongOperation().orTimeout(5000)

The result of this is that operation started is printed, then after 5 seconds the await throws the timeout Error and your program can continue. However, the rest of reallyLongOperation will still run in the background, so in this case operation finished will print after 95 more seconds. In the context of sharing a native resource like a serial port, this could cause some weird behavior if the operations you're adding timeouts to continue after your program has already moved on. Ultimately, if the code you're working with doesn't incorporate timeouts as first-class functionality, then no amount of wrapping with your own timeout code can truly achieve the same level of functionality, just an approximation like this, which comes with those sorts of tradeoffs.

Best of luck!