wintercg / proposal-common-minimum-api

https://common-min-api.proposal.wintercg.org/

Common I/O (stdin/stdout/stderr) module specification #47

Closed. guest271314 closed this issue 3 months ago.

guest271314 commented 12 months ago

Take a look at these Native Messaging hosts written in JavaScript: Node.js, Deno, Bun, QuickJS, txiki.js. They each implement reading stdin and writing stdout differently.

The last time I checked, d8 (V8) and jsshell (SpiderMonkey) provide no means to read stdin and write stdout using TypedArrays (buffers).

Node.js does not write more than 65536 bytes to stdout without process.stdout._handle.setBlocking(true), at least not during my testing. Deno, Node.js, Bun, and txiki.js all require multiple reads to read 1 MB from stdin after reading the first 4 bytes; QuickJS reads the full 1 MB in one read.
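A minimal sketch of that workaround, assuming the undocumented internal _handle object is present (it is not public API and can change between Node.js versions):

// Force blocking writes so large payloads are not truncated (assumption:
// the internal _handle object exposes setBlocking in this Node.js version)
const payload = new Uint8Array(1024 * 1024); // 1 MB of binary output
if (process.stdout._handle?.setBlocking) {
  process.stdout._handle.setBlocking(true);
}
process.stdout.write(payload);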

A common stdin/stdout/stderr module that can be imported (CommonJS, ECMAScript modules, whatever) and is capable of writing a string or a buffer at the author's/application's discretion would be very helpful as a common specification that JavaScript implementations can implement.

guest271314 commented 7 months ago

I did a little write-up on this: JavaScript Standard Input/Output: Unspecified.

mk-pmb commented 7 months ago

I'm not sure I understand the part

Each message is serialized using JSON, UTF-8 encoded

will this allow reading binary data? Because at least node.js totally messes up when trying to read my simple image file:

$ <<<'UDYKMiAyCjI1NQr///////////////8=' base64 -d >white.ppm
$ file white.ppm
white.ppm: Netpbm image data, size = 2 x 2, rawbits, pixmap
$ <white.ppm nodejs -p '
  process.stdin.setEncoding("binary");
  var input = require("read-all-stdin-sync")();
  Buffer.from(input).toString("base64")'
UDYKMiAyCjI1NQrvv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv70=

So each time I want to run a JS-coded image algorithm, I first have to put a base64 encoder in front of it in the shell pipeline, and then use a base64 decoder wrapper script as the node.js program. Maybe this would be an opportunity to fix the mess.
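A sketch of that decoder wrapper, assuming the file is first piped through a base64 encoder in the shell (e.g. <white.ppm base64 | nodejs decode.js):

// Read base64 text from stdin (safe, since base64 is plain ASCII),
// then decode it back to the original bytes inside node.js
let b64 = "";
process.stdin.setEncoding("utf8");
process.stdin.on("data", (chunk) => { b64 += chunk; });
process.stdin.on("end", () => {
  const bytes = Buffer.from(b64, "base64"); // the original binary data
  console.error(`decoded ${bytes.length} bytes`);
});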

guest271314 commented 7 months ago

Each message is serialized using JSON, UTF-8 encoded

The Native Messaging protocol, which I used to illustrate the differences between the STDIO implementations of JavaScript runtimes, first sends the length of the message, then the message itself.

In JavaScript we typically read that length by writing the first 4 bytes of stdin to a Uint32Array, e.g., using the node executable:

import { open } from "node:fs/promises";

// Read `length` elements (of the buffer's element type) from stdin
async function readFullAsync(length, buffer = new Uint8Array(65536)) {
  const data = [];
  while (data.length < length) {
    // Reopen /dev/stdin for each chunk; the read blocks until data arrives
    const input = await open("/dev/stdin");
    const { bytesRead } = await input.read({
      buffer
    });
    await input.close();
    if (bytesRead === 0) {
      // EOF before the expected length was read
      break;
    }
    data.push(...buffer.subarray(0, bytesRead));
  }
  return new Uint8Array(data);
}

async function getMessage() {
  // Extract the message length: 4 bytes read into a Uint32Array(1)
  const header = new Uint32Array(1);
  await readFullAsync(1, header);
  // Read data up to and including message length
  const content = await readFullAsync(header[0]);
  return content;
}

So each time I want to run a JS-coded image algorithm, I first have to put a base64 encoder in front of it in the shell pipeline, and then use a base64 decoder wrapper script as the node.js program. Maybe this would be an opportunity to fix the mess.

There are ways to process (any) data without using base64.

A Uint8Array representation of the data, which you can obtain in various ways, can be spread into an Array and serialized to JSON, streamed in chunks of integers in the range [0, 255], then reconstructed into a Uint8Array and written to a real-time stream.
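A minimal sketch of that round trip:

// Spread the bytes into a plain Array, serialize to JSON, then rebuild
const bytes = new Uint8Array([80, 54, 10, 50, 32, 50]); // arbitrary binary data
const json = JSON.stringify([...bytes]); // "[80,54,10,50,32,50]"
const restored = new Uint8Array(JSON.parse(json)); // byte-identical copy
console.log(restored.every((b, i) => b === bytes[i])); // true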

What I propose here is for the maintainers of JavaScript engines and runtimes, and the JavaScript specification itself via ECMA-262, to write out something like this:

// Read the 4-byte length header synchronously into a Uint32Array
let buffer = new Uint32Array(1);
std.read({buffer, sync: true});
// Allocate a buffer of that length and read the message asynchronously
let data = new Uint8Array(buffer[0]);
await std.read({buffer: data, async: true});
// Write the length header for an outgoing message
let len = new Uint8Array(new Uint32Array([message.length]).buffer);
await std.write({async: true, buffer: len});

The point is for such an interface to be implemented uniformly by JavaScript engines and runtimes, so that the same STDIO code runs the same in every JavaScript engine and runtime.
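For illustration only, a complete Native Messaging exchange under that hypothetical interface could look like this (std, its read/write options, and sendMessage are assumptions, not an existing API):

// Hypothetical: assumes the proposed uniform std interface existed
async function getMessage() {
  const header = new Uint32Array(1);
  std.read({ buffer: header, sync: true }); // 4-byte length prefix
  const content = new Uint8Array(header[0]);
  await std.read({ buffer: content, async: true }); // message body
  return content;
}

async function sendMessage(message) { // message: Uint8Array
  const len = new Uint8Array(new Uint32Array([message.length]).buffer);
  await std.write({ buffer: len, async: true }); // length prefix
  await std.write({ buffer: message, async: true }); // message body
}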

Right now that's not possible. We have to write different STDIO code for each JavaScript runtime.

Typically the above can take the form of a Uint8Array. However, that's not the only way to process data: an ArrayBuffer could be used, WHATWG Streams could be used, or all of the above. The details can be sorted out, and options can be thought through and included in the deliverable. We are only talking about STDIO.
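As one sketch of the WHATWG Streams option, Node.js can already bridge stdin to a web ReadableStream through its own runtime-specific adapter, which is exactly the kind of divergence at issue:

// Node.js-specific: expose stdin as a WHATWG ReadableStream of Uint8Array chunks
import { Readable } from "node:stream";

const stdin = Readable.toWeb(process.stdin);
for await (const chunk of stdin) {
  console.error(`read ${chunk.length} bytes`);
}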

That might be an omission that is not easily observable for people who are just running node, or just running deno. I experiment with and test multiple JavaScript runtimes and engines, including node, deno, bun, qjs, tjs, V8's d8, and SpiderMonkey's js, among others.

Circa 2023 there are more JavaScript engines and runtimes that do not target the browser than there are runtimes that do.

However, there is no compatibility, no uniformity.

If you hack on multiple JavaScript engines and runtimes, that omission glares. At least it does to me when writing code intended to run unchanged across JavaScript engines and runtimes.

mk-pmb commented 7 months ago

So with the Uint8Array read example and my file above, the second half of items in the array should turn out as 255 then?

guest271314 commented 7 months ago

No. That's just an example that shows the lowest and highest integers a Uint8Array can have as an element. See TypedArray.
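For example, out-of-range values wrap modulo 256:

// Uint8Array elements are unsigned 8-bit integers in the range 0-255
new Uint8Array([0, 255, 256, -1]); // Uint8Array [ 0, 255, 0, 255 ]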

jasnell commented 3 months ago

There's no action to take here.