API for async project exec

There is a project_exec in the v1 API, but it has become apparent that it also needs to be possible to run it asynchronously. That is, a job (command + args) is launched, a unique ID is returned, and later the caller can query by job ID to check the status and get the returned stdout + stderr. (Status codes I can think of: "running" | "error" | "missing" | "success".)
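To make the request/response shape a bit more concrete, here is a rough sketch of the types involved. All names (`JobInfo`, `job_id`, `shouldKeepPolling`) are made up for illustration and are not part of the existing API; only the four status strings come from the list above.

```typescript
// Hypothetical shapes for the async exec API; none of these names exist yet.
type JobStatus = "running" | "error" | "missing" | "success";

interface JobInfo {
  job_id: string; // unique ID returned when the job is launched
  status: JobStatus;
  stdout?: string; // only present once there is output to report
  stderr?: string;
  exit_code?: number; // only known after the process has exited
}

// A caller would keep polling only while the job is still running;
// "missing" means the ID is unknown (or its result was already dropped).
function shouldKeepPolling(info: JobInfo): boolean {
  return info.status === "running";
}
```

The idea is that the launch call returns just the `job_id`, and the status call returns a `JobInfo` for that ID.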

My idea for implementing this is to let the project spawn a sub-process (child_process) without blocking on it. Instead, it stores the spawned process internally in a map, where the key is just a UUID or a simple number. With that, it should be possible to check up on a spawned sub-process by ID. Results & status are stored in memory.
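A minimal sketch of that idea, assuming plain Node.js `child_process.spawn` and `crypto.randomUUID`. The names `startJob`, `getJob` and `jobs` are hypothetical, and this ignores timeouts, working directory, environment, etc.:

```typescript
import { spawn } from "node:child_process";
import { randomUUID } from "node:crypto";

type JobStatus = "running" | "error" | "missing" | "success";

interface Job {
  status: JobStatus;
  stdout: string;
  stderr: string;
  exit_code?: number;
}

// In-memory registry of jobs, keyed by job ID (hypothetical name).
const jobs = new Map<string, Job>();

// Launch the command without waiting for it and return the job ID immediately.
function startJob(command: string, args: string[] = []): string {
  const job_id = randomUUID();
  const job: Job = { status: "running", stdout: "", stderr: "" };
  jobs.set(job_id, job);

  const child = spawn(command, args);
  child.stdout.setEncoding("utf8");
  child.stderr.setEncoding("utf8");
  child.stdout.on("data", (chunk: string) => {
    job.stdout += chunk;
  });
  child.stderr.on("data", (chunk: string) => {
    job.stderr += chunk;
  });
  // Emitted e.g. when the command cannot be spawned at all.
  child.on("error", (err) => {
    job.status = "error";
    job.stderr += String(err);
  });
  // Emitted once the process has ended and its stdio streams are closed.
  child.on("close", (code) => {
    job.exit_code = code ?? undefined;
    if (job.status === "running") {
      job.status = code === 0 ? "success" : "error";
    }
  });
  return job_id;
}

// Look up a job by ID; unknown IDs are reported as "missing".
function getJob(job_id: string): Job | { status: "missing" } {
  return jobs.get(job_id) ?? { status: "missing" };
}
```

Here the map stores the collected result rather than the process object itself, which is enough for status queries; keeping a reference to the child would only be needed to e.g. kill a job.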

The more complex detail is how to properly route the request from the hub to the project. My feeling is that it isn't too hard to extend whatever project_exec does right now, but I haven't looked at the code. So maybe the hard part is already in place, and it just needs a bit of "high level" treatment to fit into the v2 framework?

Concerns about memory usage:

Another detail is how to deal with too much output. Probably the best idea is to cap stdout and stderr, e.g. at 1 MB, and only keep the last part of the output (aka "tail").
There should also be an overall limit on how many such outputs are stored, and the stored output should be deleted from memory once it has been returned by an API call.
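Both limits could look roughly like the sketch below. The names `appendTail` and `ResultStore` and the concrete numbers are assumptions for illustration. Note that the cap here counts string length (UTF-16 code units), not bytes, so "1 MB" is only approximate.

```typescript
// Assumed cap per stream: roughly 1 MB, measured in characters, not bytes.
const MAX_OUTPUT_CHARS = 1_000_000;

// Append a chunk and keep only the last `limit` characters (the "tail").
function appendTail(current: string, chunk: string, limit: number = MAX_OUTPUT_CHARS): string {
  const combined = current + chunk;
  return combined.length > limit ? combined.slice(combined.length - limit) : combined;
}

// Bounded in-memory store: evicts the oldest entry when full, and
// deletes an entry as soon as it has been handed out once.
class ResultStore<T> {
  private results = new Map<string, T>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  put(id: string, result: T): void {
    this.results.delete(id); // re-inserting makes this entry the newest
    this.results.set(id, result);
    // A Map iterates in insertion order, so the first key is the oldest.
    while (this.results.size > this.maxEntries) {
      const oldest = this.results.keys().next().value;
      if (oldest === undefined) break;
      this.results.delete(oldest);
    }
  }

  // Return the stored result (if any) and remove it from memory.
  take(id: string): T | undefined {
    const result = this.results.get(id);
    this.results.delete(id);
    return result;
  }

  get size(): number {
    return this.results.size;
  }
}
```

With delete-on-read, a second query for the same ID would report "missing", so callers have to hold on to the result themselves after the first successful fetch.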

As an extra, I think it would also be cool to add some information about the process, similar to /usr/bin/time on Linux. I'm thinking of e.g. the output of process.resourceUsage, so that callers know a bit more about what was going on inside the project.
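As a rough sketch of what such stats could look like: Node's `process.resourceUsage()` reports `userCPUTime` and `systemCPUTime` in microseconds and `maxRSS` in kilobytes, but for the process that calls it. So the snapshot below measures the calling process, and getting figures for a spawned child would need something extra (e.g. taking the snapshot inside the child). The names `JobStats` and `computeStats` are made up.

```typescript
// Hypothetical summary returned alongside a job result.
interface JobStats {
  elapsed_ms: number; // wall-clock time
  user_cpu_ms: number;
  system_cpu_ms: number;
  max_rss_kb: number;
}

// The subset of process.resourceUsage() fields used here.
interface UsageSnapshot {
  userCPUTime: number; // microseconds
  systemCPUTime: number; // microseconds
  maxRSS: number; // kilobytes
}

// Turn two snapshots and the elapsed wall-clock time into a summary.
function computeStats(before: UsageSnapshot, after: UsageSnapshot, elapsed_ms: number): JobStats {
  return {
    elapsed_ms,
    user_cpu_ms: (after.userCPUTime - before.userCPUTime) / 1000,
    system_cpu_ms: (after.systemCPUTime - before.systemCPUTime) / 1000,
    max_rss_kb: after.maxRSS, // peak value, so no difference is taken
  };
}

// Usage: snapshot before and after some work in the current process.
const started = Date.now();
const usageBefore = process.resourceUsage();
let sum = 0;
for (let i = 0; i < 1_000_000; i++) sum += i; // stand-in for real work
const stats = computeStats(usageBefore, process.resourceUsage(), Date.now() - started);
console.log(stats);
```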
