sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com

API for async project exec #7666

Open haraldschilly opened 1 week ago

haraldschilly commented 1 week ago

There is a project_exec in the v1 API … but it became apparent that there is a need to run commands asynchronously. I.e., a job (command + args) is launched, a unique ID is returned, and later the caller can query by job ID to check the status and get the returned stdout + stderr. (Status codes I can think of: "running" | "error" | "missing" | "success".)

My idea for implementing this is to let the project spawn a sub-process (child_process) without blocking on it. Instead, it stores the spawned process internally in a map, where the key is just a UUID or a simple number. Either way, it should then be possible to check up on a spawned sub-process by ID. Results and status are stored in memory.
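To make the idea concrete, here is a minimal sketch of such an in-memory job map. It is not CoCalc's code: the names `startJob`, `getJob`, `Job`, and `jobs` are hypothetical, and it assumes a plain Node.js process using the standard `child_process` and `crypto` modules.

```typescript
import { spawn } from "node:child_process";
import { randomUUID } from "node:crypto";

// Status codes as proposed in this issue.
type JobStatus = "running" | "error" | "missing" | "success";

// Hypothetical record kept in memory for each job.
interface Job {
  status: JobStatus;
  stdout: string;
  stderr: string;
  exitCode: number | null;
}

// Job ID -> job record. Everything lives in memory only.
const jobs = new Map<string, Job>();

// Launch the command without waiting for it; return the job ID immediately.
function startJob(command: string, args: string[] = []): string {
  const id = randomUUID();
  const job: Job = { status: "running", stdout: "", stderr: "", exitCode: null };
  jobs.set(id, job);

  const child = spawn(command, args);
  child.stdout.on("data", (chunk: Buffer) => {
    job.stdout += chunk.toString();
  });
  child.stderr.on("data", (chunk: Buffer) => {
    job.stderr += chunk.toString();
  });
  // Emitted e.g. when the command cannot be spawned at all.
  child.on("error", () => {
    job.status = "error";
  });
  // Emitted once the process has ended and its stdio streams are closed.
  child.on("close", (code) => {
    job.exitCode = code;
    job.status = code === 0 ? "success" : "error";
  });
  return id;
}

// Look up a job by ID; unknown IDs report "missing".
function getJob(id: string): Job {
  return jobs.get(id) ?? { status: "missing", stdout: "", stderr: "", exitCode: null };
}
```

A caller would keep the ID returned by `startJob` and poll `getJob(id)` until the status is no longer "running". This sketch never evicts finished jobs and buffers all output, so a real version would presumably need limits on both.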

The more complex detail is how to properly route the request from the hub to the project. My feeling is that it isn't too hard to extend whatever project_exec does right now, but I haven't looked at the code. So maybe the hard part is already in place, and it just needs a bit of "high level" treatment to fit into the v2 framework?

Concerns about memory usage:

As an extra, I think it would also be cool to add some information about the process, similar to /usr/bin/time on Linux. I'm thinking of e.g. the output of process.resourceUsage … so that callers know a bit more about what was going on inside the project.
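For reference, a small sketch of the kind of data `process.resourceUsage()` provides in Node.js. The `summarizeUsage` helper and the `UsageSummary` type are hypothetical names. Note that this call reports on the process it is called in, so how to attribute usage to an individual spawned job is left open here.

```typescript
// Hypothetical summary shape, loosely modeled on what /usr/bin/time prints.
interface UsageSummary {
  userCpuSeconds: number;
  systemCpuSeconds: number;
  maxRssKb: number;
}

// Summarize resource usage of the *current* Node.js process.
function summarizeUsage(): UsageSummary {
  const usage = process.resourceUsage();
  return {
    // userCPUTime and systemCPUTime are reported in microseconds.
    userCpuSeconds: usage.userCPUTime / 1e6,
    systemCpuSeconds: usage.systemCPUTime / 1e6,
    // maxRSS is reported in kilobytes.
    maxRssKb: usage.maxRSS,
  };
}
```

A record like this could be attached to a job's result next to its status and stdout + stderr.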

williamstein commented 1 week ago

FYI, I think this will be relatively easy to implement, and the hard part is fortunately mostly already done. Deciding on all those parameters is very, very helpful.

What's your motivation for this?