sjdv1982 / seamless

Seamless is a framework to set up reproducible computations (and visualizations) that respond to changes in cells. Cells contain the input data as well as the source code of the computations, and all cells can be edited interactively.
http://sjdv1982.github.io/seamless

Nested transformation improvements #217

Open sjdv1982 opened 1 year ago

sjdv1982 commented 1 year ago

seamless.direct.transformer wraps a function inside a DirectTransformer object, which launches direct transformations (seamless.direct.Transformation) when called. In addition, direct transformations can also be created from unbound high-level Transformer objects (Transformer.get_transformation). Nested transformation is when direct transformations are created inside an existing transformation.

There are two kinds of nested transformation: local and non-local (delegated). By default, DirectTransformer objects have local=None, meaning that delegated nested transformation is tried first. Local nested transformation is then used as a fallback.
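A minimal sketch of the dispatch rule described above, not the actual Seamless code. `run_nested`, `DelegationUnavailable`, and the two callables are hypothetical names, and the behavior for `local=True` / `local=False` is an assumption; only the `local=None` case is stated above.

```python
from typing import Callable, Optional


class DelegationUnavailable(Exception):
    """Hypothetical error: no assistant could be reached for delegation."""


def run_nested(
    local: Optional[bool],
    run_delegated: Callable[[], object],
    run_local: Callable[[], object],
) -> object:
    """Dispatch a nested transformation according to the `local` setting.

    local=None  -> try delegated first, fall back to local (the default above)
    local=True  -> local only (assumption)
    local=False -> delegated only, errors propagate (assumption)
    """
    if local is True:
        return run_local()
    if local is False:
        return run_delegated()
    try:
        return run_delegated()
    except DelegationUnavailable:
        # No assistant available: fall back to local nested transformation
        return run_local()
```

With `local=None`, a delegated attempt that raises `DelegationUnavailable` falls through to the local path; any other exception is left to propagate.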

Local nested transformation already works. After the transformation has been launched in a forked seamless.core.execute.execute call, the fork changes the behavior of any subsequent call involving the seamless.direct.run machinery. Namely, seamless.direct.run will now forward local nested transformation calls to the parent process via a parent process queue. Some improvement may be needed, because currently all calls are queued up until any one of them is waited for, so that all calls are launched and waited for only at that point.
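To make the queuing issue concrete, here is a toy model, not the seamless.direct.run implementation. `LazyForwarder` mimics the current behavior (nothing is launched until some call is waited for, then everything is launched and waited for). `EagerForwarder` is one possible improvement, assumed here for illustration: launch on submit, block only on the call being waited for. Both class names and the `launch` callable are hypothetical; a thread pool stands in for the parent process.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyForwarder:
    """Toy model of the current behavior: calls pile up until one is waited for."""

    def __init__(self, launch):
        self._launch = launch   # stands in for "forward to the parent and run"
        self._pending = []      # (call_id, args) not yet forwarded
        self._results = {}
        self._next_id = 0

    def submit(self, *args):
        call_id = self._next_id
        self._next_id += 1
        self._pending.append((call_id, args))  # queued, not launched
        return call_id

    def wait(self, call_id):
        # Waiting for any call flushes the whole queue, in submission order
        while self._pending:
            cid, args = self._pending.pop(0)
            self._results[cid] = self._launch(*args)
        return self._results[call_id]


class EagerForwarder:
    """Possible improvement (assumption): launch on submit, wait per call."""

    def __init__(self, launch, max_workers=4):
        self._launch = launch
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._next_id = 0

    def submit(self, *args):
        call_id = self._next_id
        self._next_id += 1
        # The call starts running as soon as a pool thread is free
        self._futures[call_id] = self._pool.submit(self._launch, *args)
        return call_id

    def wait(self, call_id):
        # Blocks only on the requested call
        return self._futures[call_id].result()

    def close(self):
        self._pool.shutdown(wait=True)
```

In the lazy model, two submitted calls leave `launch` untouched until the first `wait`; in the eager model, independent nested calls can overlap with whatever the transformation does between submitting and waiting.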

Non-local nested transformation means that an assistant must be available inside the transformer. This doesn't work for any of the current assistants (micro, mini or mini-dask). This will be a bit complicated in cases where the assistant lives on a user machine whereas the job is executed on a cluster. Barring some kind of reverse tunneling or websockets, one solution for Dask-based execution is to make an "in-process assistant" as a thin wrapper around the Dask scheduler (which is by necessity available for each worker). Add to the assistant protocol a "release lock"/"acquire lock" API. For the Dask in-process assistant, these will be simple wrappers around Client.secede() and Client.rejoin().
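A rough sketch of what the "release lock"/"acquire lock" addition could look like, under stated assumptions. `AssistantLockAPI` and `CallbackLockAssistant` are hypothetical names, not existing Seamless classes. The sketch deliberately takes the two Dask calls as plain callables, so it runs without Dask and does not depend on how Dask exposes them (as far as I know, as the functions `distributed.secede` and `distributed.rejoin`, to be called from inside a running task).

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Callable


class AssistantLockAPI(ABC):
    """Hypothetical lock part of the assistant protocol."""

    @abstractmethod
    def release_lock(self) -> None:
        """Give up the worker slot while waiting on nested transformations."""

    @abstractmethod
    def acquire_lock(self) -> None:
        """Take a worker slot back before continuing the transformation."""

    @contextmanager
    def released(self):
        # Release for the duration of the block; always re-acquire afterwards
        self.release_lock()
        try:
            yield
        finally:
            self.acquire_lock()


class CallbackLockAssistant(AssistantLockAPI):
    """Thin wrapper: forwards to two callables (e.g. Dask's secede/rejoin)."""

    def __init__(self, secede: Callable[[], None], rejoin: Callable[[], None]):
        self._secede = secede
        self._rejoin = rejoin

    def release_lock(self) -> None:
        self._secede()

    def acquire_lock(self) -> None:
        self._rejoin()
```

A transformation would then wait for its nested transformations inside `with assistant.released(): ...`, so that the worker slot is free for the nested jobs and is reclaimed afterwards even if the wait raises.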

sjdv1982 commented 1 year ago

There is now an InProcessAssistant class.

sjdv1982 commented 11 months ago

Instead of communicating with the Dask scheduler, a worker could also try to reach the Dask client inside the original assistant. In that case, use the same Dask mechanism as https://github.com/sjdv1982/seamless/issues/219, and probably store an ID that identifies the original assistant (since multiple assistants can connect to the scheduler).

sjdv1982 commented 9 months ago

See also #241

sjdv1982 commented 6 months ago

Non-local nested transformation now works for a "local" mini assistant (devel), by setting the Seamless assistant IP address to "localhost" inside the mini-assistant-devel Docker image.

(the meaning of "local" is getting a bit confused here!)