sjdv1982 / seamless

Seamless is a framework to set up reproducible computations (and visualizations) that respond to changes in cells. Cells contain the input data as well as the source code of the computations, and all cells can be edited interactively.
http://sjdv1982.github.io/seamless

Handle big files generated by bash transformers #235

Open sjdv1982 opened 10 months ago

sjdv1982 commented 10 months ago

For bash/bashdocker transformers that generate more data than fits in memory, the bash/bashdocker transformer executors need to be adapted. What seamless.cmd already does is set the hash pattern of the output pin to deepfoldercell. To be done: first, add a facility to buffer_remote.py that uploads a file (not an in-memory buffer) to the buffer server in chunks and gives back its hash, with low memory consumption. Second, add a flag to seamless so that a transformer can tell (forked) seamless that the hash calculation has already been done and no packing is needed. Third, adapt the executors to use both. This is primarily for running bash under Seamless: a slurm-assistant can do the right thing much more easily.
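
A minimal sketch of what the first item could look like: read the file in fixed-size chunks, feed each chunk to an incremental hasher, and hand it to an upload callback, so that memory use stays at about one chunk. This is not existing buffer_remote.py code. The names `iter_file_chunks`, `upload_file_in_chunks`, `send_chunk` and `CHUNK_SIZE` are hypothetical, SHA3-256 is assumed as the checksum algorithm, and the actual network call to the buffer server is left to the caller.

```python
import hashlib
from typing import Callable, Iterator

CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical default: 8 MiB per read


def iter_file_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the file content chunk by chunk, holding only one chunk at a time."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk


def upload_file_in_chunks(
    path: str,
    send_chunk: Callable[[bytes], None],
    chunk_size: int = CHUNK_SIZE,
) -> str:
    """Stream a file to `send_chunk` and return its hex digest.

    `send_chunk` is a placeholder for whatever transfers one chunk to the
    buffer server; the hash algorithm (SHA3-256) is an assumption.
    """
    hasher = hashlib.sha3_256()
    for chunk in iter_file_chunks(path, chunk_size):
        hasher.update(chunk)  # incremental hashing, no full-file buffer
        send_chunk(chunk)
    return hasher.hexdigest()
```

If the buffer server needs the hash before the upload starts (for example, as part of the upload URL), the same `iter_file_chunks` generator can be run twice: once to compute the hash, once to send the data. That costs a second read of the file but keeps memory consumption low.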

sjdv1982 commented 10 months ago

Also see #192