Seamless is a framework to set up reproducible computations (and visualizations) that respond to changes in cells. Cells contain the input data as well as the source code of the computations, and all cells can be edited interactively.
When submitting thousands of jobs, there can be many `/bin/seamless` processes running. Right now, this runs into two bottlenecks (see `tests/cmd/manyjobs.sh` for the code):
`/bin/seamless` imports `seamless`, which takes 0.5-1 s of full CPU time and parallelizes poorly (Python import may be I/O-bound). This means that one can launch only ~5 `/bin/seamless` jobs per second before the CPU gets overwhelmed.
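The launch-rate ceiling can be sanity-checked with a small timing sketch. The module name and the 8-core/0.75 s figures below are assumptions for illustration; substitute `seamless` and your own machine to reproduce the numbers quoted above.

```python
import subprocess
import sys
import time

# Rough measurement of cold-import cost: spawn a fresh interpreter that does
# nothing but import the module. "json" is a stand-in so the sketch runs
# anywhere; replace it with "seamless" for the real measurement.
MODULE = "json"

start = time.perf_counter()
subprocess.run([sys.executable, "-c", f"import {MODULE}"], check=True)
elapsed = time.perf_counter() - start
print(f"cold import of {MODULE}: {elapsed:.3f} s")

# If one launch costs ~0.75 s of CPU, an 8-core machine can absorb at most
# 8 / 0.75 ≈ 10 launches per second before the CPU saturates; contention in
# practice brings this closer to the ~5/s observed.
```

Running the interpreter with `-X importtime` is another way to see where the import time goes.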
After the initial import, `/bin/seamless` polls the assistant and barely uses any CPU. However, it still uses ~50 MB of memory, which caps the number of simultaneously running `/bin/seamless` processes at ~300, even if all the jobs are remote.
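The ~300-process ceiling follows from simple arithmetic; the 16 GB RAM budget below is an assumption for illustration, not a figure from the project:

```python
# Back-of-the-envelope check of the concurrency ceiling quoted above.
MB = 1024 ** 2
per_process = 50 * MB          # resident memory of one idle /bin/seamless
ram_budget = 16 * 1024 * MB    # assumed 16 GB of RAM available for jobs
max_processes = ram_budget // per_process
print(max_processes)           # ≈ 327, matching the "~300 processes" estimate
```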
The new "SEAMLESS_FRUGAL" feature has improved the situation somewhat, but not enough.
Potential solutions:

- Split `/bin/seamless` off the main codebase
- Convert `/bin/seamless` to a daemon-client model
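The daemon-client idea can be sketched as follows: one resident daemon pays the heavy import once, while each job submission is a thin client that only opens a Unix socket. All names here (`SOCKET_PATH`, `handle_job`, `submit`) are hypothetical illustrations, not part of the Seamless API.

```python
import os
import socket
import threading
import time

SOCKET_PATH = "/tmp/seamless-daemon.sock"  # illustrative path

def handle_job(request: str) -> str:
    # In a real daemon this would forward the job to the assistant;
    # here it just acknowledges the request.
    return f"accepted: {request}"

def daemon():
    # The daemon would import seamless once at startup, then serve forever.
    if os.path.exists(SOCKET_PATH):
        os.remove(SOCKET_PATH)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCKET_PATH)
    server.listen()
    while True:
        conn, _ = server.accept()
        with conn:
            request = conn.recv(4096).decode()
            conn.sendall(handle_job(request).encode())

def submit(job: str) -> str:
    # Thin client: no heavy imports, milliseconds and kilobytes per call,
    # so launch rate and memory no longer scale with the number of jobs.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(SOCKET_PATH)
    client.sendall(job.encode())
    try:
        return client.recv(4096).decode()
    finally:
        client.close()

threading.Thread(target=daemon, daemon=True).start()
time.sleep(0.2)  # give the daemon time to bind
print(submit("run job-1"))  # → "accepted: run job-1"
```

With this split, thousands of pending jobs would cost one daemon's memory plus a few KB of socket state each, instead of ~50 MB per `/bin/seamless` process.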