Open roed314 opened 1 year ago
Here's a proof of concept (terrible code, but you get the idea). The video shows startup first without a fork, then with the forking server: 4.8s vs 0.8s. Without IPython the forked startup drops to 55ms, which is not shown in the video:
https://user-images.githubusercontent.com/373765/217391597-5eeb58d4-ece6-4baa-aaad-0158310788fc.mp4
server:
import sage.all  # pay for the expensive import once, in the long-lived parent
import os
import sys
from pickle import loads

fifo = '/tmp/sage.server'
try:
    os.mkfifo(fifo)
except FileExistsError:
    print("not creating fifo")

while True:
    # Opening the FIFO blocks until a client opens it for writing.
    with open(fifo, 'rb') as requests:
        request = requests.readline()
    pid = os.fork()
    if pid:
        # Parent: go back to waiting for the next request.
        continue
    # Child: reattach to the client's terminal, then start the Sage REPL.
    stdin, stdout, stderr = loads(request.strip())
    print("opening terminals", stdin, stdout, stderr)
    sys.stdin.close()
    sys.stdout.close()
    sys.stderr.close()
    sys.stdin = open(stdin, 'r')
    sys.stdout = open(stdout, 'w')
    sys.stderr = open(stderr, 'w')
    from sage.misc.banner import banner
    banner()
    from sage.repl.interpreter import SageTerminalApp
    app = SageTerminalApp.instance()
    app.initialize()
    app.start()
    break
client:
import os
import time
from pathlib import Path
from pickle import dumps

# Resolve which terminal (or file) each standard stream points to.
# This relies on /proc, so it is Linux-specific.
pid = os.getpid()
stdin = os.readlink(f'/proc/{pid}/fd/0')
stdout = os.readlink(f'/proc/{pid}/fd/1')
stderr = os.readlink(f'/proc/{pid}/fd/2')

# Send the three paths to the server as one newline-terminated request.
request = dumps((stdin, stdout, stderr))
fifo = '/tmp/sage.server'
Path(fifo).write_bytes(request + b'\n') # sorry

time.sleep(2) # once the time's up, the shell prompt and the sage prompt mix
(I had to change the encoding of the request because the output of dumps() contained newlines.) I was pleasantly surprised to discover that import sage.all appears to create little out-of-process global state that would end up being shared between the forks. I see no child process, no (writable) shared memory maps... the most suspicious thing I noticed is the call to lazy_import.save_cache_file(). strace shows one other temporary file being written, but the file is closed and unlinked right away, so I guess it is harmless. Do you know if there is anything else?
Heh, I wrote a forking sage server that might be much like you describe about 10 years ago (!), and it's been in active production use on CoCalc ever since, so it's not totally broken. Here's the code:
https://github.com/sagemathinc/cocalc/blob/master/src/smc_sagews/smc_sagews/sage_server.py
It does deal with a lot of subtle issues that matter for cocalc, but which might not matter for you. There's a big list of libraries and modules it imports at the start -- e.g., all the plotting stuff, scipy, etc. -- maybe that's different than what you need.
This is used in sage so that once you start one sage worksheet, creating any more is MUCH faster with instant startup, since it's just forking an existing process. This is of course how Jupyter notebooks should work, but in practice they don't work this way at all. Sage worksheets do.
Anyway, feel free to look at that code. Don't be afraid of the AGPL license; I hold all the copyright and can relicense it. Also, there was a BSD copy of exactly that code at some point in time (that was needed to get it out of Univ of Washington).
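The instant startup comes from fork semantics: a child inherits everything the parent has already imported, so it does not pay the import cost again. The sketch below illustrates this with a hypothetical helper `fork_and_report`, which is not taken from the CoCalc server, and with a cheap standard-library module standing in for the expensive preloaded libraries. It is Unix-only because it relies on `os.fork`.

```python
import os
import sys
import time


def fork_and_report(module_name):
    """Fork once; the child reports whether module_name is already loaded.

    Returns b"1" if the module was present in the child's sys.modules,
    b"0" otherwise.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it starts with a copy of the parent's memory, modules included.
        os.close(read_fd)
        os.write(write_fd, b"1" if module_name in sys.modules else b"0")
        os._exit(0)  # leave without running the parent's cleanup handlers
    # Parent: read the child's answer and reap the child.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer


if __name__ == "__main__":
    import decimal  # stand-in for an expensive import such as sage.all

    start = time.perf_counter()
    answer = fork_and_report("decimal")
    print(answer, f"fork round trip: {time.perf_counter() - start:.4f}s")
```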
There's some discussion related to this at https://bugs.python.org/issue34296 (tl;dr nothing is implemented in a generic way anywhere; there's another PoC at https://github.com/cykerway/pyforkexec.)
I put a demo of a generic forking server at https://github.com/saraedum/forsake. There's a SageMath demo there but it's not limited to SageMath.
@saraedum and I were bemoaning the slow startup time of Sage, and how it makes it difficult to use Sage from GNU parallel when running large scale computations on a server. Here's a sketch of a method for being able to get a copy of Sage running very quickly for this use case.

- A function spawn_forks that writes its pid to a fixed file and then just sits in a while loop listening for requests on some socket. When it gets a request, it forks, returning to the while loop in the parent process and exiting to a running Sage in the child process. When the spawn_forks function is exited in the parent process (via KeyboardInterrupt for example), delete the fixed file.
- [...] spawn_forks in it. Either way, send a message to the spawner (with the stdout, stdin, stderr file descriptors, the current working directory, sys.argv, maybe non-Sage environment variables) and grab a fork.
- [...] $SAGE_ROOT/bin/sage-ipython); otherwise execute the file that's been passed in (as in $SAGE_ROOT/bin/sage-run).

Here's an example demonstrating how we can get the relevant file handles in python.
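One possible way for the descriptors to reach the spawner, not necessarily what the author had in mind, is to pass them over a Unix domain socket with `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only). The functions `send_request` and `recv_request` and the JSON message layout below are assumptions made for this sketch.

```python
import json
import os
import socket


def send_request(sock, fds, argv, cwd):
    """Send argv and cwd as JSON, with the descriptors attached (SCM_RIGHTS)."""
    message = json.dumps({"argv": argv, "cwd": cwd}).encode()
    socket.send_fds(sock, [message], list(fds))


def recv_request(sock, max_fds=3, bufsize=65536):
    """Receive one request; returns (metadata dict, list of new descriptors)."""
    data, fds, _flags, _addr = socket.recv_fds(sock, bufsize, max_fds)
    return json.loads(data.decode()), fds


if __name__ == "__main__":
    # A socketpair stands in for the client <-> spawner connection.
    client, spawner = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # A pipe stands in for the client's terminal; a real client would
    # pass descriptors 0, 1 and 2 instead.
    read_end, write_end = os.pipe()
    send_request(client, [write_end], ["sage", "script.sage"], os.getcwd())
    meta, received = recv_request(spawner)
    os.write(received[0], b"hello from the spawner side\n")
    print(meta["argv"], os.read(read_end, 64))
```

Unlike sending terminal paths, this hands over the open descriptors themselves, so it would also work when a stream is a pipe or a redirected file. The sketch assumes each request fits in a single small message; a real protocol would need explicit framing.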