nkremerh / sugarscape

Development repository for the Digital Terraria Lab implementation of the Sugarscape agent-based societal simulation.
https://github.com/digital-terraria-lab/sugarscape
MIT License

Changes data/run.py to use multiprocessing library instead of bash #87

Closed · colinhanrahan closed 2 months ago

nkremerh commented 2 months ago

Two additional pieces:

Is it possible to replicate the behavior of the bash script more closely? The key difference: the job update interval is now used to print progress while jobs are being submitted, rather than reporting how many jobs are left at the end. The preferred outcome is the latter: print every submitted job, and provide a countdown of remaining jobs at the end.

When sending a termination signal (CTRL + C), it would be preferable not to see a huge wave of Python KeyboardInterrupt runtime errors. There are ways to handle this: either redirect the program's stdout and stderr (for example, to /dev/null, which may be too UNIX-like for Windows environments) or catch the termination signal and handle it more gracefully.
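A minimal sketch of one way this could look with the multiprocessing library (not this PR's implementation; runSimulation and the config file names are placeholders): workers ignore SIGINT so an interrupt does not dump a traceback from every process, and the parent prints each submitted job plus a remaining-jobs countdown as results come back.

```python
import multiprocessing
import signal

def initWorker():
    # Pool workers ignore SIGINT so CTRL+C does not print a KeyboardInterrupt
    # traceback from every process; the parent decides how to shut down.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def runSimulation(configFile):
    # Placeholder worker: run.py would launch one simulation per config here.
    return configFile

def runJobs(configFiles, numProcesses=4):
    pool = multiprocessing.Pool(numProcesses, initializer=initWorker)
    for configFile in configFiles:
        print(f"Submitting job {configFile}")
    remaining = len(configFiles)
    try:
        for finished in pool.imap_unordered(runSimulation, configFiles):
            remaining -= 1
            print(f"{finished} finished, {remaining} jobs remaining")
        pool.close()
    except KeyboardInterrupt:
        # Tear the whole pool down quietly instead of printing a wave of errors.
        pool.terminate()
    finally:
        pool.join()

if __name__ == "__main__":
    runJobs(["config1.json", "config2.json", "config3.json"])
```

Because the countdown is tracked in the parent process as results arrive, no state needs to be shared between processes.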

nkremerh commented 2 months ago

Closer. The currently running simulations should terminate when interrupted.

colinhanrahan commented 2 months ago

> Closer. The currently running simulations should terminate when interrupted.

pool.terminate() terminates all processes in the pool. Also, the jobs remaining countdown added some unpleasant complexity. You have to share variables across processes.
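For illustration, a rough sketch (not this PR's code; names are hypothetical) of what sharing that countdown across processes looks like: a multiprocessing.Value cannot be passed to workers as an ordinary task argument, so it has to be handed over through the pool initializer, which is where the extra complexity comes from.

```python
import multiprocessing

# The synchronized counter is handed to each worker via the pool initializer,
# since it cannot be pickled as a normal task argument.
remaining = None

def initWorker(sharedRemaining):
    global remaining
    remaining = sharedRemaining

def runSimulation(configFile):
    # Placeholder: run.py would launch one simulation per config file here.
    with remaining.get_lock():
        remaining.value -= 1
        print(f"{configFile} finished, {remaining.value} jobs remaining")

def runJobs(configFiles, numProcesses=4):
    sharedRemaining = multiprocessing.Value("i", len(configFiles))
    with multiprocessing.Pool(numProcesses, initializer=initWorker,
                              initargs=(sharedRemaining,)) as pool:
        pool.map(runSimulation, configFiles)

if __name__ == "__main__":
    runJobs(["config1.json", "config2.json"])
```

Keeping the countdown in the parent process instead avoids all of this, since only the parent ever touches the counter.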

nkremerh commented 2 months ago

> > Closer. The currently running simulations should terminate when interrupted.
>
> pool.terminate() terminates all processes in the pool. Also, the jobs remaining countdown added some unpleasant complexity. You have to share variables across processes.

It should, but it does not.

colinhanrahan commented 2 months ago

@nkremerh How about now? Edit: from my testing, it looks like it didn't work. Should we be using multithreading or multiprocessing?

nkremerh commented 2 months ago

Let me think on it. Since you're about to take on bigger Kanban cards, let me take ownership of this one. Please keep this branch alive while I work on getting it touched up and merged. The core logic is there, so there are likely just a few architectural changes I need to make.

colinhanrahan commented 2 months ago

The impression I'm getting is that os.system is spawning its own subprocesses that aren't tracked by the Pool, so you'll need to keep track of them individually to terminate them. I was looking into subprocess.Popen and ThreadPoolExecutor if you're interested.
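If it helps, here is a rough sketch of that idea (the command line, function names, and polling interval are placeholders, not the repo's actual interface): each simulation is started with subprocess.Popen so run.py holds a handle it can terminate, and a ThreadPoolExecutor limits how many run at once.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

# Track every launched simulation so an interrupt can terminate the children
# that os.system would otherwise leave untracked by the pool.
activeProcesses = []

def runSimulation(configFile):
    # Placeholder command line; run.py would launch the real simulation here.
    process = subprocess.Popen(["python", "sugarscape.py", "--conf", configFile])
    activeProcesses.append(process)
    return process.wait()

def runJobs(configFiles, maxWorkers=4):
    executor = ThreadPoolExecutor(max_workers=maxWorkers)
    futures = [executor.submit(runSimulation, f) for f in configFiles]
    try:
        while not all(future.done() for future in futures):
            time.sleep(0.5)
    except KeyboardInterrupt:
        # Drop queued jobs, then kill the simulations already running.
        for future in futures:
            future.cancel()
        for process in activeProcesses:
            process.terminate()
    finally:
        executor.shutdown(wait=False)

if __name__ == "__main__":
    runJobs(["config1.json", "config2.json"])
```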

nkremerh commented 2 months ago

That would make sense. Good intuition!

nkremerh commented 2 months ago

@colinhanrahan Please review.

A note: interrupting the jobs now kills all the processes in the pool, and only the main process (run.py) complains. This is an acceptable compromise for me, since similar behavior happens if you terminate the GUI window early and in other edge cases where the simulation is interrupted.

colinhanrahan commented 2 months ago

@nkremerh

Other things I found:

nkremerh commented 2 months ago

I originally included a sleep in the busy-wait loop, but that led to job count updates getting skipped (as you noted).

The one-liner configs are minified for file size reasons. I'm fine with them not being pretty printed.

Fixed the bug by simply trusting that if the user leaves unfinished jobs in the data directory and runs make data, they want those jobs run.

colinhanrahan commented 2 months ago

I pushed some small changes, but otherwise this looks good. Let me know if I broke anything.