Closed steipatr closed 1 year ago
Something did change in repeat_report
in pynetlogo. I moved to using tempfiles from files whose filename was based on the name of the outcome of interest. What I find strange is that different, separately spawned processes end up using the exact same tempdir and tempfile. If this is a problem in pynetlogo, then it should be fixable using either a prefix or suffix based on e.g. the process id.
In contrast, the ema workbench example relies on a custom multiprocessing evaluator, which sets up explicitly separate working directories for each process. Moreover, how the command string is built is a bit more involved.
First of all, I appreciate your support on all things EMA and pynetlogo. Thank you for your quick replies and insightful suggestions.
I tried adding the process ID (with os.getpid()
) as a suffix to both mkdtemp
and mkstemp
in repeat_report
to no avail:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\stein042\.conda\envs\robustresilience39\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\stein042\.conda\envs\robustresilience39\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Local\Testing\predprey_salib_multiprocessing.py", line 47, in run_simulation
counts = netlogo.repeat_report(["count sheep", "count wolves"], 100)
File "C:\Users\stein042\.conda\envs\robustresilience39\lib\site-packages\pynetlogo\core.py", line 553, in repeat_report
os.remove(value)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Local\\Testing\\i8f_xyod-10180\\tmpme8gp6ok-10180.txt'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Local\Testing\predprey_salib_multiprocessing.py", line 94, in <module>
for entry in executor.map(run_simulation, experiments.to_dict("records")):
File "C:\Users\stein042\.conda\envs\robustresilience39\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\stein042\.conda\envs\robustresilience39\lib\multiprocessing\pool.py", line 771, in get
raise self._value
PermissionError: [Errno 13] The process cannot access the file because it is being used by another process: 'C:\\Local\\Testing\\i8f_xyod-10180\\tmpme8gp6ok-10180.txt'
I also tried commenting out the temporary folder and file cleanup stuff in repeat_report
. After which the code ran fine, although I also had to change around some output processing in https://github.com/quaquel/pyNetLogo/blob/master/examples/predprey_salib_multiprocessing.py#L50, from:
results = pd.Series(
[counts["count sheep"].values.mean(), counts["count wolves"].values.mean()],
index=["Avg. sheep", "Avg. wolves"],
)
return results
to:
results = pd.Series(
[counts["count sheep"].mean(), counts["count wolves"].mean()],
index=["Avg. sheep", "Avg. wolves"],
)
due to error "numpy.ndarray object has no attribute values".
So perhaps the issue is somewhere in the temporary folder/file cleanup? Might be as simple as an identation error. Will dig around some more.
ok, so this is clearly a pynetlogo rather than an EMA bug, and from the looks of it Windows specific.
Yes I agree. I think that at owner level, it's possible to move issues between repos, perhaps this should be continued in the pynetlogo repo.
I learn something new everyday.
The plot thickens. File handles/descriptors are somehow being shared across processes. This might cause a mixup when os.close
is called.
EDIT: Nevermind, the only thick thing here is me.
File handles (fhs) for Process 14868: [4, 5]
File names (fns) for Process 14868: {'count sheep': 'C:\\Local\\Testing\\aucn0836-14868\\tmp80m5l9mp-14868.txt', 'count wolves': 'C:\\Local\\Testing\\aucn0836-14868\\tmpy7p0_bjd-14868.txt'}
Process 14868 should now os.remove the filename C:\Local\Testing\aucn0836-14868\tmp80m5l9mp-14868.txt
Process 14868 should now os.remove the filename C:\Local\Testing\aucn0836-14868\tmpy7p0_bjd-14868.txt
Process 14868 should now os.close the file handle 4
Process 14868 should now os.close the file handle 5
File handles (fhs) for Process 9824: [4, 5]
File names (fns) for Process 9824: {'count sheep': 'C:\\Local\\Testing\\bbjm843t-9824\\tmpt0wohpp1-9824.txt', 'count wolves': 'C:\\Local\\Testing\\bbjm843t-9824\\tmpu4dsoax0-9824.txt'}
Process 9824 should now os.remove the filename C:\Local\Testing\bbjm843t-9824\tmpt0wohpp1-9824.txt
Process 9824 should now os.remove the filename C:\Local\Testing\bbjm843t-9824\tmpu4dsoax0-9824.txt
Process 9824 should now os.close the file handle 4
Process 9824 should now os.close the file handle 5
Side note, it's apparently normal for the file handles to start at 4, 1-3 are reserved for system input, output, and error stuff.
Re-ordering the cleanup code in repeat_report
seems to have fixed the problem. I still need to verify that this doesn't screw with the outputs somehow though, I'm somewhat concerned about that based on the original error messages.
for key, value in fns.items():
[...]
os.remove(value)
# cleanup temp files and folders
for fh in fhs:
os.close(fh)
os.rmdir(tempfolder)
becomes:
# cleanup temp files and folders
for fh in fhs:
os.close(fh) #free up file handle for re-use
for key, value in fns.items():
os.remove(value) #delete file by name
os.rmdir(tempfolder)
Strangely, you need to remove the files before removing the entire directory.
I am wondering whether the shift to tempfiles was such a good idea given that we already have a temfolder.
Yes I noticed that was a stated limitation of rmdir
. There is apparently https://docs.python.org/3/library/shutil.html#shutil.rmtree which does remove directories with files.
Side note, the results look untouched as far as I can tell, so that's good. I can submit a pull request, do you prefer the reordering solution, or switching to shutil.rmtree
? Either way, since this is OS-level file interaction, it should probably all be tested on Unix as well, which I can't do.
I am OK with either solution, although I slightly prefer shutil.rmtree
.
I can test under MacOS, and based on experience so far, if that works, Linux works as well.
I have some EMA+pynetlogo code that runs on my desktop in a somewhat older environment, but not on my laptop in a newer environment. Testing various example models, the pynetlogo SALib demo (https://github.com/quaquel/pyNetLogo/blob/master/examples/predprey_salib_multiprocessing.py) works only on the desktop, but https://github.com/quaquel/EMAworkbench/blob/master/ema_workbench/examples/example_netlogo.py works on both.
Exemplary traceback from the laptop for the pynetlogo SALib demo:
My suspicion is that something related to working directories has changed between package versions, but I might be completely off. I looked at the changelogs for pynetlogo and EMA but didn't spot anything that seemed relevant. Does this problem ring a bell?
Desktop, Win10 Enterprise ema-workbench 2.2.0 pypi_0 pypi multiprocess 0.70.13 pypi_0 pypi pynetlogo 0.4.2 pypi_0 pypi python 3.10.6 h9a09f29_0_cpython conda-forge
Laptop, Win10 Enterprise ema-workbench 2.3.0 pypi_0 pypi multiprocess 0.70.14 pypi_0 pypi pynetlogo 0.5.1 pypi_0 pypi python 3.9.16 h6244533_2