Closed tbeggs-econo closed 1 year ago
Would like @simsong to review as well. I'll give this a try in a new Conda environment and see how it goes.
I will test tonight.
Sorry had this ready. Adding in setup.py that I am testing basic install of mysql
@tbeggs-econo
Really odd behavior I can't quite figure out. If I use my personal conda environment to run things (it's Python 3.7), then s1
runs fine. If I use at Python 3.8 environment with your makefile, I suddenly get NameError: name 'args' is not defined
whenever args is referenced. Curiously, inside main, args appears perfectly well formed. Error shown below, with the args
Namespace printed out.
(recon) rrod@dontdeletethis recon % ./dbrtool.py --reident test --step1 --stusab wy Step 1 - s1_make_geo_files.py
$ /Users/rrod/opt/anaconda3/envs/recon/bin/python3 s1_make_geo_files.py --j1 50 --config /Users/rrod/repos/recon_replication/recon/config.ini wy Namespace(config='/Users/rrod/repos/recon_replication/recon/config.ini', csv=True, force=False, j1=50, latin1=False, logfilename=None, loglevel='WARNING', logmem=False, nocsv=False, reident=None, showcounties=False, stdout=False, stusab=['wy']) make_county_list(wy) multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/Users/rrod/opt/anaconda3/envs/recon/lib/python3.8/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, *kwds)) File "/Users/rrod/opt/anaconda3/envs/recon/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar return list(map(args)) File "/Users/rrod/repos/recon_replication/recon/s1_make_geo_files.py", line 70, in make_county_list if args.latin1: NameError: name 'args' is not defined """
@tbeggs-econo The issue, for me at least, appears to be with multiprocessing. Replacing the call to p.map()
with a single call to make_county_list()
makes things work. It's possible this is something amiss with my setup; will keep digging.
I suspect it may be a path issue?
On May 1, 2023, at 7:39 PM, Rolando Rodríguez @.***> wrote:
@tbeggs-econo https://github.com/tbeggs-econo The issue, for me at least, appears to be with multiprocessing. Replacing the call to p.map() with a single call to make_county_list() makes things work. It's possible this is something amiss with my setup; will keep digging.
— Reply to this email directly, view it on GitHub https://github.com/uscensusbureau/recon_replication/pull/24#issuecomment-1530577639, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMFHLFJ6ZE5G3TAV26JIFLXEBCSDANCNFSM6AAAAAAXR2LWY4. You are receiving this because you were mentioned.
Turns out they changed the default start method from fork
to spawn
on Mac when they moved from 3.7 to 3.8. If I add multiprocessing.set_start_method('fork')
to the start of the "main" loop, then everything works.
@tbeggs-econo I tested out adding multiprocessing.set_start_method('fork')
to s1_
and s2_
and it all works now. Given that fork
is still the default in Linux, I think we're ok for submission (unless @simsong or @larsvilhuber object), and we can work on adding and testing using fork
for Mac users over the next few days.
I’m not familiar with this. It’s odd that it’s suddenly needed. But yes, this code has only been tested with fork
.
Here's what's going on: https://docs.python.org/3/library/multiprocessing.html
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
args
is not passed to the worker function. So it's dynamically scoped, but not lexically scoped. That's fine work fork
. But with spawn
, it's not available in the child process.
Just a note: you do not have to make this run on every conceivable system on Earth. Lock it into "fork" and that's it. State that it was run on Python 3.7, and lock in package versions for THAT version of Python only.
If you want to upgrade it to run on Python 4.99 at a later stage - that's what versioning and multiple releases are for. Make it an "issue", to be solved in the next releases/milestone.
-- Lars Vilhuber, Economist Cornell University p: +1.607-330-5743 https://calendly.com/larsvilhuber My working day may not be your working day. Please respond during your working day.
From: Simson L. Garfinkel @.> Sent: Monday, May 1, 2023 20:30 To: uscensusbureau/recon_replication @.> Cc: Lars Vilhuber @.>; Mention @.> Subject: Re: [uscensusbureau/recon_replication] Makefile (PR #24)
I’m not familiar with this. It’s odd that it’s suddenly needed. But yes, this code has only been tested with fork.
Here's what's going on: https://docs.python.org/3/library/multiprocessing.html
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run()https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.run method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
args is not passed to the worker function. So it's dynamically scoped, but not lexically scoped. That's fine work fork. But with spawn, it's not available in the child process.
— Reply to this email directly, view it on GitHubhttps://github.com/uscensusbureau/recon_replication/pull/24#issuecomment-1530675098, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABVSQ6HXVMMTD74MPL6ZBMDXEBIQ5ANCNFSM6AAAAAAXR2LWY4. You are receiving this because you were mentioned.Message ID: @.***>
I concur.
Message ID: @.***>
@tbeggs-econo has been testing on 3.8, so we'll go with that, and say that running on Mac with versions greater than 3.7 is a todo later. Thanks!
Create makefile to install dependencies and run pylint and pytest
Closes: #17 Closes: #22