uscensusbureau / recon_replication

Other
2 stars 0 forks source link

Makefile #24

Closed tbeggs-econo closed 1 year ago

tbeggs-econo commented 1 year ago

Create makefile to install dependencies and run pylint and pytest

Closes: #17 Closes: #22

rrod515 commented 1 year ago

Would like @simsong to review as well. I'll give this a try in a new Conda environment and see how it goes.

simsong commented 1 year ago

I will test tonight.

tbeggs-econo commented 1 year ago

Sorry had this ready. Adding in setup.py that I am testing basic install of mysql

rrod515 commented 1 year ago

@tbeggs-econo

Really odd behavior I can't quite figure out. If I use my personal conda environment to run things (it's Python 3.7), then s1 runs fine. If I use at Python 3.8 environment with your makefile, I suddenly get NameError: name 'args' is not defined whenever args is referenced. Curiously, inside main, args appears perfectly well formed. Error shown below, with the args Namespace printed out.

(recon) rrod@dontdeletethis recon % ./dbrtool.py --reident test --step1 --stusab wy Step 1 - s1_make_geo_files.py

$ /Users/rrod/opt/anaconda3/envs/recon/bin/python3 s1_make_geo_files.py --j1 50 --config /Users/rrod/repos/recon_replication/recon/config.ini wy Namespace(config='/Users/rrod/repos/recon_replication/recon/config.ini', csv=True, force=False, j1=50, latin1=False, logfilename=None, loglevel='WARNING', logmem=False, nocsv=False, reident=None, showcounties=False, stdout=False, stusab=['wy']) make_county_list(wy) multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/Users/rrod/opt/anaconda3/envs/recon/lib/python3.8/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, *kwds)) File "/Users/rrod/opt/anaconda3/envs/recon/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar return list(map(args)) File "/Users/rrod/repos/recon_replication/recon/s1_make_geo_files.py", line 70, in make_county_list if args.latin1: NameError: name 'args' is not defined """

rrod515 commented 1 year ago

@tbeggs-econo The issue, for me at least, appears to be with multiprocessing. Replacing the call to p.map() with a single call to make_county_list() makes things work. It's possible this is something amiss with my setup; will keep digging.

simsong commented 1 year ago

I suspect it may be a path issue?

On May 1, 2023, at 7:39 PM, Rolando Rodríguez @.***> wrote:

@tbeggs-econo https://github.com/tbeggs-econo The issue, for me at least, appears to be with multiprocessing. Replacing the call to p.map() with a single call to make_county_list() makes things work. It's possible this is something amiss with my setup; will keep digging.

— Reply to this email directly, view it on GitHub https://github.com/uscensusbureau/recon_replication/pull/24#issuecomment-1530577639, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMFHLFJ6ZE5G3TAV26JIFLXEBCSDANCNFSM6AAAAAAXR2LWY4. You are receiving this because you were mentioned.

rrod515 commented 1 year ago

Turns out they changed the default start method from fork to spawn on Mac when they moved from 3.7 to 3.8. If I add multiprocessing.set_start_method('fork') to the start of the "main" loop, then everything works.

rrod515 commented 1 year ago

@tbeggs-econo I tested out adding multiprocessing.set_start_method('fork') to s1_ and s2_ and it all works now. Given that fork is still the default in Linux, I think we're ok for submission (unless @simsong or @larsvilhuber object), and we can work on adding and testing using fork for Mac users over the next few days.

simsong commented 1 year ago

I’m not familiar with this. It’s odd that it’s suddenly needed. But yes, this code has only been tested with fork.

Here's what's going on: https://docs.python.org/3/library/multiprocessing.html

The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

args is not passed to the worker function. So it's dynamically scoped, but not lexically scoped. That's fine work fork. But with spawn, it's not available in the child process.

larsvilhuber commented 1 year ago

Just a note: you do not have to make this run on every conceivable system on Earth. Lock it into "fork" and that's it. State that it was run on Python 3.7, and lock in package versions for THAT version of Python only.

If you want​ to upgrade it to run on Python 4.99 at a later stage - that's what versioning and multiple releases are for. Make it an "issue", to be solved in the next releases/milestone.

-- Lars Vilhuber, Economist Cornell University p: +1.607-330-5743 https://calendly.com/larsvilhuber My working day may not be your working day. Please respond during your working day.


From: Simson L. Garfinkel @.> Sent: Monday, May 1, 2023 20:30 To: uscensusbureau/recon_replication @.> Cc: Lars Vilhuber @.>; Mention @.> Subject: Re: [uscensusbureau/recon_replication] Makefile (PR #24)

I’m not familiar with this. It’s odd that it’s suddenly needed. But yes, this code has only been tested with fork.

Here's what's going on: https://docs.python.org/3/library/multiprocessing.html

The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run()https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.run method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

args is not passed to the worker function. So it's dynamically scoped, but not lexically scoped. That's fine work fork. But with spawn, it's not available in the child process.

— Reply to this email directly, view it on GitHubhttps://github.com/uscensusbureau/recon_replication/pull/24#issuecomment-1530675098, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABVSQ6HXVMMTD74MPL6ZBMDXEBIQ5ANCNFSM6AAAAAAXR2LWY4. You are receiving this because you were mentioned.Message ID: @.***>

simsong commented 1 year ago

I concur.

Message ID: @.***>

rrod515 commented 1 year ago

@tbeggs-econo has been testing on 3.8, so we'll go with that, and say that running on Mac with versions greater than 3.7 is a todo later. Thanks!