replicahq / doppelganger

A Python package of tools to support population synthesizers
Apache License 2.0
165 stars 32 forks source link

Run doppelganger on a state and puma #34

Closed nikisix closed 6 years ago

nikisix commented 6 years ago
  1. Download pums data from a database if necessary
  2. Fetch census marginal data
  3. Generate population and household data

This change is Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage remained the same at 82.278% when pulling 47b2fd512e83161a5eb3a6b8ac4436b6c3e2ccca on scripts into 94b5986284a12003e39e1dbad96e9f980cdc2803 on master.

alexeisw commented 6 years ago

That'd be really useful to have some scripts, thanks for contributing, @nikisix !!

The set of functions that scripts use should be properly documented too. The goal here is to enable reuse of the functions in multiple different scripts out of the context of the arguments and code in main().

Functions docstrings should also be detailed enough for anyone to use them without going into the actual package code to figure out what are the datatypes of all of the multiple inputs.

BTW, CircleCI builds fail on syntax, please investigate: ./scripts/download_allocate_generate.py:90:44: E999 SyntaxError: invalid syntax ./scripts/fetch_puma_data_from_db.py:81:41: F821 undefined name 'unicode'


Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

katbusch commented 6 years ago

+1 on documentation

We use the google python style guide for our commenting style: https://google.github.io/styleguide/pyguide.html?showone=Comments#Comments

I'm also concerned that none of this is tested & we won't have a way of knowing if we break it


Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


scripts/download_allocate_generate.py, line 51 at r1 (raw file):

                        help='hostname of database with pums data', default='localhost')
    parser.add_argument('--db_database', type=unicode, help='db name')
    parser.add_argument('--db_schema', type=unicode, help='db schema', default='import')

I think you need to explain in more detail what this means since it's different for postgres vs mysql & could be confusing for some


scripts/download_allocate_generate.py, line 140 at r1 (raw file):


    # Write the household bayes net to disk
    def household_segmenter(x): x[inputs.NUM_PEOPLE.name]

A lambda would be nice here


scripts/fetch_puma_data_from_db.py, line 19 at r1 (raw file):


def fetch_data(output_dir, state_id, puma_id, db_host, db_database, db_schema, db_user, db_password,

I don't think it's clear here that this write the data to a file. I recommend doing the step of fetching & writing separately. This can return a PumsData & the user can decide whether to write it out


Comments from Reviewable

nikisix commented 6 years ago

Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


scripts/download_allocate_generate.py, line 140 at r1 (raw file):

https://google.github.io/styleguide/pyguide.html?showone=Comments#Comments Had a lambda here for the segmenter function and the linter (flake8) made me turn it into a standard def-function


Comments from Reviewable

nikisix commented 6 years ago

scripts/download_allocate_generate.py, line 140 at r1 (raw file):

Ups, please ignore the link there :)


Comments from Reviewable

katbusch commented 6 years ago

Oh I forgot to mention: all of our files should have these lines at the top for python 3 compatibility:

from __future__ import (
    absolute_import, division, print_function, unicode_literals
)

Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Comments from Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-4.01%) to 78.403% when pulling 062ddcb3122dde4598eeb55ad21b0e0c2ca05e82 on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.497% when pulling b6e232364e4144feac7139f1ae58be506d5afe92 on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.497% when pulling 507571093770e83324f1aec7eb1723d7846cf0c7 on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.497% when pulling cdabbfe94026c8a21ec0068510df9ffb5c30ecfe on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.497% when pulling 7d51ec18942ca716ad1cebbf848138ca5e14369a on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

nikisix commented 6 years ago

Review status: 0 of 13 files reviewed at latest revision, 3 unresolved discussions.


scripts/download_allocate_generate.py, line 51 at r1 (raw file):

Previously, katbusch (Kat Busch) wrote…
I think you need to explain in more detail what this means since it's different for postgres vs mysql & could be confusing for some

Good point, made it clear that the database in question is postgres.


scripts/fetch_puma_data_from_db.py, line 19 at r1 (raw file):

Previously, katbusch (Kat Busch) wrote…
I don't think it's clear here that this write the data to a file. I recommend doing the step of fetching & writing separately. This can return a PumsData & the user can decide whether to write it out

Agreed. Done.


Comments from Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.497% when pulling 080cf1a754a3fd7f1755999303165fbab3a00d60 on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.523% when pulling 824c3f63c0c7bee237ea8666cbabf670ff2eff48 on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.523% when pulling ecd50a9cff37ceb5918185a04bd96a46c10bbd6d on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

katbusch commented 6 years ago

Awesome, looking good! Thanks for adding all those tests. Don't worry about the coverage decrease. It's just because now the coverage tool knows our scripts exist :)

A couple small things remaining


Review status: 0 of 14 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


setup.py, line 22 at r9 (raw file):

        'six>=1.10.0',
        'future>=0.16.0',
        # Circle-Ci AttributeError: module 'enum' has no attribute 'IntFlag'

Need to use enum34, not enum. Not sure why that wasn't there!


setup.sh, line 1 at r9 (raw file):

pip install numpy

Thanks for making this setup script!


setup.sh, line 5 at r9 (raw file):

pip uninstall -y pomegranate
pip install pomegranate==0.7.1 --no-cache-dir
pip install enum>=0.4.6

remove this line.


doppelganger/marginals.py, line 105 at r9 (raw file):


            census_key (unicode): census API key
            state (unicode): state fips code

What is fips? Please write out what that is for the uninformed like me :)


doppelganger/marginals.py, line 106 at r9 (raw file):

            census_key (unicode): census API key
            state (unicode): state fips code
            pumas (iterable of unicode): a single puma or a list of pumas to fetch for.  If the

If it's a single PUMA, it should just be a list of one item. Not need to special case this


test/test_download_allocate_generate.py, line 219 at r9 (raw file):


    @mock.patch('builtins.open')  # python 3 compatible
    # @mock.patch('__builtin__.open')  # python 2 compatible

Remove unused line


scripts/download_allocate_generate.py, line 51 at r1 (raw file):

Previously, nikisix (niki six) wrote…
Good point, made it clear that the database in question is postgres.

I don't think it's clear. Can you change the help text for db schema to give a small explanation?


Comments from Reviewable

nikisix commented 6 years ago

Review status: 0 of 14 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


setup.sh, line 5 at r9 (raw file):

Previously, katbusch (Kat Busch) wrote…
remove this line.

I actually stated out with it back in the setup.py, but that wasn't working so I moved it out and to the end, seems to work here. Changing to enum34.


scripts/download_allocate_generate.py, line 51 at r1 (raw file):

Previously, katbusch (Kat Busch) wrote…
I don't think it's clear. Can you change the help text for db schema to give a small explanation?

Absolutely


Comments from Reviewable

nikisix commented 6 years ago

Review status: 0 of 14 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


setup.sh, line 1 at r9 (raw file):

Previously, katbusch (Kat Busch) wrote…
Thanks for making this setup script!

:)


Comments from Reviewable

katbusch commented 6 years ago
:lgtm:

Yay! Feel free to merge yourself once the tests pass


Review status: 0 of 14 files reviewed at latest revision, 9 unresolved discussions.


Comments from Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-62.0%) to 20.433% when pulling 578b1d5fe1bf5c62cedc32bae37d0a000eca194f on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-62.0%) to 20.433% when pulling 9ab5b7a07b31ca2a0b6bed5283d4abb04859009c on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.548% when pulling 9dcd702717b058138db8e3d66d1ff613b96bfaf5 on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 80.548% when pulling 9dcd702717b058138db8e3d66d1ff613b96bfaf5 on scripts into 0d1d6d16bf67b1fc8a5abd8c5ce9989d0688dcd8 on master.