Closed nikisix closed 6 years ago
That'd be really useful to have some scripts, thanks for contributing, @nikisix !!
The set of functions that scripts use should be properly documented too. The goal here is to enable reuse of the functions in multiple different scripts out of the context of the arguments and code in main().
Functions docstrings should also be detailed enough for anyone to use them without going into the actual package code to figure out what are the datatypes of all of the multiple inputs.
BTW, CircleCI builds fail on syntax, please investigate: ./scripts/download_allocate_generate.py:90:44: E999 SyntaxError: invalid syntax ./scripts/fetch_puma_data_from_db.py:81:41: F821 undefined name 'unicode'
Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, some commit checks failed.
Comments from Reviewable
+1 on documentation
We use the google python style guide for our commenting style: https://google.github.io/styleguide/pyguide.html?showone=Comments#Comments
I'm also concerned that none of this is tested & we won't have a way of knowing if we break it
Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.
scripts/download_allocate_generate.py, line 51 at r1 (raw file):
help='hostname of database with pums data', default='localhost') parser.add_argument('--db_database', type=unicode, help='db name') parser.add_argument('--db_schema', type=unicode, help='db schema', default='import')
I think you need to explain in more detail what this means since it's different for postgres vs mysql & could be confusing for some
scripts/download_allocate_generate.py, line 140 at r1 (raw file):
# Write the household bayes net to disk def household_segmenter(x): x[inputs.NUM_PEOPLE.name]
A lambda would be nice here
scripts/fetch_puma_data_from_db.py, line 19 at r1 (raw file):
def fetch_data(output_dir, state_id, puma_id, db_host, db_database, db_schema, db_user, db_password,
I don't think it's clear here that this write the data to a file. I recommend doing the step of fetching & writing separately. This can return a PumsData & the user can decide whether to write it out
Comments from Reviewable
Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.
scripts/download_allocate_generate.py, line 140 at r1 (raw file):
https://google.github.io/styleguide/pyguide.html?showone=Comments#Comments Had a lambda here for the segmenter function and the linter (flake8) made me turn it into a standard def-function
Comments from Reviewable
scripts/download_allocate_generate.py, line 140 at r1 (raw file):
Ups, please ignore the link there :)
Comments from Reviewable
Oh I forgot to mention: all of our files should have these lines at the top for python 3 compatibility:
from __future__ import (
absolute_import, division, print_function, unicode_literals
)
Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.
Comments from Reviewable
Review status: 0 of 13 files reviewed at latest revision, 3 unresolved discussions.
scripts/download_allocate_generate.py, line 51 at r1 (raw file):
I think you need to explain in more detail what this means since it's different for postgres vs mysql & could be confusing for some
Good point, made it clear that the database in question is postgres.
scripts/fetch_puma_data_from_db.py, line 19 at r1 (raw file):
I don't think it's clear here that this write the data to a file. I recommend doing the step of fetching & writing separately. This can return a PumsData & the user can decide whether to write it out
Agreed. Done.
Comments from Reviewable
Awesome, looking good! Thanks for adding all those tests. Don't worry about the coverage decrease. It's just because now the coverage tool knows our scripts exist :)
A couple small things remaining
Review status: 0 of 14 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.
setup.py, line 22 at r9 (raw file):
'six>=1.10.0', 'future>=0.16.0', # Circle-Ci AttributeError: module 'enum' has no attribute 'IntFlag'
Need to use enum34, not enum. Not sure why that wasn't there!
setup.sh, line 1 at r9 (raw file):
pip install numpy
Thanks for making this setup script!
setup.sh, line 5 at r9 (raw file):
pip uninstall -y pomegranate pip install pomegranate==0.7.1 --no-cache-dir pip install enum>=0.4.6
remove this line.
doppelganger/marginals.py, line 105 at r9 (raw file):
census_key (unicode): census API key state (unicode): state fips code
What is fips? Please write out what that is for the uninformed like me :)
doppelganger/marginals.py, line 106 at r9 (raw file):
census_key (unicode): census API key state (unicode): state fips code pumas (iterable of unicode): a single puma or a list of pumas to fetch for. If the
If it's a single PUMA, it should just be a list of one item. Not need to special case this
test/test_download_allocate_generate.py, line 219 at r9 (raw file):
@mock.patch('builtins.open') # python 3 compatible # @mock.patch('__builtin__.open') # python 2 compatible
Remove unused line
scripts/download_allocate_generate.py, line 51 at r1 (raw file):
Good point, made it clear that the database in question is postgres.
I don't think it's clear. Can you change the help text for db schema to give a small explanation?
Comments from Reviewable
Review status: 0 of 14 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.
setup.sh, line 5 at r9 (raw file):
remove this line.
I actually stated out with it back in the setup.py, but that wasn't working so I moved it out and to the end, seems to work here. Changing to enum34.
scripts/download_allocate_generate.py, line 51 at r1 (raw file):
I don't think it's clear. Can you change the help text for db schema to give a small explanation?
Absolutely
Comments from Reviewable
Review status: 0 of 14 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.
setup.sh, line 1 at r9 (raw file):
Thanks for making this setup script!
:)
Comments from Reviewable
Yay! Feel free to merge yourself once the tests pass
Review status: 0 of 14 files reviewed at latest revision, 9 unresolved discussions.
Comments from Reviewable
This change is![Reviewable](https://reviewable.io/review_button.svg)