replicahq / doppelganger

A Python package of tools to support population synthesizers
Apache License 2.0
165 stars 32 forks source link

Adding state filter because pumas are only unique within a state. The ... #33

Closed nikisix closed 6 years ago

nikisix commented 6 years ago

… example_full jupyter notebook has been updated accordingly. Simple example uses pre-canned data -- which should be replaced because puma-00106 isn't unique to California. Adding line in gitignore to ignore the tract-puma relations file, so the unzipped version isn't accidentally added to this repo in the future.


This change is Reviewable

alexeisw commented 6 years ago

This will be a relevant fix, thanks!

I've got minor comments and then I'd bring it to the attention of @kaelgreco to review new marginals fetching function.

BTW, two tests fail on this branch:


Traceback (most recent call last): File "/home/ubuntu/doppelganger/test/test_marginals.py", line 59, in test_fetch_marginals self._mock_marginals_file(), set([puma])) File "/home/ubuntu/doppelganger/doppelganger/marginals.py", line 115, in from_census_data https://www.mcc.co.mercer.pa.us/dps/state_fips_code_listing.htm''') ValueError: Please supply a state fips code and a puma. https://www.mcc.co.mercer.pa.us/dps/state_fips_code_listing.htm


Traceback (most recent call last): File "/home/ubuntu/doppelganger/test/test_bayes_net.py", line 201, in test_load_bayes_net self.assertSequenceEqual(structure, ((), (0,), (0, 1))) AssertionError: Sequences differ: ((), (0,), (1, 0)) != ((), (0,), (0, 1))

First differing element 2: (1, 0) (0, 1)


Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


.gitignore, line 131 at r1 (raw file):

*.omx

# ActivitySim files

change to: Data files


doppelganger/marginals.py, line 92 at r1 (raw file):


    @staticmethod
    def from_census_data(puma_tract_mappings, census_key, state=None, puma=None):

This function was pulling marginals for multiple pumas, and is now changed to only fetch one at a time? @kaelgreco


doppelganger/marginals.py, line 115 at r1 (raw file):

        if not puma or not state:
            raise ValueError('''Please supply a state fips code and a puma.
                    https://www.mcc.co.mercer.pa.us/dps/state_fips_code_listing.htm''')

I'd suggest changing info source to https://www.census.gov/geo/reference/ansi_statetables.html

(no disrespect to the website of Mercer County, Pennsylvania)


Comments from Reviewable

nikisix commented 6 years ago

Thanks for the suggestions. I'll make the mentioned changes. I bet the tests need updated...

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.1%) to 82.278% when pulling c106a137456022048b9044c3a37e0bb193ab160c on filter-on-state into 8b6ab596bebc1331e4729f995210598b34a3d31f on master.

kaelgreco commented 6 years ago

doppelganger/marginals.py, line 92 at r1 (raw file):

Previously, alexeisw wrote…
This function was pulling marginals for multiple pumas, and is now changed to only fetch one at a time? @kaelgreco

I think the most likely use case will be for someone running this for a number of pumas for a specific state. is there a reason we can't keep it as list of puma ids with a state id?


Comments from Reviewable

nikisix commented 6 years ago

Review status: 0 of 5 files reviewed at latest revision, 3 unresolved discussions.


doppelganger/marginals.py, line 92 at r1 (raw file):

Previously, kaelgreco wrote…
I think the most likely use case will be for someone running this for a number of pumas for a specific state. is there a reason we can't keep it as list of puma ids with a state id?

Good point, reverted.


Comments from Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.1%) to 82.278% when pulling 1256128a885ba9efcdd57d42a9412adfe5f672bd on filter-on-state into 8b6ab596bebc1331e4729f995210598b34a3d31f on master.

kaelgreco commented 6 years ago

Cool! :lgtm:


Reviewed 2 of 4 files at r1, 1 of 3 files at r2, 2 of 2 files at r3. Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Comments from Reviewable