replicahq / doppelganger

A Python package of tools to support population synthesizers
Apache License 2.0
165 stars 32 forks source link

Marginal vehicles #41

Closed nikisix closed 6 years ago

nikisix commented 6 years ago

Use the number of vehicles a household has as an additional marginal control to the solver for increased accuracy around transport variables.


This change is Reviewable

katbusch commented 6 years ago

Thanks @nikisix ! A few comments / questions


Review status: 0 of 6 files reviewed at latest revision, 4 unresolved discussions.


doppelganger/allocation.py, line 146 at r1 (raw file):

        allocation_inputs = [inputs.NUM_PEOPLE, inputs.NUM_VEHICLES]  # Hard-coded for now
        # Prepend column name to bin name to prevent bin collision
        hh_columns = list()

hh_columns = [] is more pythonic


doppelganger/allocation.py, line 147 at r1 (raw file):

        # Prepend column name to bin name to prevent bin collision
        hh_columns = list()
        for i in allocation_inputs:

i usually means an integer index. If it's not an integer it should have a different name that explains what it is


doppelganger/allocation.py, line 196 at r1 (raw file):

    def _format_data(households_data, persons_data):
        hh_size = pandas.get_dummies(households_data[inputs.NUM_PEOPLE.name])
        # Prepend column name to bin name to prevent bin collision

Can you create a helper for this so that you don't have to have the same code here and above?


doppelganger/inputs.py, line 86 at r1 (raw file):

def num_people_discrete(num_people):
    if is_blank(num_people) or int(num_people) == 0:  # match acs (can't have 0-person household)
        return '0'  # UNKNOWN causing problems in the bayes-net segmenter

Can you explain what problems this causes?


Comments from Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.7%) to 79.293% when pulling 45187029da747ff74a0c8a68f84daa7d998c1195 on marginal-vehicles into f3e2e60b119dec85c738207741ce22a8e5583229 on master.

nikisix commented 6 years ago

Review status: 0 of 6 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


doppelganger/allocation.py, line 146 at r1 (raw file):

Previously, katbusch (Kat Busch) wrote…
`hh_columns = []` is more pythonic

Done.


doppelganger/allocation.py, line 147 at r1 (raw file):

Previously, katbusch (Kat Busch) wrote…
`i` usually means an integer index. If it's not an integer it should have a different name that explains what it is

It's true.

if you_like(a_input): Done.


doppelganger/allocation.py, line 196 at r1 (raw file):

Previously, katbusch (Kat Busch) wrote…
Can you create a helper for this so that you don't have to have the same code here and above?

Done.


doppelganger/inputs.py, line 86 at r1 (raw file):

Previously, katbusch (Kat Busch) wrote…
Can you explain what problems this causes?

The bin-values of the variables we're segmenting on become json keys and are run through a sorter before being written out. The None type throws a sorting error.


Comments from Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.6%) to 79.419% when pulling df4e51d7a5b1f707f304915ff80a54d7b42710d7 on marginal-vehicles into f3e2e60b119dec85c738207741ce22a8e5583229 on master.

nikisix commented 6 years ago

Maybe we can give coveralls another shot today so I can figure out what needs tests around it.


Review status: 0 of 7 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


Comments from Reviewable

katbusch commented 6 years ago

Review status: 0 of 7 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


doppelganger/allocation.py, line 148 at r3 (raw file):

        hh_columns = []
        for a_input in allocation_inputs:
            hh_columns += ['_'.join([a_input.name, val]) for val in list(a_input.possible_values)]

can you use _str_broadcast here too?


doppelganger/inputs.py, line 86 at r1 (raw file):

Previously, nikisix (niki six) wrote…
The bin-values of the variables we're segmenting on become json keys and are run through a sorter before being written out. The None type throws a sorting error.

So the problem is in BayesianNetworkModel.to_json? Can you please file an issue with an exception trace?


test/test_allocation.py, line 32 at r3 (raw file):

        households.to_csv.assert_called_once_with('households_file')

    def test__str_broadcast(self):

Thanks for adding a test! We generally don't test private methods directly. Can you rework this test to use the public-facing API?


Comments from Reviewable

nikisix commented 6 years ago

Review status: 0 of 7 files reviewed at latest revision, 6 unresolved discussions.


test/test_allocation.py, line 32 at r3 (raw file):

Previously, katbusch (Kat Busch) wrote…
Thanks for adding a test! We generally don't test private methods directly. Can you rework this test to use the public-facing API?

Wish I would've seen this. Yea I'll switch it up.


Comments from Reviewable

nikisix commented 6 years ago

Review status: 0 of 7 files reviewed at latest revision, 6 unresolved discussions.


doppelganger/allocation.py, line 148 at r3 (raw file):

Previously, katbusch (Kat Busch) wrote…
can you use `_str_broadcast` here too?

Lol yea, meant to do that.


Comments from Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.5%) to 79.545% when pulling 93034df21084805f7da1fc25023df83a29183a62 on marginal-vehicles into f3e2e60b119dec85c738207741ce22a8e5583229 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.5%) to 79.545% when pulling b06adf06dcac42ca217f39c54990ae7a796ebacc on marginal-vehicles into f3e2e60b119dec85c738207741ce22a8e5583229 on master.

nikisix commented 6 years ago

Review status: 0 of 7 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


doppelganger/inputs.py, line 86 at r1 (raw file):

Previously, katbusch (Kat Busch) wrote…
So the problem is in `BayesianNetworkModel.to_json`? Can you please file an issue with an exception trace?

Filed. https://github.com/sidewalklabs/doppelganger/issues/42


Comments from Reviewable

katbusch commented 6 years ago

Great, good to go once the test is fixed but I have a question below


Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


doppelganger/inputs.py, line 86 at r1 (raw file):

Previously, nikisix (niki six) wrote…
Filed. https://github.com/sidewalklabs/doppelganger/issues/42

Thank you!


doppelganger/inputs.py, line 86 at r5 (raw file):

def num_people_discrete(num_people):
    if is_blank(num_people):
        return UNKNOWN

Do you need to change this for it to work? If it needs to be the other way for now you can leave it like that but add a link to the issue with a TODO


Comments from Reviewable

nikisix commented 6 years ago

Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


doppelganger/inputs.py, line 86 at r5 (raw file):

Previously, katbusch (Kat Busch) wrote…
Do you need to change this for it to work? If it needs to be the other way for now you can leave it like that but add a link to the issue with a TODO

I reverted it for now, so it won't break on a puma that contains null household numbers. You'll see it with the test push coming up.


Comments from Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage increased (+5.8%) to 85.75% when pulling f7590d0d6fd26756c7cead38c35bbd1ad87a2614 on marginal-vehicles into f3e2e60b119dec85c738207741ce22a8e5583229 on master.

coveralls commented 6 years ago

Coverage Status

Coverage increased (+5.8%) to 85.75% when pulling f7590d0d6fd26756c7cead38c35bbd1ad87a2614 on marginal-vehicles into f3e2e60b119dec85c738207741ce22a8e5583229 on master.

coveralls commented 6 years ago

Coverage Status

Coverage increased (+5.8%) to 85.75% when pulling f7590d0d6fd26756c7cead38c35bbd1ad87a2614 on marginal-vehicles into f3e2e60b119dec85c738207741ce22a8e5583229 on master.

coveralls commented 6 years ago

Coverage Status

Coverage increased (+5.8%) to 85.75% when pulling 705461447111beefdc1850d85070afdf5c0ddf51 on marginal-vehicles into f3e2e60b119dec85c738207741ce22a8e5583229 on master.

katbusch commented 6 years ago
:lgtm:

Just one suggestion for future


Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions.


doppelganger/inputs.py, line 86 at r5 (raw file):

Previously, nikisix (niki six) wrote…
I reverted it for now, so it won't break on a puma that contains null household numbers. You'll see it with the test push coming up.

Can you link to the issue with and add a TODO to fix?


test/test_allocation.py, line 49 at r7 (raw file):


    def _mock_tract_data(self):
        return pandas.DataFrame([

In the future, we should try to make our test cases the minimal they need to be for the case we're testing. This looks a bit large and we probably could have done several smaller test cases for different edge cases instead


Comments from Reviewable

nikisix commented 6 years ago

Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


test/test_allocation.py, line 49 at r7 (raw file):

Previously, katbusch (Kat Busch) wrote…
In the future, we should try to make our test cases the minimal they need to be for the case we're testing. This looks a bit large and we probably could have done several smaller test cases for different edge cases instead

Definitely, the tricky part here is that most of the major machinery in allocation.py are private methods (_allocate_households and _format_data), and our goal was to test these methods using the public api if I recall.


Comments from Reviewable