replicahq / doppelganger

A Python package of tools to support population synthesizers
Apache License 2.0

Length mismatch #68

Open aliarian opened 6 years ago

aliarian commented 6 years ago

Hi

I'm using the sample data and trying to run the example provided in doppelganger_example_full.ipynb. However, I get the error below and cannot figure out what the problem is. Can you please help me with it?

```
allocator = HouseholdAllocator.from_cleaned_data(controls, households_data, persons_data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/doppelganger/allocation.py", line 77, in from_cleaned_data
    households_data.data, persons_data.data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/doppelganger/allocation.py", line 212, in _format_data
    ._str_broadcast(inputs.AGE.name, list(inputs.AGE.possible_values))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 4385, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 645, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 3323, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 0 elements, new values have 4 elements
```
darebrawley commented 5 years ago

Hi, has there been any update on this? I'm getting a similar error when working through doppelganger_example_full.ipynb (though I am running Python 3.6):

```
allocator = HouseholdAllocator.from_cleaned_data(controls, households_data, persons_data)
```

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

in ()
----> 1 allocator = HouseholdAllocator.from_cleaned_data(controls, households_data, persons_data)

~/anaconda3/lib/python3.6/site-packages/doppelganger/allocation.py in from_cleaned_data(marginals, households_data, persons_data)
     75 
     76         households, persons = HouseholdAllocator._format_data(
---> 77             households_data.data, persons_data.data)
     78         allocated_households, allocated_persons = \
     79             HouseholdAllocator._allocate_households(households, persons, marginals)

~/anaconda3/lib/python3.6/site-packages/doppelganger/allocation.py in _format_data(households_data, persons_data)
    210         hp_ages = pandas.get_dummies(persons_data[inputs.AGE.name])
    211         hp_ages.columns = HouseholdAllocator\
--> 212             ._str_broadcast(inputs.AGE.name, list(inputs.AGE.possible_values))
    213         persons_data = pandas.concat([persons_data, hp_ages], axis=1)
    214 

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   4387         try:
   4388             object.__getattribute__(self, name)
-> 4389             return object.__setattr__(self, name, value)
   4390         except AttributeError:
   4391             pass

pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
    644 
    645     def _set_axis(self, axis, labels):
--> 646         self._data.set_axis(axis, labels)
    647         self._clear_item_cache()
    648 

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in set_axis(self, axis, new_labels)
   3321             raise ValueError(
   3322                 'Length mismatch: Expected axis has {old} elements, new '
-> 3323                 'values have {new} elements'.format(old=old_len, new=new_len))
   3324 
   3325         self.axes[axis] = new_labels

ValueError: Length mismatch: Expected axis has 0 elements, new values have 4 elements
```
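The frames above point at the mechanism: `pandas.get_dummies` creates one column per distinct value it actually sees, and assigning a list of a different length to `.columns` raises this `ValueError`. "Expected axis has 0 elements" therefore suggests that `persons_data[inputs.AGE.name]` was empty by the time `_format_data` ran. The sketch below reproduces that with plain pandas; `age_dummies`, `AGE_VALUES`, and the `age_` column names are hypothetical stand-ins, not doppelganger's actual code.

```python
import pandas as pd

# Hypothetical stand-ins for inputs.AGE.possible_values and the names that
# _str_broadcast produces: four labels, matching "new values have 4 elements".
AGE_VALUES = ["0-17", "18-34", "35-64", "65+"]
AGE_COLUMNS = ["age_" + value for value in AGE_VALUES]


def age_dummies(ages):
    """Simplified version of the failing step: one-hot encode, then rename."""
    dummies = pd.get_dummies(ages)  # one column per value actually present
    dummies.columns = AGE_COLUMNS   # raises ValueError if the counts differ
    return dummies


# Every age category present: get_dummies yields 4 columns, so the rename works.
ok = age_dummies(pd.Series(["0-17", "18-34", "35-64", "65+"]))
print(list(ok.columns))

# Empty input (for example, every person filtered out upstream): get_dummies
# returns a frame with 0 columns, and the rename fails as in the tracebacks.
try:
    age_dummies(pd.Series([], dtype=object))
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

The same error, with a non-zero "Expected" count, would also appear if some age categories were missing from an otherwise non-empty column, because `get_dummies` only creates columns for values that occur.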
LockeBirdsey commented 5 years ago

Hi,

This seems to be caused by a version mismatch with cvxpy. To resolve it:

1. Make sure your notebook is using a Python 2.7 kernel.
2. If that doesn't work, downgrade cvxpy to 0.4.8 with `pip2 install cvxpy==0.4.8`.

To check which version is running, put the following (hacky) code near the top of your notebook:

```
import cvxpy
print(cvxpy.__version__)
```

EDIT: This seems to have been encountered before in #57.

darebrawley commented 5 years ago

Hi @LachlanBirdsey, thanks so much for your response. Following your instructions, I tried running it again with a Python 2.7 kernel and cvxpy==0.4.8, but I'm getting the same error. Any other suggestions? I'm not sure I entirely understand what the issue and fix were in #57.

Many thanks!

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

in ()
----> 1 allocator = HouseholdAllocator.from_cleaned_data(controls, households_data, persons_data)

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/doppelganger/allocation.pyc in from_cleaned_data(marginals, households_data, persons_data)
     75 
     76         households, persons = HouseholdAllocator._format_data(
---> 77             households_data.data, persons_data.data)
     78         allocated_households, allocated_persons = \
     79             HouseholdAllocator._allocate_households(households, persons, marginals)

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/doppelganger/allocation.pyc in _format_data(households_data, persons_data)
    210         hp_ages = pandas.get_dummies(persons_data[inputs.AGE.name])
    211         hp_ages.columns = HouseholdAllocator\
--> 212             ._str_broadcast(inputs.AGE.name, list(inputs.AGE.possible_values))
    213         persons_data = pandas.concat([persons_data, hp_ages], axis=1)
    214 

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/pandas/core/generic.pyc in __setattr__(self, name, value)
   5078         try:
   5079             object.__getattribute__(self, name)
-> 5080             return object.__setattr__(self, name, value)
   5081         except AttributeError:
   5082             pass

pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/pandas/core/generic.pyc in _set_axis(self, axis, labels)
    636 
    637     def _set_axis(self, axis, labels):
--> 638         self._data.set_axis(axis, labels)
    639         self._clear_item_cache()
    640 

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/pandas/core/internals/managers.pyc in set_axis(self, axis, new_labels)
    153             raise ValueError(
    154                 'Length mismatch: Expected axis has {old} elements, new '
--> 155                 'values have {new} elements'.format(old=old_len, new=new_len))
    156 
    157         self.axes[axis] = new_labels

ValueError: Length mismatch: Expected axis has 0 elements, new values have 4 elements
```
LockeBirdsey commented 5 years ago

Hi,

One potential fix is to remove the `puma=PUMA` parameter from `households_data = PumsData.from_csv(...`

Hopefully this helps.
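The thread does not say why removing `puma=PUMA` helps. One plausible reading, consistent with "Expected axis has 0 elements", is that the PUMA filter matched no rows, so the persons table reaching `_format_data` was empty. That could happen if the chosen PUMA is simply absent from the sample file, or if codes are compared with the wrong type (a zero-padded string such as "00106" against the integer 106). The sketch below illustrates the type-mismatch case plus a fail-fast guard; `filter_by_puma`, the `puma` column name, and 5-digit zero padding are all assumptions for illustration, not how `PumsData.from_csv` actually filters.

```python
import pandas as pd


def filter_by_puma(df, puma, column="puma"):
    """Hypothetical helper (not doppelganger's code): keep rows for one PUMA.

    Codes are compared as 5-character zero-padded strings, so "00106" and
    106 are treated as the same PUMA. Raises instead of returning an empty
    frame, so the problem surfaces here rather than later in get_dummies.
    """
    codes = df[column].astype(str).str.zfill(5)
    subset = df[codes == str(puma).zfill(5)]
    if subset.empty:
        raise ValueError("No rows match PUMA {!r}; available codes: {}".format(
            puma, sorted(codes.unique())))
    return subset


# Toy data with PUMA codes stored as zero-padded strings.
persons = pd.DataFrame({
    "puma": ["00106", "00106", "00107"],
    "age": ["0-17", "18-34", "65+"],
})

# A naive comparison against an integer silently matches nothing.
naive = persons[persons["puma"] == 106]
print(len(naive))

# Normalizing both sides first finds the intended rows.
matched = filter_by_puma(persons, 106)
print(len(matched))
```

Checking that the filtered households and persons frames are non-empty before calling `HouseholdAllocator.from_cleaned_data` would turn the cryptic pandas message into a direct one.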

darebrawley commented 5 years ago

Yes, amazing! Thank you @LachlanBirdsey. I removed the `puma=PUMA` parameter from both `households_data` and `persons_data`, and then the allocator worked.

However, when running `population = Population.generate(allocator, person_model, household_model)` in "Step 03: Replace the PUMS Persons with Synthetic Persons created from the Bayesian Network", I get the error below. Do you have any suggestions? Many thanks again.

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

in ()
      1 population = Population.generate(
----> 2     allocator, person_model, household_model
      3 )

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/doppelganger/populationgen.pyc in generate(household_allocator, person_model, household_model)
     94         persons = Population._generate_from_model(
     95             household_allocator, household_allocator.allocated_persons,
---> 96             person_model, [inputs.AGE.name, inputs.SEX.name], Population._extract_person_evidence
     97         )
     98         households = Population._generate_from_model(

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/doppelganger/populationgen.pyc in _generate_from_model(household_allocator, data, model, fields, evidence_fn)
     70                 household_allocator
     71         ):
---> 72             generated_rows = model.generate(segment, evidence, count=count)
     73             for repeat_id, row in enumerate(generated_rows):
     74                 household_id = '{}-{}-{}'.format(tract, serialno, repeat_id)

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/doppelganger/bayesnets.pyc in generate(self, type_, evidence, count)
    315 
    316         generated = tuple(
--> 317             tuple(distribution.sample() for distribution in distributions) for _ in range(count)
    318         )
    319         return generated

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/doppelganger/bayesnets.pyc in ((_,))
    315 
    316         generated = tuple(
--> 317             tuple(distribution.sample() for distribution in distributions) for _ in range(count)
    318         )
    319         return generated

/Users/darebrawley/anaconda3/envs/panda_kernl2/lib/python2.7/site-packages/doppelganger/bayesnets.pyc in ((distribution,))
    315 
    316         generated = tuple(
--> 317             tuple(distribution.sample() for distribution in distributions) for _ in range(count)
    318         )
    319         return generated

AttributeError: 'unicode' object has no attribute 'sample'
```
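The last frame calls `distribution.sample()` on a `unicode` string, which means at least one entry in `distributions` is a plain value rather than a distribution object. That is the kind of breakage a change in pomegranate's return types could cause. The sketch below reproduces the failure with a hypothetical `FakeDistribution` class (not pomegranate) and adds a small diagnostic, `non_distributions`, that lists the offending entries. Under Python 3 the message says `'str'` instead of `'unicode'`.

```python
import random


class FakeDistribution(object):
    """Hypothetical stand-in for a distribution object with a sample() method."""

    def __init__(self, values):
        self.values = list(values)

    def sample(self):
        return random.choice(self.values)


def non_distributions(distributions):
    """Return (index, value) pairs for entries that have no sample() method."""
    return [(i, d) for i, d in enumerate(distributions) if not hasattr(d, "sample")]


# What the generator in bayesnets.py expects: one sampleable object per field.
good = [FakeDistribution(["0-17", "18-34"]), FakeDistribution(["M", "F"])]
row = tuple(d.sample() for d in good)

# What the traceback suggests it received: a plain string among the entries.
bad = [FakeDistribution(["0-17", "18-34"]), u"F"]
print(non_distributions(bad))

try:
    tuple(d.sample() for d in bad)
    message = None
except AttributeError as err:
    message = str(err)
print(message)
```

This only illustrates the failure mode; it does not say which pomegranate call returned the string.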
LockeBirdsey commented 5 years ago

Hi @darebrawley,

Firstly, I would check which version of pomegranate you have installed. Doppelganger requires pomegranate==0.8.1; later versions are largely incompatible (see #58).
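To check a notebook environment against everything recommended in this thread (a Python 2.7 kernel, cvxpy==0.4.8, pomegranate==0.8.1) in one go, a small sketch like the following could be run in a cell. It relies only on `sys`, `importlib.import_module`, and each package's `__version__` attribute; the `PINS` table reflects this thread's advice, not a requirements file read from the repository, and `check_environment` is a hypothetical helper.

```python
import importlib
import sys

# Versions recommended in this thread (not read from doppelganger's setup files).
PINS = {"cvxpy": "0.4.8", "pomegranate": "0.8.1"}
EXPECTED_PYTHON = (2, 7)


def check_environment(pins=PINS, python=EXPECTED_PYTHON, version_info=None):
    """Return a list of readable problems; an empty list means everything matches."""
    problems = []
    current = tuple((version_info or sys.version_info)[:2])
    if current != python:
        problems.append("Python %d.%d running, %d.%d expected" % (current + python))
    for name, wanted in sorted(pins.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("%s is not installed (want %s)" % (name, wanted))
            continue
        found = getattr(module, "__version__", "unknown")
        if found != wanted:
            problems.append("%s %s installed, %s expected" % (name, found, wanted))
    return problems


for problem in check_environment():
    print(problem)
```

The code sticks to `%` formatting and `ImportError` so the same cell runs under both the Python 2.7 kernel and Python 3.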