replicahq / doppelganger

A Python package of tools to support population synthesizers
Apache License 2.0

Error in doppelganger_example_simple.ipynb #52

Closed (kozlac closed this 6 years ago)

kozlac commented 6 years ago

Hello, while going through the simple example notebook in the examples folder, I am getting the following error: "Missing data field state". After a few minutes of digging through the code, I realized that the state field is part of the list allocation.DEFAULT_HOUSEHOLD_FIELDS. However, the households_00106_dirty.csv file does not contain a state column, so creating the households_data object throws the exception.

Am I doing something wrong, or does the CSV file need to be modified? The full traceback is below. Thanks!

Missing data field state
KeyError Traceback (most recent call last)
<ipython-input-24-5e4c8b217b33> in <module>()
      1 households_data = PumsData.from_csv('sample_data/households_00106_dirty.csv').clean(
----> 2     household_fields, preprocessor, puma=PUMA
      3 )
      4 
      5 persons_fields = tuple(set(

/usr/local/lib/python2.7/dist-packages/doppelganger/datasource.pyc in clean(self, field_names, preprocessor, state, puma)
     30         if puma is not None:
     31             cleaned_data = cleaned_data[
---> 32                     (cleaned_data[inputs.STATE.name].astype(str) == str(state)) &
     33                     (cleaned_data[inputs.PUMA.name].astype(str) == str(puma))
     34                 ]

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: u'state'

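
One way to confirm the diagnosis above before calling clean() is to compare the CSV header against the fields the code expects. The following is only a minimal sketch using the standard library: the helper name missing_columns, the sample data, and the list of required names are all hypothetical, and the real file may use different raw header names (a later comment in this thread adds a column called st, not state).

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header, in order."""
    # Read only the first row; an empty file yields an empty header.
    header = next(csv.reader(csv_file), [])
    present = {name.strip().lower() for name in header}
    return [name for name in required if name.lower() not in present]


# Hypothetical two-column sample standing in for the households file.
sample = io.StringIO("serialno,puma\n1,00106\n")
print(missing_columns(sample, ["serialno", "state", "puma"]))
```

With a real file, the same helper could be called on an open file handle instead of the in-memory sample.
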
anthonylouisburns commented 6 years ago

I was able to get past this error by adding a column ,st to both files, persons.csv and households.csv, and then adding ,NY to the end of every line.

That got me past this error, but I then hit a different one.
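
The manual edit described above (a new st header plus NY on every row) could also be scripted. This is a rough sketch, not part of doppelganger: append_constant_column is a hypothetical helper, the sample rows are made up, and the column name and value are simply the ones mentioned in this comment. It does not address the second error.

```python
import csv
import io


def append_constant_column(src, dst, name, value):
    """Copy CSV rows from src to dst, appending one column with a fixed value."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to copy
    if name in header:
        raise ValueError("column %r already exists" % name)
    writer.writerow(header + [name])
    for row in reader:
        if row:  # skip blank lines rather than emitting a lone value
            writer.writerow(row + [value])


# Hypothetical sample standing in for persons.csv / households.csv.
src = io.StringIO("serialno,puma\n1,00106\n2,00106\n")
dst = io.StringIO()
append_constant_column(src, dst, "st", "NY")
print(dst.getvalue())
```

For files on disk, src and dst would be two open file handles; writing to a new file rather than editing in place keeps the original sample data intact.
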

katbusch commented 6 years ago

Sorry for the delay here, and thanks for reporting. I will be investigating this issue and getting the repo and example in better shape over the next two weeks. I appreciate your patience!

anthonylouisburns commented 6 years ago

Willing to help; keep us posted.

katbusch commented 6 years ago

@anthonylouisburns the tests are passing now, so if you want to write a pull request, that would be very welcome.

anthonylouisburns commented 6 years ago

If there are issues that are well defined enough to be tasks, labeling them as a task or something similar would be useful.

katbusch commented 6 years ago

This is fixed