replicahq / doppelganger

A Python package of tools to support population synthesizers
Apache License 2.0

Household indexing is confusing #27

Closed: katbusch closed this issue 6 years ago

katbusch commented 7 years ago

Right now households are unique only by the (tract, serialno, repeatno) tuple. They should just have a single unique ID.

katbusch commented 7 years ago

@alexeisw @DavidOry what do you think of just adding a column called household_id to both tables? Something like this:

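# Cast each part to str so the '-' concatenation also works when a column is numeric.
households['household_id'] = households['tract'].astype(str) + '-' + households['serial_number'].astype(str) + '-' + households['repeat_index'].astype(str)
persons['household_id'] = persons['tract'].astype(str) + '-' + persons['serial_number'].astype(str) + '-' + persons['repeat_index'].astype(str)
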
alexeisw commented 7 years ago

From a user perspective it'd be great to turn households['serial_number'] + '-' + households['repeat_index'] into a count. Then households['tract'] + '-' + households['count'] would give a clear idea of the index and total number of synthetic HHs within a tract.
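
For illustration, a minimal sketch of how such a per-tract count could be built with pandas. The toy data and the helper name add_household_ids are assumptions made up for this example, not doppelganger code; only the column names tract, serial_number and repeat_index come from the snippet above.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the thread.
households = pd.DataFrame({
    'tract': ['001', '001', '001', '002'],
    'serial_number': [17, 17, 42, 17],
    'repeat_index': [0, 1, 0, 0],
})


def add_household_ids(households):
    """Return a sorted copy with a 1-based per-tract 'count' and a 'household_id'.

    Hypothetical helper: assumes (tract, serial_number, repeat_index)
    is unique per row, as described in the issue.
    """
    key = ['tract', 'serial_number', 'repeat_index']
    # Sort first so the numbering is deterministic within each tract.
    out = households.sort_values(key).reset_index(drop=True)
    # cumcount() numbers the rows of each tract 0, 1, 2, ...; add 1 for a 1-based count.
    out['count'] = out.groupby('tract').cumcount() + 1
    out['household_id'] = out['tract'].astype(str) + '-' + out['count'].astype(str)
    return out


households_with_ids = add_household_ids(households)
```

With this numbering, the largest count in a tract equals the number of synthetic households in that tract, which is the readability benefit described above.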

alexeisw commented 7 years ago

For persons, it'd be great to see a readable ID of the form 'Tract'-'HH_in_a_tract_idx'-'Person_in_a_household_idx'.
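
A rough sketch of how that person ID could be derived, assuming both tables share the tract, serial_number and repeat_index columns and that every person row matches exactly one household row. The toy data, the helper add_person_ids, and the column names hh_idx, person_idx and person_id are hypothetical names chosen for this example.

```python
import pandas as pd

# Hypothetical toy data; only the three key column names come from the thread.
households = pd.DataFrame({
    'tract': ['001', '001', '002'],
    'serial_number': [17, 17, 17],
    'repeat_index': [0, 1, 0],
})
persons = pd.DataFrame({
    'tract': ['001', '001', '001', '002'],
    'serial_number': [17, 17, 17, 17],
    'repeat_index': [0, 0, 1, 0],
})

KEY = ['tract', 'serial_number', 'repeat_index']


def add_person_ids(households, persons):
    """Return a copy of persons with a 'tract-hh_idx-person_idx' style ID.

    Hypothetical helper: assumes every person matches exactly one household.
    """
    hh = households.sort_values(KEY).reset_index(drop=True)
    # 1-based index of each household within its tract.
    hh['hh_idx'] = hh.groupby('tract').cumcount() + 1
    # Attach the household index to each person; validate guards against
    # duplicate household keys, which would silently duplicate persons.
    out = persons.merge(hh[KEY + ['hh_idx']], on=KEY, how='left',
                        validate='many_to_one')
    # 1-based index of each person within their household.
    out['person_idx'] = out.groupby(KEY).cumcount() + 1
    out['person_id'] = (out['tract'].astype(str) + '-'
                        + out['hh_idx'].astype(str) + '-'
                        + out['person_idx'].astype(str))
    return out


persons_with_ids = add_person_ids(households, persons)
```

If some persons had no matching household, the left merge would leave hh_idx missing for them, so a real implementation would probably want to check for that before building the string.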