katbusch closed 6 years ago
@alexeisw @DavidOry what do you think of just adding a column called household_id to both tables?
Something like this:
households['household_id'] = households['tract'] + '-' + households['serial_number'] + '-' + households['repeat_index']
persons['household_id'] = persons['tract'] + '-' + persons['serial_number'] + '-' + persons['repeat_index']
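One caveat with the snippet above: if any of the three columns is numeric, `+ '-' +` will raise a TypeError rather than concatenate. Here is a minimal runnable sketch that casts to string first. The toy data and the `add_household_id` helper are made up for illustration; only the column names come from the snippet.

```python
import pandas as pd

# Toy data; values are invented, column names follow the snippet above.
households = pd.DataFrame({
    "tract": [101, 101, 102],
    "serial_number": [5001, 5001, 5002],
    "repeat_index": [0, 1, 0],
})
persons = pd.DataFrame({
    "tract": [101, 101, 101, 102],
    "serial_number": [5001, 5001, 5001, 5002],
    "repeat_index": [0, 0, 1, 0],
})

KEY = ["tract", "serial_number", "repeat_index"]


def add_household_id(df):
    """Hypothetical helper: join the three key columns into one string ID."""
    # Cast each key column to str so the parts concatenate instead of erroring.
    parts = [df[col].astype(str) for col in KEY]
    out = df.copy()
    out["household_id"] = parts[0].str.cat(parts[1:], sep="-")
    return out


households = add_household_id(households)
persons = add_household_id(persons)
```

Because both tables build the ID from the same columns, persons can be joined to households on household_id alone.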
From a user perspective it'd be great to turn households['serial_number'] + '-' + households['repeat_index']
into a count. Then households['tract'] + '-' + households['count'] would give a clear idea of the index/total number of synthetic HHs within a tract.
For persons, it'd be great to see a readable ID of 'Tract'-'HH_in_a_tract_idx'-'Person_in_a_household_idx'.
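A rough sketch of how the readable IDs could be built with `groupby(...).cumcount()`. The toy data and the `hh_in_tract`, `person_in_hh`, and `person_id` column names are assumptions for illustration, not anything in the codebase. Note that the counts depend on row order, so the tables would need a stable sort for the IDs to be reproducible.

```python
import pandas as pd

# Toy data; values are invented, column names follow the snippets above.
households = pd.DataFrame({
    "tract": [101, 101, 102],
    "serial_number": [5001, 5001, 5002],
    "repeat_index": [0, 1, 0],
})
persons = pd.DataFrame({
    "tract": [101, 101, 101, 102],
    "serial_number": [5001, 5001, 5001, 5002],
    "repeat_index": [0, 0, 1, 0],
})
KEY = ["tract", "serial_number", "repeat_index"]

# 1-based position of each household within its tract (hypothetical column).
households["hh_in_tract"] = households.groupby("tract").cumcount() + 1
households["household_id"] = (
    households["tract"].astype(str) + "-" + households["hh_in_tract"].astype(str)
)

# Carry the readable household ID over to persons via the existing tuple key.
persons = persons.merge(
    households[KEY + ["household_id"]],
    on=KEY,
    how="left",
    validate="many_to_one",
)

# 1-based position of each person within their household (hypothetical column).
persons["person_in_hh"] = persons.groupby("household_id").cumcount() + 1
persons["person_id"] = (
    persons["household_id"] + "-" + persons["person_in_hh"].astype(str)
)
```

With this layout the largest `hh_in_tract` in a tract is also the number of synthetic households in that tract.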
Right now households are unique by the (tract, serialno, repeatno) tuple. They should just have a unique ID.
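If readability is not needed, the simplest option is probably a plain integer ID. A sketch, assuming the tuple columns are named as in the earlier snippets and using made-up data; the duplicate check guards the claim that the tuple is unique before it is replaced.

```python
import pandas as pd

# Toy data; values are invented, column names follow the snippets above.
households = pd.DataFrame({
    "tract": [101, 101, 102],
    "serial_number": [5001, 5001, 5002],
    "repeat_index": [0, 1, 0],
})
KEY = ["tract", "serial_number", "repeat_index"]

# Fail loudly if the tuple is not actually unique.
if households.duplicated(subset=KEY).any():
    raise ValueError("households are not unique by (tract, serial_number, repeat_index)")

# Assign a 1-based integer ID in current row order.
households = households.reset_index(drop=True)
households["household_id"] = range(1, len(households) + 1)
```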