replicahq / doppelganger

A Python package of tools to support population synthesizers
Apache License 2.0

Generated household repeat_index incorrect #1

Closed kaelgreco closed 7 years ago

kaelgreco commented 7 years ago

When running the example notebook, Doppelganger.ipynb, the population output in step 03 seems incorrect.

Pandas DataFrame for first 5 people:

         tract  serial_number  repeat_index    age sex individual_income
138842  422209           4431             0    65+   M               <=0
138843  422209           4431             1    65+   M               <=0
138897  422209           4431             0  35-64   F               <=0
138898  422209           4431             1  35-64   F           100000+
54123   422209          12930             0  35-64   M       40000-80000

Pandas DataFrame for first 5 households:

         tract  serial_number  repeat_index num_people household_income  num_vehicles
0       422209           4431             0          2          <=40000           1.0
1       422209           4431             1          2           40000+           1.0
90603   422209           4431             0          2           40000+           2.0
90604   422209           4431             1          2           40000+           2.0
181206  422209           4431             0          2           40000+           2.0

I expected the households output to have sequential, non-duplicate repeat indices for each (tract, serial_number) pair, i.e. the repeat_index column for the five rows above would be 0, 1, 2, 3, 4.
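For anyone wanting to check this on their own output, here is a minimal pandas sketch. It is not Doppelganger's code: the `households` frame is rebuilt by hand from the rows pasted above, and the variable names (`keys`, `duplicated`, `expected`) are just for illustration. It flags rows that share a (tract, serial_number, repeat_index) triple and computes the numbering I expected to see.

```python
import pandas as pd

# Household rows copied from the output above (index values kept as shown).
households = pd.DataFrame(
    {
        "tract": [422209] * 5,
        "serial_number": [4431] * 5,
        "repeat_index": [0, 1, 0, 1, 0],
    },
    index=[0, 1, 90603, 90604, 181206],
)

keys = ["tract", "serial_number"]

# Rows whose (tract, serial_number, repeat_index) triple occurs more than once.
duplicated = households[
    households.duplicated(keys + ["repeat_index"], keep=False)
]

# Expected numbering: 0, 1, 2, ... within each (tract, serial_number) group.
expected = households.groupby(keys).cumcount()

print(len(duplicated))    # number of rows involved in a duplicate triple
print(expected.tolist())  # sequential indices per household
```

On the five rows above, every row is part of a duplicate triple, and the `cumcount` numbering gives the 0 through 4 sequence described in the report.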