HLC-CUUATS closed this issue 6 years ago
Update: We have figured out the issue with editing the inputs file (we had to delete the current install and reinstall from the specific location we were working with).
We could still use some help on how to modify inputs.py to get more discrete values for household_income and num_people.
Glad to hear you're using doppelganger!
To modify the discretization of num_people, you can modify this function: https://github.com/sidewalklabs/doppelganger/blob/master/doppelganger/inputs.py#L84
You should be able to modify the discretization of individual_income or household_income just by changing the bins in your config: https://github.com/sidewalklabs/doppelganger/blob/master/examples/sample_data/config.json
Let me know if this helps!
@katbusch Thanks for the reply!
We've been modifying our income range by changing the bins in our config; we set the income bins to increment by 10000. The issue we've been having is that a large share of our generated household output has income <= 0 (~37k of 78k households). Have you seen examples of this problem before?
It is likely that your input data does not have samples of household incomes within the ranges of all the bins you requested, i.e. with 10k increments a good number of the bins may simply have received no input data, eventually causing this kind of output.
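One way to check for empty bins is to count how many training samples fall into each one. The sketch below is not part of Doppelganger: `count_per_bin` and `empty_bins` are hypothetical helpers, and the bin semantics (each edge is the lower bound of a bin, the last bin is open-ended) are an assumption that may not match how the library interprets the bins in the config.

```python
from bisect import bisect_right


def count_per_bin(incomes, bin_edges):
    """Count samples per bin.

    Assumed semantics (not confirmed against Doppelganger): bin i covers
    [bin_edges[i], bin_edges[i + 1]) and the last bin is open-ended.
    Missing incomes (None) are skipped; values below the first edge are
    counted in bin 0.
    """
    counts = [0] * len(bin_edges)
    for income in incomes:
        if income is None:
            continue
        idx = bisect_right(bin_edges, income) - 1
        counts[max(idx, 0)] += 1
    return counts


def empty_bins(incomes, bin_edges):
    """Return the indices of bins that received no samples."""
    counts = count_per_bin(incomes, bin_edges)
    return [i for i, n in enumerate(counts) if n == 0]


if __name__ == "__main__":
    # Bin edges in 10k increments, as described in the thread.
    edges = list(range(0, 100001, 10000))
    # Made-up sample incomes, for illustration only.
    sample = [5000, 12000, 15000, 95000, 250000, None]
    print(count_per_bin(sample, edges))
    print(empty_bins(sample, edges))
```

If many bins come back empty, coarser bins for the sparse income ranges may be worth trying.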
@alexeisw I believe they're referring to too many 0-income households.
@HLC-CUUATS, is the % of zero-income households generated different from the % in the training data? I believe one current issue with Doppelganger is that if the training data is missing income information for a household, that household will be counted as a zero-income household. So I would expect the % of 0s in the output to be the % of zeros in your training data + the % of rows with no income data in your training data.
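That expectation can be checked directly on the training data. A minimal sketch, assuming the incomes have been loaded into a plain list with `None` marking missing values (the function name and the data layout are hypothetical, not part of Doppelganger):

```python
def expected_zero_share(incomes):
    """Return (zero_share, missing_share, expected_output_zero_share).

    Assumes missing incomes are represented as None and that, as described
    in the thread, rows with missing income end up counted as zero-income.
    """
    total = len(incomes)
    if total == 0:
        raise ValueError("incomes must not be empty")
    zeros = sum(1 for x in incomes if x is not None and x <= 0)
    missing = sum(1 for x in incomes if x is None)
    return zeros / total, missing / total, (zeros + missing) / total


if __name__ == "__main__":
    # Made-up training incomes, for illustration only.
    training = [0, None, 50000, None, 20000, 0, 75000, 30000]
    zero_share, missing_share, expected = expected_zero_share(training)
    print(zero_share, missing_share, expected)
```

If the last value is close to the share of <= 0 incomes in the generated households, missing input data is the likely cause.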
@katbusch I double-checked the input data we were using and that seems to be the problem. We have a good number of missing incomes, proportional to the number of 0's we are getting from doppelganger. We will most likely use ACS median values based on tract and household size for the missing numbers. Thanks for your help!
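The imputation described above could look roughly like the following. This is only a sketch: the record layout, the `fill_missing_income` helper, the tract identifier, and the median values are all hypothetical placeholders, not real ACS figures or Doppelganger code.

```python
def fill_missing_income(households, median_by_group):
    """Fill missing household incomes from a lookup table.

    households: list of dicts with hypothetical keys
        "tract", "num_people", "household_income" (None when missing).
    median_by_group: dict mapping (tract, num_people) to a median income.
    Rows without a matching lookup entry are left as None so they can be
    reviewed rather than silently treated as zero.
    """
    filled = []
    for household in households:
        row = dict(household)  # copy so the input is not mutated
        if row["household_income"] is None:
            key = (row["tract"], row["num_people"])
            row["household_income"] = median_by_group.get(key)
        filled.append(row)
    return filled


if __name__ == "__main__":
    # Placeholder values for illustration only, not real ACS medians.
    medians = {("tract_A", 2): 52000}
    data = [
        {"tract": "tract_A", "num_people": 2, "household_income": None},
        {"tract": "tract_A", "num_people": 2, "household_income": 61000},
        {"tract": "tract_B", "num_people": 4, "household_income": None},
    ]
    for row in fill_missing_income(data, medians):
        print(row)
```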
Hi everyone,
We are currently using doppelganger for our own set of data in our region. The example is working for us when we use our own data and the generated household table is exactly what we need. The only problem is that we do need discrete numbers for some of the categories, in our case it would be household_income and num_people (some of the values are categorical but we would need specific numbers).
We have downloaded the most recent version of doppelganger and have been using it via Jupyter Notebook. The doppelganger full example mentions editing inputs.py to make adjustments to output variables. After modifying the inputs.py file and running the example, we noticed the outputs do not change at all. Are we supposed to modify the inputs.py file within the download we have, or is there another inputs.py that we should be working with?
To clarify, we have our doppelganger location at 'C:\Users\someUser\doppelgangerCU' and we've been modifying the inputs at 'C:\Users\someUser\doppelgangerCU\doppelganger\inputs.py'.
We'd appreciate any help, thanks!
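For anyone hitting the same situation, a quick way to see which copy of a package Python actually imports is to ask the import system for its location. The helper below is a generic sketch (`module_location` is a hypothetical name, not a Doppelganger function) and uses only the standard library.

```python
import importlib.util


def module_location(name):
    """Return the file path Python would load for a top-level module,
    or None if the module cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Prints None if doppelganger is not installed in this environment.
    print(module_location("doppelganger"))
```

If the printed path points into site-packages rather than the folder being edited, the edits to inputs.py are not the code being run. Reinstalling from the working folder (as the update in this thread describes) or installing it in editable mode with `pip install -e .` should make the edited copy the one that is imported. In Jupyter, the kernel also needs a restart before changes to an already-imported module take effect.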