Closed nikisix closed 6 years ago
Thanks for adding these!
removed segmentation due to growing model size Can you explain exactly what you mean by this?
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.
doppelganger/inputs.py, line 103 at r1 (raw file):
return 'under-16' if code == '1' or code == '2' or code == '4' or code == '5': return 'Y'
How about something a little more descriptive than 'Y' and 'N'? 'employed' and 'unemployed'? And would be great if those were defined like:
class EmploymentStatus(object):
UNDER_16 = 'under16'
UNEMPLOYED = 'unemployed'
...
doppelganger/inputs.py, line 153 at r1 (raw file):
return 'bachelors-degree' if code == '22' or code == '23' or code == '24': return 'advanced-degree'
Same here for constant definitions
Comments from Reviewable
removed segmentation due to growing model size
Can you explain exactly what you mean by this?
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.
Comments from Reviewable
Definitely -- Sending None as my segmentation functions (person_segmenter below) to the create_bayes_net. I.e.
person_training_data = SegmentedData.from_data(
cleaned_data=persons_data,
fields=list(configuration.person_fields),
weight_field=inputs.PERSON_WEIGHT.name,
segmenter=person_segmenter
)
person_model = BayesianNetworkModel.train(
input_data=person_training_data,
structure=configuration.person_structure,
fields=configuration.person_fields
)
Was wondering what/if segmenters make the most sense for our KC synth.
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.
doppelganger/inputs.py, line 103 at r1 (raw file):
How about something a little more descriptive than 'Y' and 'N'? 'employed' and 'unemployed'? And would be great if those were defined like: ``` class EmploymentStatus(object): UNDER_16 = 'under16' UNEMPLOYED = 'unemployed' ... ```
This was prescient as I'm in the process of doing that for some other variables right now as well :) . Good suggestion!
Comments from Reviewable
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.
doppelganger/inputs.py, line 103 at r1 (raw file):
This was prescient as I'm in the process of doing that for some other variables right now as well :) . Good suggestion!
Also, am going with '(not)working' as '(un)employed' has strict census definitions around it and is technically a subset of what's in working.
Comments from Reviewable
Ask @alexeisw which segmentations he'd like to see.
Still not sure what you mean by "growing model size". Is segmentation making it slow?
It's okay to send None by just leaving the segmenter argument out
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.
Comments from Reviewable
Not slow, but the bayesian model output grows to keep track of much more conditional probs when you segment, and I was just keeping it more interpretable so I can analyze the outputs easier in the meantime.
I'll be sure and ask @alexeisw what we're going with for the production run though.
Yep, ultimately, I'd like to allow users to pass a function in or at least define their own in the runner script. Still thinking it through.. ideas welcome.
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.
Comments from Reviewable
I see! That makes sense. A couple more comments
Review status: 0 of 2 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.
doppelganger/inputs.py, line 126 at r2 (raw file):
def educational_attainment(code): ''' Educational attainment (SCHL) bb .N/A (less than 3 years old)
Can you explain that these are the PUMS codes (and maybe the year if they're allowed to change)
doppelganger/scripts/download_allocate_generate.py, line 24 at r2 (raw file):
def person_segmenter(x): return None # x[inputs.AGE.name]
So instead of changing these to return None, you can just remove them completely and you'll get the same effect
Comments from Reviewable
Oh and please add tests for the new code!
Review status: 0 of 2 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.
Comments from Reviewable
Review status: 0 of 2 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.
doppelganger/inputs.py, line 153 at r1 (raw file):
Same here for constant definitions
Done.
doppelganger/inputs.py, line 126 at r2 (raw file):
Can you explain that these are the PUMS codes (and maybe the year if they're allowed to change)
Good call! Done.
doppelganger/scripts/download_allocate_generate.py, line 24 at r2 (raw file):
So instead of changing these to return None, you can just remove them completely and you'll get the same effect
Got feedback from Alexei, we're keeping them in. Cleaned up the None vals and comments
Comments from Reviewable
Ping on tests :)
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.
doppelganger/scripts/download_allocate_generate.py, line 24 at r2 (raw file):
Got feedback from Alexei, we're keeping them in. Cleaned up the None vals and comments
Why are you keeping them in? The functionality will remain the same if you remove these functions and the code will be cleaner
Comments from Reviewable
Yep, just finished writing allocator's test for the margina-vehicles
branch. Diving into this guy today.
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.
doppelganger/scripts/download_allocate_generate.py, line 24 at r2 (raw file):
Why are you keeping them in? The functionality will remain the same if you remove these functions and the code will be cleaner
I should clarify, Alexei said we will want to employ segmentation for now at least. So I'm back to the original:
def person_segmenter(x): return x[inputs.AGE.name]
Comments from Reviewable
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.
doppelganger/scripts/download_allocate_generate.py, line 24 at r2 (raw file):
I should clarify, Alexei said we will want to employ segmentation for now at least. So I'm back to the original: `def person_segmenter(x): return x[inputs.AGE.name]`
You can find this change on the marginal-vehicles
branch
Comments from Reviewable
Review status: 0 of 4 files reviewed at latest revision, all discussions resolved.
Comments from Reviewable
Side note: temporarily removed segmentation due to growing model size. In general, do we want to use segmentation for first round pop-synths? If so, what to segment on if not the default age and household size.
Do we need to update populationgen.Population.generate to use other fields, or are the person.age, person.sex and household.num_people fields sufficient?
Example model output from running on a puma 29-00901: models.zip
Here's my first guess at the expanded model person_bn.json:
Can also see an argument for an arrow from education (attainment) to working (work status)
This change is![Reviewable](https://reviewable.io/review_button.svg)