jessicalum / Summer-Research

Summer research, brainstorming projects, etc.

Possible Idea #6

Open dmitrypolo opened 8 years ago

dmitrypolo commented 8 years ago

I remember someone mentioned wanting to include a happiness variable in the CPS analysis, but no such variable is included in the data. I thought of a way to measure happiness relative to the number of hours worked, assuming that happiness is directly affected by hours worked. This DATASET provides the mean number of hours worked by zip code for all persons aged 16-64.

What if we assign the happiness variable as a rank? For example, say the mean is 34 hours and a particular person works 30 hours. Then we could compute (34 - 30) / 100 = 4%. This would mean that this person falls in the top 4% of happiness, as it relates to working, in their zip code (it doesn't have to be by zip code; we can broaden the geography). Also, if a person works more than the mean number of hours, we can assume they are happy since they are working a "reliable schedule".

Alternatively, we could define the variable another way: compute the variance of hours for each person and give those with higher variances a lower happiness value. Lastly, we could also compute the standard deviation and place people into 4 quartiles, so that we can account for those at the far ends of the spectrum. Let me know if this sounds like a good idea. Just a thought.
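To make the arithmetic concrete, here is a rough Python sketch of the gap score and the quartile idea. The function names and the sample hours are made up for illustration; nothing here comes from the CPS or the zip code dataset, and the quartile cut points use the default method of Python's `statistics.quantiles`, which is just one possible choice.

```python
from statistics import quantiles


def hours_gap_score(hours_worked, area_mean_hours):
    """Gap below the area mean, scaled as in the example: (mean - hours) / 100."""
    return (area_mean_hours - hours_worked) / 100


def hours_quartile(hours_worked, area_sample):
    """Return 1-4 for the quartile of area_sample that hours_worked falls in."""
    q1, q2, q3 = quantiles(area_sample, n=4)  # three cut points
    if hours_worked <= q1:
        return 1
    if hours_worked <= q2:
        return 2
    if hours_worked <= q3:
        return 3
    return 4


# Made-up weekly hours for one zip code (illustration only, not real data).
sample_hours = [20, 25, 30, 34, 38, 40, 45, 50]

score = hours_gap_score(30, 34)              # the 34-vs-30 example
quartile = hours_quartile(30, sample_hours)  # quartile within the made-up sample
print(f"gap score: {score:.0%}, quartile: {quartile}")
```

Note the parentheses around the subtraction: without them, `34 - 30 / 100` evaluates to 33.7 rather than 0.04.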

jessicalum commented 8 years ago

What I read in the Economic Policy Institute report was that some people who work part time want full-time hours, while others who work full time would prefer part time. So a measure that assumes more hours than the mean = happiness may not hold for people who would prefer to work fewer hours. If there is a question that directly asks about work hour preference, then the standard deviation measurement could work. In that case, we would have to use each person's mean over time, as opposed to the mean of work hours per zip code. A variable for the change over time in the variance of weekly work hours, as a measure of irregular scheduling, sounds good: a tighter distribution may mean more work satisfaction relative to past work schedules.
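A minimal sketch of the change-in-variance idea, assuming we can observe one person's weekly hours in an earlier and a later period. The function name and the numbers are hypothetical, and population variance is used only to keep the example simple.

```python
from statistics import pvariance


def variance_change(earlier_hours, later_hours):
    """Later-period variance minus earlier-period variance of weekly hours.

    A negative value means the person's schedule became more regular.
    """
    return pvariance(later_hours) - pvariance(earlier_hours)


# Made-up weekly hours for one person in two periods (illustration only).
earlier = [20, 40, 15, 45]  # irregular schedule
later = [30, 32, 29, 31]    # tighter schedule

change = variance_change(earlier, later)
print(f"change in variance: {change:.2f}")
```

Both periods here average roughly 30 hours, so only the spread differs, which is the point of using variance instead of the mean.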

jessicalum commented 8 years ago

The variable WSHRSPREF, available for May 2001 only, is a categorical variable that asks about the desired number of work hours and income at the individual's primary job. Data are available only for persons age 15+ who are currently employed, excluding self-employed persons. The values are:
1 = fewer hrs / same rate / earn less
2 = same hrs / earn same money
3 = more hrs / same rate / earn more
96 = refusal
98 = no response
99 = NIU (not in universe)

Those looking to use this variable should note that sample selection from missing observations may be a concern. For example, individuals who refuse to answer may share characteristics related to why they refused; perhaps those who earn less tend to refuse this question. If that's the case, the estimates may be biased because the observed sample isn't truly random.
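One way to get a first look at the selection concern is to compare nonresponse rates across groups. The sketch below uses the WSHRSPREF codes listed above; the earnings groups, the record layout, and the function name are assumptions made up for illustration, and it ignores survey weights.

```python
from collections import defaultdict

NONRESPONSE_CODES = {96, 98}  # refusal, no response
NIU_CODE = 99                 # not in universe: the question was never asked


def nonresponse_share_by_group(records):
    """Share of refusal/no-response codes among in-universe respondents, per group.

    records is an iterable of (group, wshrspref_code) pairs.
    """
    asked = defaultdict(int)
    skipped = defaultdict(int)
    for group, code in records:
        if code == NIU_CODE:
            continue  # not asked, so not counted as nonresponse
        asked[group] += 1
        if code in NONRESPONSE_CODES:
            skipped[group] += 1
    return {group: skipped[group] / asked[group] for group in asked}


# Made-up records: (hypothetical earnings group, WSHRSPREF code).
records = [
    ("low", 1), ("low", 96), ("low", 98), ("low", 3),
    ("high", 2), ("high", 2), ("high", 99), ("high", 96),
]

shares = nonresponse_share_by_group(records)
print(shares)
```

If the shares differ a lot between groups, that would be a hint that the missing answers are not random, though it would not prove it.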

jessicalum commented 8 years ago

Hey all,

So I spoke to Professor Sevak about the topic, and it's not a problem if we don't have a solid outcome variable to study yet. For the time being, we should just study the regional/county differences in work schedule irregularity by criteria such as race, gender, household income level, etc., and see what patterns emerge first within the 2015 CPS variables. We can then merge the CPS data by county with other data, for example the NLSY, to create an aggregate measure of job satisfaction by county, if the NLSY has the data to create that variable. But for now, we can focus on studying this one variable in depth before thinking about dependent variables, regressions, and such.
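As a starting point for the descriptive breakdown, something like the following could tabulate the share of workers with an irregular schedule by county and one demographic criterion. The field names (`county`, `gender`, `irregular`) and the rows are hypothetical placeholders, not actual CPS variable names, and the shares are unweighted.

```python
from collections import defaultdict


def irregular_share(rows, keys):
    """Unweighted share of rows flagged irregular, grouped by the given keys."""
    totals = defaultdict(int)
    irregular = defaultdict(int)
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group] += 1
        if row["irregular"]:
            irregular[group] += 1
    return {group: irregular[group] / totals[group] for group in totals}


# Made-up rows (illustration only; field names are placeholders).
rows = [
    {"county": "A", "gender": "female", "irregular": True},
    {"county": "A", "gender": "female", "irregular": False},
    {"county": "A", "gender": "male", "irregular": False},
    {"county": "B", "gender": "male", "irregular": True},
]

by_county_gender = irregular_share(rows, keys=("county", "gender"))
print(by_county_gender)
```

Swapping `keys` for `("county", "race")` or another criterion gives the other breakdowns without changing the function.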

She also suggested that we work with more recent data, i.e. data for 2015 rather than the earlier years I mentioned above, because it lets us narrow things down to just 12 months of available data. Working with many years could mean unbalanced data and dealing with events that could have major implications (like the Great Recession).