nooreendabbish / Traffic

JSM 2016 GSS Data Challenge

Recoding of Data #17

Open chenchen715 opened 8 years ago

chenchen715 commented 8 years ago

Hi both:

Can we use this thread to document how we recode the variables that are not straightforward?

Here is what Patrick did for HEAVY_TRUCK in the 2013 data: HEAVY_TRUCK = 1 if BODY_TYP is in (64, 66, 72); NA if BODY_TYP is in (78, 79); 0 otherwise.

Note: we can use BDYTYP_IM instead of BODY_TYP here. BDYTYP_IM is the imputed version of BODY_TYP, so 78 and 79 should not appear in BDYTYP_IM.

PatrickCoyle commented 8 years ago

Sounds good. I will do my best to summarize my current attempt. But I want to mention this for the time being:

I adjusted the recode script to include NA's (and a factor level for NA's). This was to apply Lucas's script, but I think they might need to be numeric for that anyway. Additionally, the NA level needs to be dropped if we include the variables in models. Otherwise, the model fits a parameter for the NA level. This might be a feature as opposed to a bug; as I understand it, the NHTSA and other organizations use GLM for imputation. But we are trying to find a unified and simple approach to handling missing data. Sorry I left this in a half-finished form.

Patrick
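A minimal sketch of the NA-level issue described above, using made-up body type codes rather than the real GES2013 data. The variable names (`body_typ`, `heavy_with_na`, `heavy_for_model`) are hypothetical and are not taken from the recode script.

```r
# Toy stand-in for BODY_TYP; the values are illustrative only.
body_typ <- c(64, 20, 78, 66, 79, 4)

# Same rule as the 2013 recode: 1 = heavy truck, NA = unknown truck type, 0 = other.
heavy <- ifelse(body_typ %in% c(64, 66, 72), 1,
                ifelse(body_typ %in% c(78, 79), NA, 0))

# Keep NA as an explicit factor level, e.g. for tabulating missingness.
heavy_with_na <- addNA(factor(heavy))
nlevels(heavy_with_na)

# Rebuild the factor without the NA level before modelling, so that
# no parameter is fit for "missing". The NA observations stay NA.
heavy_for_model <- factor(as.character(heavy_with_na))
levels(heavy_for_model)
```

With the NA level removed, the missing observations are ordinary NAs again, and with R's default `na.action` of `na.omit` those rows are dropped when a model is fit.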

chenchen715 commented 8 years ago

Patrick wrote on June 30th: Note that I recoded SEX_IM to be zeroes (male) and ones (female), since ones and twos were giving me errors at some point in my package. Haven't pinpointed it. HEAVY_TRUCK is an integer, not a factor.
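One way the SEX_IM recode could be written, as a sketch only. It assumes the original coding is 1 = male and 2 = female, as implied above; the toy vector and the name `sex_female` are hypothetical.

```r
# Toy stand-in for SEX_IM, assuming 1 = male and 2 = female.
sex_im <- c(1, 2, 2, 1, 1)

# Map explicitly rather than subtracting 1, so that any unexpected
# code becomes NA instead of silently turning into another number.
sex_female <- ifelse(sex_im == 2, 1L,
                     ifelse(sex_im == 1, 0L, NA_integer_))
sex_female
```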

chenchen715 commented 8 years ago

Patrick, could you explain the rationale for the HEAVY_TRUCK coding in 2013? And would it make sense to you to use BDYTYP_IM instead of BODY_TYP?

GES2013$HEAVY_TRUCK <- ifelse(GES2013$BODY_TYP %in% c(64, 66, 72), 1, ifelse(GES2013$BODY_TYP %in% c(78, 79), NA, 0))

PatrickCoyle commented 8 years ago

We are aiming to capture commercial truck drivers, so I googled the trucks and those were the only three that seemed like what we're looking for. Obviously not scientific..... might be worth a second look.

Patrick

chenchen715 commented 8 years ago

I suggest we add the following categories to HEAVY_TRUCK = 1, and we should also use BDYTYP_IM as the coding variable:

  • 63: Single-Unit Straight Truck or Cab-Chassis (GVWR>26,000 lbs) (Since 2011)
  • 78: Unknown Medium/Heavy Truck Type

The reason is that Wikipedia defines a commercial truck (one requiring a Class-B commercial driver's license (CDL)) as having GVWR >= 26,001 lb. Thus we should include 63, which adds 555 more PARs. I am also reluctant to drop 78, which could be a heavy truck; it adds 101 PARs in the 2013 data. What do you think?

Here is the link to the wiki page: https://en.wikipedia.org/wiki/Truck_classification

PatrickCoyle commented 8 years ago

Sounds great! Thank you for looking into it!

chenchen715 commented 8 years ago

So the coding for HEAVY_TRUCK will now be:

HEAVY_TRUCK <- ifelse(GES2013$BDYTYP_IM %in% c(63, 64, 66, 72, 78), 1, 0)

And for DRYOUNG it will be:

DRYOUNG <- factor(ifelse(GES2013$AGE_IM %in% 17:24, 1, 0))

Could you update the code on your side?
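For reference, a small sketch of the two recodes above applied to a toy data frame. The data frame `ges_toy` and its values are made up for illustration and are not GES2013 records; only the column names and code lists come from this thread.

```r
# Hypothetical stand-in for GES2013 with the two imputed columns used above.
ges_toy <- data.frame(
  BDYTYP_IM = c(63, 64, 4, 78, 20, 72),
  AGE_IM    = c(45, 17, 24, 25, 16, 30)
)

# Body type codes treated as heavy trucks in the proposal above.
heavy_codes <- c(63, 64, 66, 72, 78)

ges_toy$HEAVY_TRUCK <- ifelse(ges_toy$BDYTYP_IM %in% heavy_codes, 1, 0)
ges_toy$DRYOUNG     <- factor(ifelse(ges_toy$AGE_IM %in% 17:24, 1, 0))

# Count how many records each heavy-truck code contributes.
table(ges_toy$BDYTYP_IM[ges_toy$HEAVY_TRUCK == 1])
```

Running the same `table()` call on the real data would be one way to confirm the per-code counts quoted earlier in the thread.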

PatrickCoyle commented 8 years ago

Actually, the report from earlier tells us that about 20% of the commercial truck workforce in 2013 was between 20 and 34, and that this was well below the national average. I suggest we use one of the cutoffs provided by this report, since we have the research to back it up.

But the report only covers drivers over 20. Since the youngest cutoff is a slim proportion of the truck drivers, I don't expect the sample size to be big enough, so I think we should try the first two and see what we come up with.

GES2013.drivers$AGE_LOWER_QUINT <- ifelse(GES2013.drivers$AGE %in% 20:34, 1, ifelse(GES2013.drivers$AGE > 34, 0, NA))

Patrick
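A quick sketch of how the age cutoff above behaves at the boundaries, using a made-up age vector (the names `age` and `age_lower_quint` are hypothetical). Drivers under 20 and drivers with a missing age both end up as NA, since the report only covers drivers over 20.

```r
# Toy ages covering both boundaries, an under-20 driver, and a missing value.
age <- c(19, 20, 34, 35, 60, NA)

# 1 = aged 20 to 34, 0 = older than 34, NA = under 20 or age missing.
age_lower_quint <- ifelse(age %in% 20:34, 1,
                          ifelse(age > 34, 0, NA))
age_lower_quint
```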
