rebeccajohnson88 / qss20_s21_proj

Repo for DOL Summer Data Challenge on equity in H-2A oversight
Creative Commons Zero v1.0 Universal

match row number discrepancy diagnostic code for reference #16

Closed rebeccajohnson88 closed 3 years ago

rebeccajohnson88 commented 3 years ago

Deleted from the final script, but keeping it here for reference:


library(dplyr)

# Row indices that appear in the match object, on each side
indices_approved = approved_matches$matches$inds.a
indices_approved_b = approved_matches$matches$inds.b
sprintf("In the match object, there are %s unique indices for A; %s for B",
        length(unique(indices_approved)),
        length(unique(indices_approved_b)))
both_ind = intersect(indices_approved, indices_approved_b) # see that they're the same, just in a different order
# Rows of approved_only that never appear in the match object
indices_dropped = setdiff(rownames(approved_only), indices_approved)

# Inspect the dropped rows (all_of() assumes dedupe_fields is a character vector of column names)
approved_only$row_id = rownames(approved_only)
View(approved_only %>% filter(row_id %in% indices_dropped) %>% select(all_of(dedupe_fields), EMPLOYER_NAME))
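
For anyone reading this without the data, here is a minimal base R sketch of the same diagnostic on toy inputs. Everything in it is hypothetical: the `approved_only` rows are made up, and `approved_matches` is a hand-built list that only mimics the `matches$inds.a` / `matches$inds.b` shape used above rather than the output of a real matching run.

```r
# Toy stand-in for the real data; names and values are hypothetical.
approved_only <- data.frame(
  EMPLOYER_NAME = c("Acme Farms", "Acme Farms LLC", "Baggs Ranch",
                    "Blue Sky Ag", "Blue Sky Ag Inc"),
  CITY = c("Baggs", "Baggs", "Baggs", "Yuma", "Yuma"),
  stringsAsFactors = FALSE
)

# Hand-built list mimicking the shape of the match object used above;
# it is not produced by an actual matching call.
approved_matches <- list(
  matches = list(inds.a = c(1, 2, 4, 5), inds.b = c(2, 1, 5, 4))
)

indices_a <- unique(approved_matches$matches$inds.a)
indices_b <- unique(approved_matches$matches$inds.b)

# TRUE when both sides reference the same set of rows, just in a different order
same_rows <- setequal(indices_a, indices_b)

# Row positions that never show up in the match object
indices_dropped <- setdiff(seq_len(nrow(approved_only)), indices_a)

# Number of input rows that the match object does not account for
discrepancy <- nrow(approved_only) - length(indices_a)

# The rows behind the discrepancy
dropped_rows <- approved_only[indices_dropped, , drop = FALSE]
print(dropped_rows)
```

In this toy case the only unmatched row is the third one, so `discrepancy` equals the number of rows in `dropped_rows`. If those two numbers disagree on the real data, the indices in the match object probably do not line up with the row positions of `approved_only`.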

rebeccajohnson88 commented 3 years ago

A couple of things to discuss:

False positive examples: records matched because they share the city (Baggs, WY), but only some of them are truly the same entity

image

True positive examples:

image

image
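
One possible way to triage pairs like the ones above, sketched under assumptions and not something this thread actually ran: compare the employer names within each matched pair and flag pairs whose names differ a lot even though the city agrees. The `pairs` data frame, its column names, and the 0.5 cutoff are all hypothetical; `adist()` is base R's generalized edit distance.

```r
# Hypothetical matched pairs; in practice these would be built from the match object.
pairs <- data.frame(
  name_a = c("Acme Farms", "Baggs Ranch", "Blue Sky Ag"),
  name_b = c("Acme Farms LLC", "Little Snake Cattle Co", "Blue Sky Ag Inc"),
  city   = c("Baggs", "Baggs", "Yuma"),
  stringsAsFactors = FALSE
)

# Edit distance between the two lower-cased names in each pair
edit_dist <- mapply(
  function(a, b) adist(tolower(a), tolower(b))[1, 1],
  pairs$name_a, pairs$name_b,
  USE.NAMES = FALSE
)

# Scale by the longer name so the result is comparable across pairs (0 = identical)
pairs$name_dist <- edit_dist / pmax(nchar(pairs$name_a), nchar(pairs$name_b))

# Arbitrary illustrative cutoff: pairs above it are candidates for manual review
pairs$review_flag <- pairs$name_dist > 0.5

print(pairs[, c("name_a", "name_b", "city", "review_flag")])
```

A flag here only means "look at this pair by hand"; a shared city with very different names is a hint of a false positive, not proof.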