rjweiss / CaliforniaGreatRegister


Inspect years 1912-1918 in Alameda county #1

Open rjweiss opened 8 years ago

rjweiss commented 8 years ago

Some years have an extra "gender" field, which should be counted separately.

bspahn commented 8 years ago

The name field is also very rarely populated in these years.

bspahn commented 8 years ago

Also, 1912 is the only year with its own gender field.
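
One way to double-check which years carry a gender value is to tabulate non-missing entries by year. This is only a minimal base R sketch on toy data: the data frame, the column names `yr` and `gender`, and the values are assumptions for illustration, not taken from the actual file.

```r
# Toy records; column names and values are illustrative assumptions.
records <- data.frame(
  yr     = c(1912, 1912, 1914, 1916, 1918),
  gender = c("F", "M", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of rows per year with a non-missing gender value.
gender_counts <- tapply(!is.na(records$gender), records$yr, sum)

# Years in which the gender field is populated at least once.
years_with_gender <- names(gender_counts)[gender_counts > 0]

print(gender_counts)
print(years_with_gender)
```

On the real data the same tabulation should list only 1912 if the observation above holds.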

rjweiss commented 8 years ago

No longer see names with row numbers at the beginning of the name field (issue discussed in person).

> library(readr)
> data = read_csv('~/Box Sync/CaliforniaGreatRegisters/working_data/alameda_successes.txt')
> alameda_names = data$name
> alameda_names[grepl('^\\d', alameda_names)]
[1] "175 Rose 1r s Iren 1"
[2] "211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 211 212 21 3 21 245 2 6 247 248 249 250 2"

rjweiss commented 8 years ago

Confirmed the distribution of empty name fields. The plot below shows the number of NA names over the total number of rows for each roll number, by year. The empty names appear to be mostly the result of the row number carrying over into an address field.

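library(plyr)     # ddply, summarise, join
library(ggplot2)  # ggplot, aes, geom_point

# Keep only the columns needed for the check.
alameda_names = dplyr::select(data, name, rollnum, pagenum)
# Per roll: number of NA names, number of rows, and the NA rate.
namefails = ddply(alameda_names, .(rollnum), summarise,
      na = sum(is.na(name)),
      n = length(name),
      rate = na/n)
# alamedayears is defined elsewhere; presumably it maps rollnum to year.
namefails = join(alamedayears, namefails, by='rollnum')
ggplot(namefails, aes(x=year, y=rate)) + geom_point()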

[Plot: alamedanamefails]

rjweiss commented 8 years ago

Fixed the row number transcription error. The empty name field rate looks much better.

[Plot: ancestrynamefails]
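
For anyone following along, here is a rough sketch of what stripping a leading row number from the name field could look like. This is not the fix that was actually applied (that code is not shown in this thread); `strip_leading_rownum` is a hypothetical helper, and the second and third input values are made up for illustration. The first input is one of the strings found by the `grepl('^\\d', ...)` check earlier.

```r
# Hypothetical helper: drop a leading run of digits and the whitespace after it.
strip_leading_rownum <- function(x) sub("^\\d+\\s+", "", x)

names_raw <- c("175 Rose 1r s Iren 1", "Abt Charles", NA)
names_clean <- strip_leading_rownum(names_raw)
print(names_clean)

# Re-run the earlier check: does any cleaned name still start with a digit?
still_numbered <- any(grepl("^\\d", names_clean))
print(still_numbered)
```

Note that this only cleans the name string itself. It does nothing about text that landed in the wrong column, so neighbouring fields still need to be checked after a change like this.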

bspahn commented 8 years ago

I think your fix pushed the problem into the address field:

dat %>% filter(yr==1916) %>% select(recordnum, pagenum, occupation, name, address) %>% head
  recordnum pagenum occupation name address
1 72921 9 housewife Ande rrm Mrs J osephine 3234 Enciinul ave
2 72922 9 bridge tender f Arada Seymour foot of Peach st
3 72923 9 housewife i 4 lierach Mrs Emma 1226 High st
4 72924 9 holusewife IR 35ehiergen rs Rehla n 3227 Madisola sat
5 72925 9 niurise l'i'Scliriiidt Jiacob foot of Poech tailor p 137 SMliolz lMrs Itertlih 1 3248 Encinnl ave
6 72926 12 retired Abt Charles 3272 Briggs ave
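
A crude way to find more rows like record 72925 might be to flag addresses with an unusually large number of tokens, since a merged record tends to be much longer than a normal street address. The sketch below is a heuristic only. The three address strings are copied from the output above, but where the name column ends and the address column begins is my assumption, and the cutoff `max_tokens` is arbitrary.

```r
# Address strings taken from the output above; the column split is assumed.
addresses <- c(
  "3272 Briggs ave",
  "1226 High st",
  "foot of Poech tailor p 137 SMliolz lMrs Itertlih 1 3248 Encinnl ave"
)

# Count whitespace-separated tokens in each address.
n_tokens <- lengths(strsplit(trimws(addresses), "\\s+"))

# Assumed cutoff: anything longer than this is treated as suspicious.
max_tokens <- 6
suspect <- n_tokens > max_tokens

print(data.frame(address = addresses, n_tokens = n_tokens, suspect = suspect))
```

Flagged rows would still need a manual look against the page image, since a long address is not necessarily a merged one.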