walkerke / census-with-r-book

Source for Analyzing US Census Data: Methods, Maps, and Models in R by Kyle Walker, published with CRC Press
https://walker-data.com/census-r

Speedier database import #5

Open gergness opened 3 years ago

gergness commented 3 years ago

This looks great, so happy to see it all put together!

At lunch with @dtburk, we talked about how slow it sounded to import the 1910 full count data in section 11.2.2.

I was curious and so figured I'd report back, but please don't feel like you need to add another caveat about workflow. Also, I remain mystified by database administration, so there may be drawbacks I'm unaware of. Plus, DBeaver is a nice tool to show to help people get comfortable with databases.

Those caveats aside, in my experimentation I found chunked reading to be a lot faster than DBeaver's import. On my machine the DBeaver route took closer to 2.5 hours, while this takes ~15 minutes, loading only 10k rows at a time (it also works with both csv and fixed-width extracts).

library(DBI)
library(ipumsr)

# `conn` is an open DBI connection to the target database.
# Each chunk is appended to ipums.census1910 as it is read; the
# callback is run for its side effect only, so nothing accumulates in memory.
read_ipums_micro_chunked(
  "usa_00004.xml",
  IpumsSideEffectCallback$new(function(x, pos) {
    dbWriteTable(
      conn, Id(schema = "ipums", table = "census1910"), x, append = TRUE
    )
  }),
  chunk_size = 10000
)
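The snippet above assumes `conn` already points at an open DBI connection. A minimal, self-contained sketch of the same append pattern, using an in-memory SQLite database (RSQLite assumed installed) and a built-in data frame split into chunks in place of the IPUMS extract:

```r
library(DBI)
library(RSQLite)

conn <- dbConnect(SQLite(), ":memory:")

# Simulate chunked reading: split mtcars into 4 chunks and append each one,
# mirroring what the IpumsSideEffectCallback does per chunk of the extract.
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))
for (x in chunks) {
  dbWriteTable(conn, "census1910", x, append = TRUE)
}

# All rows arrive in the table, regardless of how many chunks were written.
n <- dbGetQuery(conn, "SELECT COUNT(*) AS n FROM census1910")$n
dbDisconnect(conn)
```

With `append = TRUE`, `dbWriteTable()` creates the table on the first chunk and appends to it on every subsequent one, which is why the callback needs no special handling for the first batch.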
walkerke commented 2 years ago

Thanks @gergness! As your work is a significant part of the book, I really appreciate your feedback.

This is an excellent suggestion and I think it's worth a mention in the book. I'm not a DBA person myself either, which is why I like including the DBeaver section: it's my go-to for browsing databases, and non-DBA users are the general target there. However, a 2.5-hour load time is going to be unreasonable for some readers, so I think your example makes a lot of sense, probably in a callout box in that section. I'll include it in the next batch of edits.