tidyverse / readr

Read flat files (csv, tsv, fwf) into R
https://readr.tidyverse.org

Fast writers #10

Closed: hadley closed this issue 9 years ago

hadley commented 10 years ago

We may also want to consider having fast writers. write.table generates the complete output before saving it to disk, so it's not suitable for saving large files. For a recent problem I had to write this:

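n <- 10000L                         # rows per chunk
m <- ceiling(nrow(logs) / n) - 1L   # index of the last chunk (avoids an empty extra chunk)
for (i in seq(0L, m)) {
  start <- i * n + 1L
  end <- min((i + 1L) * n, nrow(logs))

  # Append each chunk to the same file: no header, NA written as ""
  write.table(logs[start:end, ], "logs.csv", row.names = FALSE,
    sep = ",", append = i != 0L, col.names = FALSE, na = "")
  cat(".")  # progress indicator
}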

That's obviously not ideal.
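A less awkward workaround in base R is to open the output file once as a connection and pass that connection to write.table for each chunk, so only one chunk is formatted in memory at a time and the file is not reopened on every iteration. This is a minimal sketch, assuming a plain comma-separated file is wanted; `write_csv_chunked` is a hypothetical helper name, not a readr or base R function.

```r
# Hypothetical helper: write a data frame in fixed-size chunks through a
# single open connection, writing the header only with the first chunk.
write_csv_chunked <- function(df, path, chunk_size = 10000L) {
  con <- file(path, open = "w")
  on.exit(close(con))  # always release the file handle

  n <- nrow(df)
  if (n == 0L) {
    # Nothing to chunk: write just the header line.
    write.table(df, con, sep = ",", row.names = FALSE, na = "")
    return(invisible(path))
  }

  for (start in seq(1L, n, by = chunk_size)) {
    end <- min(start + chunk_size - 1L, n)
    write.table(df[start:end, , drop = FALSE], con,
      sep = ",", row.names = FALSE, na = "",
      col.names = start == 1L)  # header only for the first chunk
  }
  invisible(path)
}
```

Usage would be `write_csv_chunked(logs, "logs.csv")`. Writing to an already-open connection means successive write.table calls simply continue where the previous one stopped, so no `append` bookkeeping is needed.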

romainfrancois commented 10 years ago

I did not realize this is what write.table did. I guess we can use something like dplyr visitors for fast writers.

If we go this way, then maybe we should change the name as this would not just be about fast reading of data anymore.

hadley commented 10 years ago

In a sense, the general theme of fastread is converting between column-based and row-based data formats. I'm not sure how to turn that into a good name.
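The writing direction of that conversion can be illustrated in base R: format each column once with vectorised operations (the column-based side), then join the formatted columns element-wise into one string per row (the row-based side). This is a sketch of the idea only, not how readr or fastread implement it; `format_rows` and `quote_field` are hypothetical names, and the quoting rule (quote character and factor columns, double embedded quotes) is an assumption modelled on common CSV practice.

```r
# Hypothetical helper: wrap values in double quotes, doubling any embedded quotes.
quote_field <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')

# Hypothetical helper: turn a column-based data frame into row strings.
format_rows <- function(df, sep = ",", na = "") {
  cols <- lapply(df, function(col) {
    out <- as.character(col)                  # one vectorised pass per column
    if (is.character(col) || is.factor(col)) out <- quote_field(out)
    out[is.na(col)] <- na                     # missing values become `na`
    out
  })
  # paste() recycles element-wise across columns, yielding one string per row;
  # unname() keeps column names from clashing with paste()'s own arguments.
  do.call(paste, c(unname(cols), sep = sep))
}
```

The resulting character vector can then be streamed with `writeLines()`, either all at once or in chunks, so formatting cost is paid per column rather than per cell.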

romainfrancois commented 10 years ago

Hmm. io something. Data Input Output for R, so dior :)

romainfrancois commented 10 years ago

Joking aside, fastread or whatever it ends up being called could be a subscriber to the data frame library you mentioned in the context of ggvis and dplyr.

matthieugomez commented 9 years ago

I'd really like a fast, memory-efficient write.table.

gshotwell commented 9 years ago

Re: names how about:

clerk, copyr, muser, scribr

Personal favorite is clerk. You do the analysis, then send the reading and writing to your clerk.

hadley commented 9 years ago

Hmmm, I do rather like clerk or scribe.

romainfrancois commented 9 years ago

Me too, especially as readr implies this is just about reading. By the way, this discussion should be in #58, right?