Closed: nbenn closed this issue 3 years ago.
This is likely a duplicate of https://github.com/tidyverse/readr/issues/1161; it is fixed in the development version of readr but not yet on CRAN.
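In the meantime, a minimal sketch of installing the development version from GitHub (assuming the remotes package is available):

# Assumes the 'remotes' package is installed; this pulls the development
# version of readr from GitHub until the fix reaches CRAN.
# install.packages("remotes")
remotes::install_github("tidyverse/readr")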
@jimhester On the development version, the vroom backend (2nd edition) has problems reading gzip-compressed files: the data is read incorrectly, and the object size and memory footprint also increase. This doesn't happen with the 1st edition (the data is read correctly, with no problems), and the issue can easily be caught by setting the correct column types. Here's an example:
library(readr)

# Read the gzip-compressed CSV directly from the GitHub URL, with explicit
# column types and eager (non-lazy) reading.
COVID19.BR_Municipality <- read_delim(
  "https://github.com/wcota/covid19br/raw/master/cases-brazil-cities-time.csv.gz",
  delim = ",",
  col_types = cols(
    epi_week = col_integer(),
    date = col_date(format = "%Y-%m-%d"),
    country = col_character(),
    state = col_character(),
    city = col_character(),
    ibgeID = col_character(),
    cod_RegiaoDeSaude = col_character(),
    name_RegiaoDeSaude = col_character(),
    newDeaths = col_integer(),
    deaths = col_integer(),
    newCases = col_integer(),
    totalCases = col_integer(),
    deaths_per_100k_inhabitants = col_double(),
    totalCases_per_100k_inhabitants = col_double(),
    deaths_by_totalCases = col_double(),
    `_source` = col_character(),
    last_info_date = col_date(format = "%Y-%m-%d")
  ),
  lazy = FALSE
)
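For comparison, here's a minimal sketch of reading the same file with the first-edition parser (this assumes readr >= 2.0.0, which provides with_edition()), to check for parsing problems and compare the object size:

library(readr)

# Re-read the same file with the first-edition (pre-vroom) parser for
# comparison; column types are guessed here to keep the sketch short.
COVID19.BR_Municipality_e1 <- with_edition(
  1,
  read_delim(
    "https://github.com/wcota/covid19br/raw/master/cases-brazil-cities-time.csv.gz",
    delim = ",",
    col_types = cols(.default = col_guess())
  )
)

problems(COVID19.BR_Municipality_e1)     # any parsing problems?
object.size(COVID19.BR_Municipality_e1)  # compare the memory footprint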
@hsbadr, that issue is tracked by https://github.com/r-lib/vroom/issues/331 and should be fixed.
Thanks @jimhester! I confirm that https://github.com/r-lib/vroom/commit/5fc54e61538e32007b9795bc56e6747e1a77d893 fixed the problem. I'll let you know if I run into a related issue.
Recently I have been running into "Error: vector memory exhausted (limit reached?)" errors when reading large gzip-compressed .csv files using the chunked API. IIRC, earlier versions of readr would explicitly create a temporary file containing the full uncompressed data, which was then fed into read_csv_chunked(). Looking at the reported memory usage, this no longer seems to be the case. If this change was intentional, I apologize for having missed it, but I could not find any announcement hinting at it (neither in NEWS nor in the docs). I also feel this takes away some of the convenience of the chunked API. Of course, this can easily be resolved outside of readr by decompressing the files manually beforehand, for example with R.utils::gunzip(); a rough sketch of that workaround is below.
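Something along those lines is what I mean (a minimal sketch, with a hypothetical local file path and a trivial per-chunk callback, assuming R.utils and readr are installed):

library(readr)

csv_gz  <- "cases-brazil-cities-time.csv.gz"  # hypothetical local copy
csv_tmp <- tempfile(fileext = ".csv")

# Decompress to a temporary file first; the chunked reader then streams the
# uncompressed data from disk instead of holding it all in memory.
R.utils::gunzip(csv_gz, destname = csv_tmp, remove = FALSE)

# Process the file chunk by chunk; each chunk is reduced to a row count here.
res <- read_csv_chunked(
  csv_tmp,
  callback = DataFrameCallback$new(function(chunk, pos) {
    data.frame(pos = pos, rows = nrow(chunk))
  }),
  chunk_size = 1e5,
  col_types = cols(.default = col_guess())
)

unlink(csv_tmp)  # clean up the temporary file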
As it's not straightforward to create a reproducible example for the memory-exhaustion error itself, I'll just add my session info (but I'm happy to provide further information if requested):