tidyverse / vroom

Fast reading of delimited files
https://vroom.r-lib.org

no tempfile cleanup when reading compressed files #466

Open tbarchyn opened 2 years ago

tbarchyn commented 2 years ago

When reading compressed files, tempfiles are created but are not cleaned up during garbage collection on Windows.

The end result is filling up the tempdir, and in my case filling up the hard drive and crashing the computer.

This was reported earlier this year in https://github.com/tidyverse/vroom/issues/242#issuecomment-1175345004 and I am opening it again in case anyone has ideas for a solution.

Windows 10 Version 10.0.19043 Build 19043
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
vroom_1.5.7

library(vroom)

x <- vroom(file(vroom_example("mtcars.csv.gz")))
#> Rows: 32 Columns: 12
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): model
#> dbl (11): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
list.files(tempdir())
#> [1] "vroom-5644508a1b"

rm(x)
gc()
#>           used  (Mb) gc trigger  (Mb) max used  (Mb)
#> Ncells 2433244 130.0    3670861 196.1  3670861 196.1
#> Vcells 6202471  47.4   12255594  93.6  8692029  66.4

list.files(tempdir())
#> [1] "vroom-5644508a1b"

For future reference, for anyone who finds this issue through a search, here is the function I call periodically to deal with this. It is a blunt-hammer approach, but it prevents your hard drive from filling up with vroom tempfiles:

clear_vroom_tempfiles <- function() {
    # helper function to clear the vroom temp files that don't get deleted
    # (matches names like "vroom-5644508a1b" in the session's tempdir)
    files <- list.files(tempdir(), full.names = TRUE, pattern = "^vroom-")
    file.remove(files)
}
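A slightly more cautious variant of the helper above could skip recently written files, on the assumption that a vroom object still in use may depend on its tempfile. This is only a sketch: the function name `clear_old_vroom_tempfiles` and the `min_age_secs` threshold are made up for illustration and are not part of vroom.

```r
# Hypothetical variant: remove only vroom temp files older than `min_age_secs`.
# Uses base R only; assumes the temp files are named "vroom-<suffix>".
clear_old_vroom_tempfiles <- function(dir = tempdir(), min_age_secs = 600) {
  files <- list.files(dir, pattern = "^vroom-", full.names = TRUE)
  if (length(files) == 0) {
    return(invisible(character(0)))
  }
  # Age of each file in seconds, based on its modification time
  age <- as.numeric(difftime(Sys.time(), file.mtime(files), units = "secs"))
  old <- files[!is.na(age) & age >= min_age_secs]
  # file.remove() returns FALSE (with a warning) for files it cannot delete,
  # e.g. files that are still locked on Windows
  removed <- old[suppressWarnings(file.remove(old))]
  invisible(removed)
}
```

Calling `clear_old_vroom_tempfiles()` between batches in a long-running job would return (invisibly) the paths that were actually deleted, so files that could not be removed can be spotted by comparing against `list.files(tempdir())`.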

Originally posted by @tbarchyn in https://github.com/tidyverse/vroom/issues/242#issuecomment-1175345004

kbzsl commented 1 year ago

Similar issue here (win 11, R 4.2.1, vroom 1.6.3, readr 2.1.4). These temporary files are deleted only when the R session is closed; if the session crashes, they are left behind. Garbage collection (gc()) does not remove any vroom temp files, and there are no objects that I could remove with rm(), because the output is piped directly to the arrow write function, like readr::read_fwf('*.txt.gz') |> arrow::write_dataset().
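For a pipeline like this, where nothing is kept in the session afterwards, one possible workaround is to wrap the whole expression so that any vroom temp files created while it runs are removed when it finishes. The sketch below uses base R only; `with_vroom_cleanup` is a hypothetical helper, not a vroom or readr function, and on Windows the removal may still fail if the file is locked.

```r
# Hypothetical wrapper: evaluate `code`, then delete the vroom temp files
# that appeared in `dir` while it ran. Only suitable when the result no
# longer depends on the temp files (e.g. it was written out to disk).
with_vroom_cleanup <- function(code, dir = tempdir()) {
  before <- list.files(dir, pattern = "^vroom-", full.names = TRUE)
  on.exit({
    gc()  # give finalizers a chance to release file handles first
    after <- list.files(dir, pattern = "^vroom-", full.names = TRUE)
    suppressWarnings(file.remove(setdiff(after, before)))
  }, add = TRUE)
  force(code)  # the lazily evaluated expression runs here
}

# Usage sketch (file names and arguments are placeholders):
# with_vroom_cleanup(
#   readr::read_fwf("input.txt.gz") |> arrow::write_dataset("output_dir")
# )
```

Because `on.exit()` also runs when the expression throws an error, the cleanup is attempted even if the read or write step fails partway through. It does not help if the R session itself crashes.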

hadley commented 1 year ago

This seems fine for me:

library(vroom)

x <- vroom(file(vroom_example("mtcars.csv.gz")), show_col_types = FALSE)
list.files(tempdir())
#> [1] "vroom-d609462988c"

rm(x)
. <- gc()
list.files(tempdir())
#> character(0)

Created on 2023-08-01 with reprex v2.0.2

@kbzsl are you on windows too? This might be a windows file locking issue.

kbzsl commented 1 year ago

yes, I am using win 10.
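
To check whether file locking is what keeps the tempfiles around, one option is to try deleting each vroom temp file individually and record which attempts fail. This is a minimal base-R sketch; `try_remove_vroom_tempfiles` is a hypothetical name, and a failed removal only suggests (not proves) that the file is still held open.

```r
# Hypothetical diagnostic: attempt to remove each vroom temp file and
# report, per file, whether the removal succeeded.
try_remove_vroom_tempfiles <- function(dir = tempdir()) {
  files <- list.files(dir, pattern = "^vroom-", full.names = TRUE)
  removed <- vapply(files, function(f) {
    # file.remove() signals a warning and returns FALSE when it cannot
    # delete a file; treat warnings and errors alike as a failed removal
    tryCatch(file.remove(f),
             warning = function(w) FALSE,
             error = function(e) FALSE)
  }, logical(1))
  data.frame(file = basename(files), removed = unname(removed))
}
```

Running this right after `rm(x); gc()` and again in a fresh R session would show whether the same files become removable once the original session has released them.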