ices-tools-prod / icesTAF

Functions to support the ICES Transparent Assessment Framework
GNU General Public License v3.0

bug in write.taf with large data.frames #11

Closed colinpmillar closed 4 years ago

colinpmillar commented 5 years ago

For large data.frames, the system has not finished writing the file when R tries to open the file connection in unix2dos:

https://github.com/ices-tools-prod/icesTAF/blob/15b9e4347ac024a597ba1ea35e96c5054bf7ed4c/R/write.taf.R#L105

write.taf(catch_dat, dir = "data", quote = TRUE)

# Error in file(file, open = "wb") : cannot open the connection
# In addition: Warning message:
# In file(file, open = "wb") :
#   cannot open file 'data/catch_dat.csv': Invalid argument

arni-magnusson commented 5 years ago

I've tried

n <- 1e6
big <- data.frame(x=1:n, y=rnorm(n), z=rpois(n,1))
write.taf(big)
line.endings("big.csv")

but did not get an error. Is there another example that gives an error?

colinpmillar commented 5 years ago

Hmmm - not any more - I can't seem to replicate it, but something was afoot, because I rewrote the line as:

https://github.com/ices-taf/2019_NrS_FisheriesOverview/blob/d556b1cf8ab36d7036b9a361ebea3c7814d2fb8b/data.R#L24

but now it works just fine!

colinpmillar commented 5 years ago

Still there... very repeatable, but seems to be intermittent - I am pretty sure it is that unix2dos() is trying to open a file that is not quite written yet, not sure how to get round that though.
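
One possible way round a file that is briefly unavailable is to retry opening the connection after a short pause. The sketch below is only an illustration under that assumption: `open_with_retry()` and its `tries` and `wait` arguments are hypothetical names, not part of icesTAF, and since the cause is not confirmed here, retrying may not help in every case.

```r
# Hypothetical helper (not part of icesTAF): try to open a connection
# several times, pausing between attempts, before giving up.
open_with_retry <- function(path, open = "wb", tries = 5, wait = 0.2) {
  for (i in seq_len(tries)) {
    con <- tryCatch(
      suppressWarnings(file(path, open = open)),
      error = function(e) NULL  # opening failed; signal with NULL
    )
    if (!is.null(con)) return(con)
    Sys.sleep(wait)  # give the system time to release the file
  }
  stop("could not open '", path, "' after ", tries, " attempts")
}

# Usage: rewrite a CSV file with CRLF line endings,
# roughly what a unix2dos-style step does
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), path, row.names = FALSE)
txt <- readLines(path)
con <- open_with_retry(path)
writeLines(txt, con, sep = "\r\n")
close(con)
```

The pause length and number of attempts are arbitrary choices; a failure that persists beyond them still surfaces as an error rather than being silently ignored.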

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running .First(), R in interactive mode, check `.First` for details ...
  .libPaths() set to: 
        D:/R/win-library/3.6
        D:/Program Files/R/R-3.6.1/library
  options: stringsAsFactors=FALSE
checking for new packages ... 
all up to date.
  Autoloading: remotes::install_github, lattice::xyplot
> setwd("D:\\projects\\git\\ices-taf\\FOs\\2019_CS_FisheriesOverview")
> library(icesTAF)
> clean()
> sourceTAF("data")
[04:03:17] data.R running...

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Error in file(file, open = "wb") : cannot open the connection
In addition: Warning message:
In file(file, open = "wb") :
  cannot open file 'data/catch_dat.csv': Invalid argument
[04:03:33]   data.R failed

colinpmillar commented 5 years ago

test file: https://www.dropbox.com/s/znhfusigdfxuf5v/test2.RData?dl=1

I loaded this, then tried the following from a fresh R session in the ~ directory:

> icesTAF::write.taf(catch_dat, quote = TRUE) # OK
> library(icesTAF)
> mkdir("data")
> icesTAF::write.taf(catch_dat, dir = "data", quote = TRUE) # OK
> setwd("D:\\projects\\git\\ices-taf\\FOs\\2019_CS_FisheriesOverview")
> mkdir("data")
> icesTAF::write.taf(catch_dat, dir = "data", quote = TRUE)
Error in file(file, open = "wb") : cannot open the connection
In addition: Warning message:
In file(file, open = "wb") :
  cannot open file 'data/catch_dat.csv': Invalid argument
> rmdir("data")
> mkdir("data")
> icesTAF::write.taf(catch_dat, dir = "data", quote = TRUE)
Error in file(file, open = "wb") : cannot open the connection
In addition: Warning message:
In file(file, open = "wb") :
  cannot open file 'data/catch_dat.csv': Invalid argument
> traceback()
3: file(file, open = "wb")
2: unix2dos(file)
1: icesTAF::write.taf(catch_dat, dir = "data", quote = TRUE)

very odd!

arni-magnusson commented 4 years ago

Still haven't been able to reproduce this on Windows or Linux, but it sounds like the core R function write.csv() can - in some cases - exit before the file has been fully created.

The purpose of calling unix2dos() at the end of write.taf() was to conform to the CSV standard, but the potential problems outweigh the benefits.

Removed unix2dos() call in commit 3fbd4a8.

On a Windows machine, such as the TAF server, the resulting file will have DOS line endings (CRLF), so that's pretty good. On a Linux machine, the resulting files will have Unix line endings (LF), making them slightly smaller, and some diff tools will detect this as a difference in the output.

Users can always call unix2dos() explicitly if they find it helpful in their analysis.
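
For anyone who wants CRLF output on every platform without reopening the file afterwards, one option in base R is to write through a binary-mode connection with an explicit `eol`. This is a minimal sketch, not what write.taf() does: `write_csv_crlf()` and `count_endings()` are hypothetical helper names introduced here for illustration.

```r
# Hypothetical helper (not part of icesTAF): write a CSV file with CRLF
# line endings on any platform, in a single pass.
write_csv_crlf <- function(x, path, ...) {
  con <- file(path, open = "wb")  # binary mode: no newline translation
  on.exit(close(con))
  write.csv(x, con, eol = "\r\n", row.names = FALSE, ...)
  invisible(path)
}

# Hypothetical helper: count CR and LF bytes to check the line endings
count_endings <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  c(CR = sum(bytes == as.raw(13)), LF = sum(bytes == as.raw(10)))
}

path <- tempfile(fileext = ".csv")
write_csv_crlf(data.frame(x = 1:3, y = c("a", "b", "c")), path)
count_endings(path)  # equal CR and LF counts indicate CRLF endings
```

The binary connection matters on Windows, where a text-mode connection would translate each "\n" again; the same approach with `eol = "\n"` gives LF output on Windows, which is the use of binary connections suggested in the help page for write.table().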