Open richarddmorey opened 1 year ago
I just confirmed that this happens on Ubuntu, though I had to be quicker with the CTRL-C because the terminal output was much faster.
Session info:
R version 4.1.2 (2021-11-01)
Platform: aarch64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/aarch64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/aarch64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.2 tools_4.1.2 renv_0.17.3
@richarddmorey yes, tempdir
is not guaranteed to persist, so the apps should not use it for that purpose - after all it is by design temporary. Apps should use the current directory for persistence as that is the app directory used for all requests (or you can manage your own directory, of course).
The use of tempdir()
is quite tricky in this context, complex applications are encouraged to use unixtools::set.tempdir()
if the R code they use requires it as to have more full control over its location and permissions (important if switching users or implementing access controls). Either way, the application should use tempdir(TRUE)
to make sure it exists. Rserve
itself manages per-connection working directories, however RestRserve does not use that feature.
If RestRserve
wanted to guarantee separate temporary directories (at a performance cost) it could run
unlink(tempdir(), TRUE, TRUE)
tempdir(TRUE)
on connect which will re-create a fresh temporary directory for new connections. Perhaps it could be an option for apps that require it?
@s-u Thanks - I have written my app defensively to work around this issue, but the problems are
callr
), getting every package author to use a different folder for session-persistent temporary storage is difficult (see my request for callr). Luckily some package authors recognise this (see NOTES for rmarkdown) and have made changes, but not all. For these reasons, I think this is at least extremely undesirable behaviour, and probably should be thought of as a bug.
It seems like this workaround would solve it provided no important packages that create persistent storage when attached (e.g. callr
) are attached first.
Like I said, I think the most efficient way is to use set.tempdir()
since it guarantees a private location without affecting the server one. I guess an alternative would be to simply rename tempdir()
which doesn't require a full cleanup. Also I think the proper place for this is in the on-connect handler since you only need to do this once per HTTP session. You are right that one would hope that if you pre-load packages they do not cache the tempdir()
location in their namespace, otherwise you'd have to load them only after connecting - I have not checked, so don't know how common is such bad behavior.
ok, so when you say the on-connect handler do you mean the function that I defined (as an app developer using RestRserve) that handles the request in RestRserve? So every handler would e.g. call unixtools::set.tempdir() first thing (or equivalent code, to prevent a dependency on unixtools), before doing anything else? That seems straightforward.
Oh, I guess this leaves one question, though - when would all these tempdir()s get cleaned up? Does only the last one defined get cleaned up when R quits? I'm assuming so, otherwise this trick wouldn't work. Is the tradeoff, then, just tempdir()s everywhere, and having to do manual cleanup?
Describe the bug
Occasionally, RestRserve reports errors:
The RestRserve server continues running, but the tempdir() for the R session has been deleted. Any functions in any package that uses the tempdir will fail. I believe this results from a fork of the R session being cancelled and then cleaning up the tempdir(). I'm not sure, but it seems to result from something happening in Rserve.
The bug has made the development of complex apps difficult, because the disappearance of the tempdir is catastrophic to packages that use scratch space for the session. Even if very persistent storage isn't needed, packages often expect tempdir() to point to an existing directory, and for files not to disappear mid-processing.
This has been reported elsewhere (https://github.com/rexyai/RestRserve/issues/174#issuecomment-1375362318, https://github.com/s-u/Rserve/issues/121). I only just managed to get a reproducible example, which I provide below.
To Reproduce
The script below defines a simple GET response, which sleeps for 3 seconds (to give some time to react, as in the instructions below; also simulates the generation of the file) then reads a moderately-sized HTML file and serves it.
A zip of the file can be found here index.html.zip - but I don't think the file itself is important just that it is large so slow to print in the terminal (giving you time to cancel it).
Steps to reproduce:
ls
at the terminalcurl
to GET the file:curl "http://localhost:8080/a"
ls
to check the tempdir reported above for the session no longer exists.Expected behavior
No error should appear and the tempdir should not have been deleted.
Environment information
Additional context
The steps to reproduce above work for me every time on MacOS, but I have not had the opportunity to test on a linux distro.