pola-rs / r-polars

Bring polars to R
https://pola-rs.github.io/r-polars/
Other
405 stars 35 forks source link

File not found in `write_` and `read_` functions when filepath contains tilde #1085

Open george-wood opened 3 weeks ago

george-wood commented 3 weeks ago

There seems to be an issue locating files when the filepath/source contains a tilde (e.g. "~/Desktop/some_file.csv"). This is a common pattern for macOS users. Any ideas what might be causing this?

library(polars)
pl$DataFrame(a = 1)$write_csv("~/Desktop/foo_r.csv")
#> Error: Execution halted with the following contexts
#>    0: In R: in $write_csv():
#>    0: During function call [base::tryCatch(base::withCallingHandlers({
#>           NULL
#>           base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/var/folders/qw/kjq09mn10l5cmw355f10qr2h0000gn/T//RtmpGAEKud/callr-fun-f0d4b669a3b"), 
#>               base::list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, 
#>               quote = TRUE), file = "/var/folders/qw/kjq09mn10l5cmw355f10qr2h0000gn/T//RtmpGAEKud/callr-res-f0d3d17be29", 
#>               compress = FALSE)
#>           base::flush(base::stdout())
#>           base::flush(base::stderr())
#>           NULL
#>           base::invisible()
#>       }, error = function(e) {
#>           {
#>               callr_data <- base::as.environment("tools:callr")$`__callr_data__`
#>               err <- callr_data$err
#>               if (FALSE) {
#>                   base::assign(".Traceback", base::.traceback(4), envir = callr_data)
#>                   utils::dump.frames("__callr_dump__")
#>                   base::assign(".Last.dump", .GlobalEnv$`__callr_dump__`, 
#>                       envir = callr_data)
#>                   base::rm("__callr_dump__", envir = .GlobalEnv)
#>               }
#>               e <- err$process_call(e)
#>               e2 <- err$new_error("error in callr subprocess")
#>               class <- base::class
#>               class(e2) <- base::c("callr_remote_error", class(e2))
#>               e2 <- err$add_trace_back(e2)
#>               cut <- base::which(e2$trace$scope == "global")[1]
#>               if (!base::is.na(cut)) {
#>                   e2$trace <- e2$trace[-(1:cut), ]
#>               }
#>               base::saveRDS(base::list("error", e2, e), file = base::paste0("/var/folders/qw/kjq09mn10l5cmw355f10qr2h0000gn/T//RtmpGAEKud/callr-res-f0d3d17be29", 
#>                   ".error"))
#>           }
#>       }, interrupt = function(e) {
#>           {
#>               callr_data <- base::as.environment("tools:callr")$`__callr_data__`
#>               err <- callr_data$err
#>               if (FALSE) {
#>                   base::assign(".Traceback", base::.traceback(4), envir = callr_data)
#>                   utils::dump.frames("__callr_dump__")
#>                   base::assign(".Last.dump", .GlobalEnv$`__callr_dump__`, 
#>                       envir = callr_data)
#>                   base::rm("__callr_dump__", envir = .GlobalEnv)
#>               }
#>               e <- err$process_call(e)
#>               e2 <- err$new_error("error in callr subprocess")
#>               class <- base::class
#>               class(e2) <- base::c("callr_remote_error", class(e2))
#>               e2 <- err$add_trace_back(e2)
#>               cut <- base::which(e2$trace$scope == "global")[1]
#>               if (!base::is.na(cut)) {
#>                   e2$trace <- e2$trace[-(1:cut), ]
#>               }
#>               base::saveRDS(base::list("error", e2, e), file = base::paste0("/var/folders/qw/kjq09mn10l5cmw355f10qr2h0000gn/T//RtmpGAEKud/callr-res-f0d3d17be29", 
#>                   ".error"))
#>           }
#>       }, callr_message = function(e) {
#>           base::try(base::signalCondition(e))
#>       }), error = function(e) {
#>           NULL
#>           if (FALSE) {
#>               base::try(base::stop(e))
#>           }
#>           else {
#>               base::invisible()
#>           }
#>       }, interrupt = function(e) {
#>           NULL
#>           if (FALSE) {
#>               e
#>           }
#>           else {
#>               base::invisible()
#>           }
#>       })]
#>    1: Os { code: 2, kind: NotFound, message: "No such file or directory" }

Created on 2024-05-07 with reprex v2.1.0

I checked the Python implementation and it works as expected:

import polars as pl
pl.DataFrame({"a": 1}).write_csv("~/Desktop/foo_python.csv")
#> None
pl.read_csv("~/Desktop/foo_python.csv")
#> shape: (1, 1)
#> ┌─────┐
#> │ a   │
#> │ --- │
#> │ i64 │
#> ╞═════╡
#> │ 1   │
#> └─────┘

Created at 2024-05-07 15:27:21 BST by reprexlite v0.5.0

george-wood commented 3 weeks ago

Just to note: Using a tilde worked previously; I noticed this error only recently.

etiennebacher commented 3 weeks ago

Thanks for the report, do you want to make a PR for this? I think this should be solved using path.expand() but we need to ensure it works correctly when we pass a URL

george-wood commented 3 weeks ago

@etiennebacher Will do. However, before doing so, do you think the cause is on the rust side? Perhaps with the update to 0.39.0? I cannot work out why tilde is only now causing problems.

I agree this can be solved using path.expand(), but I can't see why expansion should be necessary for write_csv() where there is no URL checking:

https://github.com/pola-rs/r-polars/blob/6fdd4a9a9cce94f29596b30a534f985f169eb30b/R/dataframe__frame.R#L1949-L1973

If we end up using path.expand(), it will need to be implemented across all io functions.

etiennebacher commented 3 weeks ago

I agree this can be solved using path.expand(), but I can't see why expansion should be necessary for write_csv() where there is no URL checking:

My bad, I thought this error was for read_csv() only.

I can't see any recent changes in the R or Rust methods that would explain this failure so it's probably from rust-polars:

https://github.com/pola-rs/r-polars/blame/1de4d0dfdf83228f2d4933fc7d8239a565d907fa/R/dataframe__frame.R#L1949 https://github.com/pola-rs/r-polars/blame/1de4d0dfdf83228f2d4933fc7d8239a565d907fa/src/rust/src/rdataframe/mod.rs#L487

Do you remember what was the last package version for which it worked?

etiennebacher commented 3 weeks ago

They already normalize the file path in python but I can't find a recent commit that changed that, so I suppose passing a path starting with ~ to rust-polars was not really supported before:

https://github.com/pola-rs/polars/blob/d3411566a29e414cd77bfcdad365362c944d48df/py-polars/polars/dataframe/frame.py#L2299-L2305

eitsupi commented 3 weeks ago

Might be worth checking node.js and polars-cli

george-wood commented 3 weeks ago

Do you remember what was the last package version for which it worked?

I believe it worked before the update to rust-polars 0.39.

They already normalize the file path in python

I suppose we should implement the equivalent normalization if ~ is not supported on the rust-polars side. What do you think @etiennebacher and @eitsupi?

eitsupi commented 3 weeks ago

I don't know if this is the change was intended upstream or not. I think it could be a bug.

etiennebacher commented 3 weeks ago

I suppose we should implement the equivalent normalization if ~ is not supported on the rust-polars side

I think so, especially since they handle the ~ in python

I don't know if this is the change was intended upstream or not. I think it could be a bug.

It could be but the file where write_csv() is defined in python is too big to use git blame in the github interface so I can't check if it changed recently