qsbase / qs

Quick serialization of R objects

unable to write to DBFS? #51

Closed massugur closed 8 months ago

massugur commented 3 years ago

This is a wonderful package! qsave() is 24x faster than saveRDS() in my case when saving a large R object (30-60GB).

When I was working on Databricks clusters, I was unable to write a .qs file with qsave(df, "/dbfs/myfile.qs") (no error message). However, I can successfully do saveRDS(df, "/dbfs/myfile.rds").

A workaround I found is to write to the local driver node first with qsave(df, "myfile.qs") and then transfer the .qs file to the DBFS location. I am pretty sure it is not the most efficient way; did I miss anything here?

Meanwhile, I had no problem reading from DBFS after I transferred the file: qread("/dbfs/myfile.qs").
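
For reference, a minimal sketch of that workaround (the /tmp path and file names are illustrative, not part of qs):

library(qs)
# write to fast local disk on the driver node first...
qsave(df, "/tmp/myfile.qs")
# ...then copy the finished file onto the DBFS FUSE mount
file.copy("/tmp/myfile.qs", "/dbfs/myfile.qs", overwrite = TRUE)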

traversc commented 3 years ago

I'm not familiar with Databricks, but if it's just a mount point I'm surprised. Was there an error message, or was it silent? Is it reproducible with smaller objects?

massugur commented 3 years ago

There was no error message; it just failed silently. Yes, it happens with all objects.

traversc commented 3 years ago

Thanks, I'd like to understand more. What's the easiest way to set up a databricks system?

massugur commented 3 years ago

I believe you can try Databricks Community Edition. (https://databricks.com/product/faq/community-edition)

traversc commented 3 years ago

I signed up for the community edition, but I'm not able to repro the issue (see below). Any ideas?

Since you're working with very large data, it could be an issue if you run out of memory or file space. But since you said it happens on small objects too, I am not sure.

(screenshot of the repro attempt attached)

massugur commented 3 years ago

Thanks for exploring this! I have tried exactly the same commands and I still had the same problem. There was no error message for qsave(); it was silent. Is there anything else I can try to identify the problem? Thanks!

(screenshots attached)

traversc commented 3 years ago

I'm not sure what the issue is, going to need more info.

Did you test this on the community edition? What's the difference between your setup and the community edition?

Could you post sessionInfo(), and can you test installing the latest update? devtools::install_github("traversc/qs")

massugur commented 3 years ago

Thanks for looking into this! I tested on the community edition. It worked perfectly, just like what you showed. This time I also tried fst::write_fst(). It worked on the community edition, but not in my setup (paid version).

In summary,

  1. saveRDS(mtcars, "/dbfs/temp.rds") worked on both the community edition and my setup. Many other packages could also write to DBFS on my setup, such as data.table::fwrite() and arrow::write_arrow().
  2. qsave(mtcars, "/dbfs/temp.qs") worked only on the community edition. On my setup, qsave(mtcars, "/dbfs/temp.qs") didn't work (no file saved), but no error message was shown. qsave(mtcars, "temp.qs") worked on my setup, with the file saved on the local driver node. qread("/dbfs/temp.qs") worked after I manually copied the file from the local driver node to DBFS.
  3. write_fst(mtcars, "/dbfs/temp.fst") worked only on the community edition. write_fst(mtcars, "temp.fst") worked on my setup, with the file saved on the local driver node. read_fst("temp.fst") worked. However, read_fst("/dbfs/temp.fst") didn't work even after I manually moved the file from the local driver node to DBFS; the error message is in the attached screenshots.

I noticed this line in DBFS documentation (https://docs.databricks.com/data/databricks-file-system.html):

Does not support random writes. For workloads that require random writes, perform the I/O on local disk first and then copy the result to /dbfs.

Do you think it could be relevant? I can definitely try to provide more details.

massugur commented 3 years ago

R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] fst_0.9.4         arrow_3.0.0       qs_0.23.6         data.table_1.12.6
[5] dplyr_0.8.3

loaded via a namespace (and not attached):
 [1] stringfish_0.15.0   Rcpp_1.0.2          magrittr_1.5
 [4] bit_1.1-14          tidyselect_0.2.5    RApiSerialize_0.1.0
 [7] R6_2.4.0            rlang_0.4.1         hwriter_1.3.2
[10] SparkR_2.4.5        tools_3.6.2         parallel_3.6.2
[13] htmltools_0.4.0     bit64_0.9-7         RcppParallel_5.0.3
[16] assertthat_0.2.1    digest_0.6.22       tibble_2.1.3
[19] Rserve_1.8-6        crayon_1.3.4        purrr_0.3.3
[22] vctrs_0.2.0         hwriterPlus_1.0-3   zeallot_0.1.0
[25] glue_1.3.1          compiler_3.6.2      pillar_1.4.2
[28] backports_1.1.5     TeachingDemos_2.10  pkgconfig_2.0.3

traversc commented 3 years ago

Thank you for digging that up. I didn't even realize it was possible for a file system to not support random writes lol.

I'll try to investigate a bit more later this weekend.

If that is the culprit, there is not much that can be done here; it's on Microsoft.

massugur commented 3 years ago

I think the DBFS FUSE mount version is what causes the problem. Thanks again for taking the time to look into this!

According to Databricks Documentation:

Azure Databricks uses a FUSE mount to provide local access to files stored in the cloud. A FUSE mount is a secure, virtual filesystem.

  1. FUSE V2 (default for Databricks Runtime 6.x and 7.x): does not support random writes.
  2. FUSE V1 (default for Databricks Runtime 5.5 LTS): if you experience issues with FUSE V1 on 5.5 LTS, Databricks recommends that you use FUSE V2 instead. You can override the default FUSE version in 5.5 LTS by setting the environment variable DBFS_FUSE_VERSION=2.

I've tried the following on my set-up.

  1. DBR 5.5 (default FUSE V1): qsave("/dbfs/temp.qs") worked.
  2. DBR 5.5 with the environment variable DBFS_FUSE_VERSION=2: qsave("/dbfs/temp.qs") didn't work, no error message.
  3. DBR 7.5 (default FUSE V2): qsave("/dbfs/temp.qs") didn't work, no error message.

The above results applied to fst::write_fst("/dbfs/temp.fst") as well.
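
(For anyone debugging the same thing, a minimal sketch of a write probe against the FUSE mount; the test path is illustrative:)

library(qs)
test_path <- "/dbfs/qs_write_test.qs"   # illustrative path on the mount
ok <- tryCatch({
  qsave(1, test_path)
  file.exists(test_path)
}, error = function(e) FALSE)
ok                     # FALSE if the mount silently drops or rejects the write
if (ok) unlink(test_path)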

traversc commented 3 years ago

I think you're right. The community edition worked on all versions (maybe not a real FUSE drive), and I didn't get around to spinning up an Azure system.

The new version of qs (0.24.1 on CRAN) should hopefully at least return an error.

Could you check that out if you're using 0.23.6?
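
For reference, checking the loaded version and updating from CRAN:

packageVersion("qs")     # shows the installed version, e.g. 0.23.6
install.packages("qs")   # latest CRAN release (0.24.1 at the time)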

massugur commented 3 years ago

I used 0.24.1 to produce the above results today :) No error messages.

Yes, the community edition worked on all versions. I realized the community edition has different disk mappings. In the community edition, /dbfs/ maps to the local file system, which can be listed with %fs ls file::/dbfs/ but not %fs ls dbfs:/. In the paid edition, /dbfs/ refers to files on DBFS (%fs ls dbfs:/).

traversc commented 3 years ago

I loaded up Azure and did some testing finally.

devtools::install_github("traversc/stringfish") # dependancy, necessary to use github version for other reasons
devtools::install_github("traversc/qs")

library(qs)
qsave(1, file="/dbfs/temp.qs")

Should give you an error now:

basic_ios::clear: iostream error

This could be a little more descriptive, but at least it's not silent.
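
With the update the failure can at least be handled programmatically, e.g. a sketch that falls back to the local-disk-then-copy route from the DBFS docs (paths illustrative):

tryCatch(
  qsave(mtcars, file = "/dbfs/temp.qs"),
  error = function(e) {
    message("qsave to DBFS failed: ", conditionMessage(e))
    # fall back to writing on local disk, then copying onto the mount
    qsave(mtcars, file = "/tmp/temp.qs")
    file.copy("/tmp/temp.qs", "/dbfs/temp.qs", overwrite = TRUE)
  }
)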

Btw, I believe fst is already fixed (ver 0.9.4)?

library(fst)
write_fst(mtcars, path="/dbfs/temp.fst")
Error in write_fst(mtcars, path = "/dbfs/temp.fst") :
  There was an error during the write operation,
  fst file might be corrupted. Please check available disk space and access rights.

massugur commented 3 years ago

Sorry for the confusion. Yes, fst v0.9.4 can't write to DBFS (FUSE V2) either, but it does provide an error message.

I've tried installing the latest update and now got an error message.

basic_ios::clear: iostream error

Thanks for helping on this!

JZL commented 2 years ago

I'm happy to start a new issue because it's only tangentially related, but I was also getting basic_ios::clear: iostream error. I was red-lining my memory use, so I thought it could be from that. But when I changed it to saveRDS I got

  cannot open compressed file '/home/...VERY LONG FILENAME...', probable reason 'File name too long'

and I didn't realize there was such a limit. I made a shorter filename, reran, and both qs and saveRDS worked.

I think it could be helpful for qs to also give probable reasons for this kind of failure if possible, because it's a pretty easy fix once you know what to do.
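
In the meantime, a sketch of a caller-side guard; safe_qsave and the 255-byte cutoff are assumptions for illustration, not part of qs:

# hypothetical wrapper: refuse over-long file names up front
safe_qsave <- function(x, file, max_name_bytes = 255) {
  name_len <- nchar(basename(file), type = "bytes")
  if (name_len > max_name_bytes) {
    stop("file name too long (", name_len, " bytes): ", basename(file))
  }
  qs::qsave(x, file)
}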

(Side note: qs is amazing and so fast. I use it constantly every day and it's great!)

Also, just for posterity: it's a 255-character limit on my system.

library(purrr)
library(glue)
library(qs)
# try progressively longer file names until qsave() fails
seq(1, 500) %>% map(~{
  print(.)
  FN <- paste0(rep("A", .), collapse = "")
  qsave(1, glue("/tmp/{FN}"))
})