ropensci / targets

Function-oriented Make-like declarative workflows for R
https://docs.ropensci.org/targets/

Automatic detection of the low-level file system #1315

Closed: wlandau closed this issue 1 month ago

wlandau commented 2 months ago

targets primarily relies on hashes to detect changes in files, but to speed up computation, it sometimes uses time stamps to check whether a hash is even worth calculating. I would like this to be the default everywhere, but some storage drives use file systems like FAT, which have imprecise time stamps.
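To make the idea concrete, here is a minimal sketch of a time stamp check that short-circuits hashing. It is not the actual targets implementation: `file_record()` and `file_changed()` are hypothetical helper names, and `tools::md5sum()` stands in for whatever hash targets really uses.

```r
# Hypothetical helper: snapshot a file's time stamp, size, and hash.
file_record <- function(path) {
  info <- file.info(path, extra_cols = FALSE)
  list(
    mtime = as.numeric(info$mtime),
    size = as.numeric(info$size),
    hash = unname(tools::md5sum(path)) # stand-in hash, an assumption
  )
}

# Hypothetical helper: TRUE if the file looks different from `record`.
# When time stamps are trusted and both mtime and size are unchanged,
# the (potentially expensive) hash is skipped entirely.
file_changed <- function(path, record, trust_timestamps = TRUE) {
  info <- file.info(path, extra_cols = FALSE)
  same_stamp <- identical(as.numeric(info$mtime), record$mtime) &&
    identical(as.numeric(info$size), record$size)
  if (trust_timestamps && same_stamp) {
    return(FALSE)
  }
  !identical(unname(tools::md5sum(path)), record$hash)
}
```

On a file system with coarse time stamps, a file rewritten within the same time stamp tick and with the same size would slip past the shortcut, which is why the shortcut is only safe where time stamps are precise.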

@gaborcsardi, @hadley mentioned you might know how to automatically detect whether a file system is FAT, EXT, NFS, etc. Is there a robust way to do this for an R package? If so, targets could deprecate tar_option_set(trust_object_timestamps = TRUE) and format = "file_fast".

cf. #1311

gaborcsardi commented 2 months ago

Yes, ps::ps_disk_partitions() will tell you the type of the file system, if that's all you need.

hadley commented 2 months ago

@gaborcsardi is there an existing helper to find the device that corresponds to a path? Or do you just find the longest mountpoint prefix that matches your path?
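For illustration, the longest-mountpoint-prefix approach could be sketched roughly as follows. `find_fs_type()` is a hypothetical helper, not part of ps. It assumes `partitions` has character columns `mountpoint` and `fstype` (the layout I believe ps::ps_disk_partitions() returns) and that `path` is already absolute and normalized.

```r
# Hypothetical helper: file system type of the partition whose mount point
# is the longest prefix of `path`, or NA if no mount point matches.
find_fs_type <- function(path, partitions) {
  # Use forward slashes and drop trailing separators, so "/" becomes ""
  # and "C:\\" becomes "C:".
  mounts <- gsub("\\", "/", partitions$mountpoint, fixed = TRUE)
  mounts <- sub("/+$", "", mounts)
  # A mount point matches if it equals the path or is a whole-component
  # prefix of it ("/home" must not match "/homework").
  is_prefix <- vapply(
    mounts,
    function(m) identical(path, m) || startsWith(path, paste0(m, "/")),
    logical(1),
    USE.NAMES = FALSE
  )
  if (!any(is_prefix)) {
    return(NA_character_)
  }
  candidates <- which(is_prefix)
  best <- candidates[which.max(nchar(mounts[candidates]))]
  partitions$fstype[best]
}

# Possible usage, assuming the ps package is installed:
# find_fs_type(normalizePath("data.csv", winslash = "/"), ps::ps_disk_partitions())
```

String matching like this does not resolve symlinks or bind mounts on its own, so the path would need to be normalized first.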

gaborcsardi commented 2 months ago

I don't know of a way from R. You need to call statfs(2) on Unix, IDK on Windows.

gaborcsardi commented 2 months ago

ps_fs_info() will now return file system information for one or more paths (https://github.com/r-lib/ps/pull/165). It finds the file system of the input files/directories automatically.

In Docker it is not very exciting:

> ps_fs_info(c("/pkg", "~"))
# A data frame: 2 × 26
  path  mount_point name       type  block_size transfer_block_size total_data_blocks free_blocks free_blocks_non_supe…¹
  <chr> <chr>       <chr>      <chr>      <dbl>               <dbl>             <dbl>       <dbl>                  <dbl>
1 /pkg  /pkg        :/Users/g… fuse…    1048576             1048576         242837545    60001470               60001470
2 ~     /           overlay    over…       4096                4096          25656302    11311661                9996858
# ℹ abbreviated name: ¹​free_blocks_non_superuser
# ℹ 17 more variables: total_nodes <dbl>, free_nodes <dbl>, id <list>, owner <dbl>, type_code <dbl>,
#   mount_flags_code <dbl>, subtype_code <dbl>, MANDLOCK <lgl>, NOATIME <lgl>, NODEV <lgl>, NODIRATIME <lgl>,
#   NOEXEC <lgl>, NOSUID <lgl>, RDONLY <lgl>, RELATIME <lgl>, SYNCHRONOUS <lgl>, NOSYMFOLLOW <lgl>
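
As a rough sketch of how the `type` column might feed the time stamp decision: the helper below is hypothetical, and the list of FAT-style type names is an illustrative assumption, not what targets or ps actually use. Unknown (NA) types are treated as untrusted.

```r
# Assumed, illustrative names under which FAT-family file systems may be
# reported on different platforms. FAT stores modification times at
# 2-second resolution.
imprecise_types <- c("vfat", "msdos", "fat", "fat16", "fat32")

# Hypothetical helper: TRUE where time stamps look precise enough to trust.
trust_timestamps <- function(fs_type) {
  !is.na(fs_type) & !(tolower(fs_type) %in% imprecise_types)
}

# Possible usage with the data frame shown above:
# info <- ps::ps_fs_info(c("/pkg", "~"))
# trust_timestamps(info$type)
```

A deny-list like this trusts anything it does not recognize; an allow-list of known-precise file systems would be the more conservative design.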

wlandau commented 2 months ago

> ps_fs_info() will now return file system information for one or more paths (https://github.com/r-lib/ps/pull/165). It finds the file system of the input files/directories automatically.

Awesome, thanks @gaborcsardi! This looks like exactly what I need.

> In Docker it is not very exciting

Not sure I follow. Is it because OverlayFS doesn't really tell us what the actual underlying file system is?

wlandau commented 1 month ago

I just added https://github.com/ropensci/targets/pull/1326 to (mostly) fix #1315. Still a few to-dos listed in the PR.

wlandau commented 1 month ago

Done in #1326