njtierney / geotargets

Targets extensions for geospatial data
https://njtierney.github.io/geotargets/

`tar_terra_rast` and terra objects won't work with cloud storage #112

Open Aariq opened 2 hours ago

Aariq commented 2 hours ago

I think there may be a problem with using cloud storage. When a SpatRaster is stored with, for example, `repository = "aws"`, and a user tries to load it with `tar_read()` or `tar_load()`, I think this is what happens: the file is downloaded from AWS into `tempdir()/_targets/scratch`, read in using the read function stored in `format`, and then deleted from the scratch dir once the target is loaded into memory. The SpatRaster object makes it into memory, but the file it points to is gone, because a SpatRaster read from disk holds a pointer to its source file rather than the cell values themselves.
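The same failure mode can be reproduced locally without any cloud storage. The sketch below only imitates the lifecycle described above (copy to a scratch file, read with `terra::rast()`, delete the scratch file) using a plain temporary file. It does not call any targets or geotargets internals, and `scratch` is a hypothetical stand-in for the real `tempdir()/_targets/scratch` path.

```r
library(terra)

# Hypothetical stand-in for the file targets downloads into its scratch dir
scratch <- tempfile(fileext = ".tif")
file.copy(system.file("ex/elev.tif", package = "terra"), scratch)

# Like the read function stored in the target's format: rast() records the
# file path and metadata, but the cell values stay on disk
r <- rast(scratch)

# Like targets cleaning up the scratch dir after the target is loaded
unlink(scratch)

# The SpatRaster object still exists, but the file it points to is gone
sources(r)
file.exists(sources(r))
```

In this sketch `r` is still a valid object in the session, which mirrors why `tar_load(rast_example)` appears to succeed even though the underlying file no longer exists.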

I've confirmed this behavior with an S3 bucket hosted on Jetstream2, and I can share the credentials for it privately if you'd like to test and confirm this yourself.

I can't find an argument or option in targets that overrides this behavior, although I would have expected `memory = "persistent"` to have some effect here.

```r
library(targets)
tar_dir({
  tar_script({
    library(targets)
    library(geotargets)
    # Store targets in an S3-compatible bucket by default
    tar_option_set(
      repository = "aws",
      resources = tar_resources(
        aws = tar_resources_aws(
          bucket = "test123456",
          prefix = "targets_test",
          endpoint = "https://js2.jetstream-cloud.org:8001"
        )
      )
    )
    list(
      # Track the example raster as a local file
      tar_target(
        file,
        system.file("ex/elev.tif", package = "terra"),
        format = "file",
        repository = "local"
      ),
      # This target is uploaded to the bucket
      tar_terra_rast(
        rast_example,
        terra::rast(file)
      )
    )
  })
  tar_make()
  tar_load(rast_example)
  # The loaded SpatRaster points to a file in the scratch dir...
  terra::sources(rast_example)
  # ...which has already been deleted
  fs::file_exists(terra::sources(rast_example))
})
#> ▶ dispatched target file
#> ● completed target file [0.003 seconds, 7.994 kilobytes]
#> ▶ dispatched target rast_example
#> ● completed target rast_example [0.008 seconds, 8.523 kilobytes]
#> ▶ ended pipeline [1.063 seconds]
#> /private/var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpVmmt9h/_targets/scratch/rast_example1674d43dc896a
#> FALSE
```
Aariq commented 2 hours ago

This might not be something we can fix, but we should document it as a limitation and possibly open a discussion in the targets repo.