lynker-spatial / hfsubsetR

Error creating dataset from get_subset() #3

Open glitt13 opened 2 weeks ago

glitt13 commented 2 weeks ago

Recent updates to hfsubsetR now generate errors in unit tests that previously passed. Here's an example of a call that no longer works:

nldi_feat <- list(featureSource = "comid", featureID = "1520007")

hfsubsetR::get_subset(
  nldi_feature = nldi_feat,
  outfile = "./hydrofab_network__1520007.gpkg",
  type = "reference",
  lyrs = "network",
  overwrite = TRUE
)

This call generates the following error:

Error in `arrow::open_dataset()`:
! IOError: Error creating dataset. Could not read schema from 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet': AWS Error ACCESS_DENIED during GetObject operation: Access Denied

Traceback here:

> rlang::last_trace()
<error/rlang_error>
Error in `arrow::open_dataset()`:
! IOError: Error creating dataset. Could not read schema from 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet': AWS Error ACCESS_DENIED during GetObject operation: Access Denied
---
Backtrace:
     ▆
  1. └─hfsubsetR::get_subset(...)
  2.   └─hfsubsetR::findOrigin(...)
  3.     ├─dplyr::slice_min(...)
  4.     ├─dplyr::collect(...)
  5.     ├─dplyr::distinct(...)
  6.     ├─dplyr::select(...)
  7.     ├─hfsubsetR:::findOriginQuery(.query, network)
  8.     ├─hfsubsetR:::findOriginQuery.nldi_feature(.query, network)
  9.     ├─base::NextMethod()
 10.     └─hfsubsetR:::findOriginQuery.comid(.query, network)
 11.       ├─dplyr::filter(arrow::open_dataset(network), hf_id == !!comid)
 12.       └─arrow::open_dataset(network)
Run rlang::last_trace(drop = FALSE) to see 6 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `arrow::open_dataset()`:
! IOError: Error creating dataset. Could not read schema from 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet': AWS Error ACCESS_DENIED during GetObject operation: Access Denied
---
Backtrace:
     ▆
  1. └─hfsubsetR::get_subset(...)
  2.   └─hfsubsetR::findOrigin(...)
  3.     ├─dplyr::slice_min(...)
  4.     ├─dplyr::collect(...)
  5.     ├─dplyr::distinct(...)
  6.     ├─dplyr::select(...)
  7.     ├─hfsubsetR:::findOriginQuery(.query, network)
  8.     ├─hfsubsetR:::findOriginQuery.nldi_feature(.query, network)
  9.     ├─base::NextMethod()
 10.     └─hfsubsetR:::findOriginQuery.comid(.query, network)
 11.       ├─dplyr::filter(arrow::open_dataset(network), hf_id == !!comid)
 12.       └─arrow::open_dataset(network)
 13.         └─base::tryCatch(...)
 14.           └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 15.             └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 16.               └─value[[3L]](cond)
 17.                 └─arrow:::augment_io_error_msg(e, call, format = format)
 18.                   └─rlang::abort(msg, call = call)

glitt13 commented 5 days ago

@mikejohnson51 This issue persists with the latest changes on the main branch, but the error message has changed to the following:

Error: IOError: Path does not exist 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/'. Detail: [errno 2] No such file or directory

What's the anticipated resolution timeframe? Having an estimate will help me decide how to prioritize tasks. If you need a hand with anything, reach out and I'm happy to help dig further.

> traceback()
13: dataset___FileSystemDatasetFactory__Make(filesystem, selector, 
        format, fsf_options(factory_options, partitioning))
12: FileSystemDatasetFactory$create(path_and_fs$fs, selector, NULL, 
        format, partitioning, factory_options)
11: DatasetFactory$create(sources, partitioning = partitioning, format = format, 
        schema = schema, hive_style = hive_style, factory_options = factory_options, 
        ...)
10: arrow::open_dataset(network)
9: dplyr::filter(arrow::open_dataset(network), hf_id == !!comid)
8: findOriginQuery.comid(.query, network)
7: findOriginQuery(.query, network)
6: dplyr::select(findOriginQuery(.query, network), id, toid, vpuid, 
       topo, hydroseq)
5: dplyr::distinct(dplyr::select(findOriginQuery(.query, network), 
       id, toid, vpuid, topo, hydroseq))
4: dplyr::collect(dplyr::distinct(dplyr::select(findOriginQuery(.query, 
       network), id, toid, vpuid, topo, hydroseq)))
3: dplyr::slice_min(dplyr::collect(dplyr::distinct(dplyr::select(findOriginQuery(.query, 
       network), id, toid, vpuid, topo, hydroseq))), hydroseq, with_ties = TRUE)
2: findOrigin(network = glue("{hook}_network"), id = id, comid = comid, 
       hl_uri = hl_uri, poi_id = poi_id, nldi_feature = nldi_feature, 
       xy = xy)
1: hfsubsetR::get_subset(comid = comid, outfile = fp_cat, lyrs = lyrs, 
       overwrite = overwrite, type = "nextgen")

mikejohnson51 commented 5 days ago

Hey Guy,

If you check Lynker Spatial, you'll see that v2.2 is not up in parquet form, so the file you're pointing at truly doesn't exist. Once we're sure we've resolved the file corruption issue (the original root of this issue), we'll put it all up there.

Thanks!

Mike
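
Until the v2.2 parquet files are published, a unit test that depends on the remote dataset could guard the call instead of failing outright. The following is only a minimal sketch: `try_remote()` is a hypothetical helper, not part of hfsubsetR, and the two patterns it matches are simply the error messages quoted in this thread, so they may need adjusting if the messages change again.

```r
# Hypothetical helper (not part of hfsubsetR): call a function that needs
# remote data and return NULL, with a message, when the data is unavailable.
try_remote <- function(f, ...) {
  tryCatch(
    f(...),
    error = function(e) {
      msg <- conditionMessage(e)
      # Patterns copied from the two errors reported in this issue.
      if (grepl("Access Denied|Path does not exist", msg)) {
        message("Remote hydrofabric data unavailable: ", msg)
        return(NULL)
      }
      stop(e)  # re-raise anything unrelated to the missing remote data
    }
  )
}

# Usage with the call from the original report (commented out because it
# needs hfsubsetR and network access):
# res <- try_remote(
#   hfsubsetR::get_subset,
#   nldi_feature = list(featureSource = "comid", featureID = "1520007"),
#   outfile = "./hydrofab_network__1520007.gpkg",
#   type = "reference",
#   lyrs = "network",
#   overwrite = TRUE
# )
# if (is.null(res)) message("Skipping checks that need the remote dataset")
```

Matching on the message text is brittle, but it keeps unrelated failures visible, since any other error is re-raised unchanged.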