christine-e-smit opened 4 years ago
This is more of a sat-stac issue; in fact, it's this function: https://github.com/sat-utils/sat-stac/blob/master/satstac/utils.py#L56
It checks for AWS URLs and will try a signed URL if a regular request fails, but it does not check for s3 URLs.
The main reason is that I didn't want to add boto3 and all its dependencies as a dependency of sat-stac.
I think going forward the right approach is to use PySTAC, which has a nice, easy way to supply custom upload and download functions. I've been using it in a project to do exactly this: use s3-style URLs for the catalog.
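For concreteness, here is a minimal sketch of that approach, assuming pystac >= 1.0 and boto3 (the bucket path below is made up, and AWS credentials come from the environment):

```python
from urllib.parse import urlparse

import boto3
from pystac import Catalog
from pystac.stac_io import DefaultStacIO, StacIO


class S3StacIO(DefaultStacIO):
    """Route s3:// hrefs through boto3; fall back to the default IO otherwise."""

    def read_text(self, source, *args, **kwargs):
        parsed = urlparse(str(source))
        if parsed.scheme == "s3":
            obj = boto3.resource("s3").Object(parsed.netloc, parsed.path.lstrip("/"))
            return obj.get()["Body"].read().decode("utf-8")
        return super().read_text(source, *args, **kwargs)

    def write_text(self, dest, txt, *args, **kwargs):
        parsed = urlparse(str(dest))
        if parsed.scheme == "s3":
            boto3.resource("s3").Object(parsed.netloc, parsed.path.lstrip("/")).put(
                Body=txt.encode("utf-8")
            )
        else:
            super().write_text(dest, txt, *args, **kwargs)


StacIO.set_default(S3StacIO)

# Hypothetical private catalog; links inside it can also be s3:// URLs.
catalog = Catalog.from_file("s3://my-private-bucket/catalog.json")
```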
@matthewhanson - does sat-stac accept Python file objects in its open function? We could potentially leverage fsspec/s3fs to work around the catalog opening.
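As a rough illustration of that idea, fsspec already hands back a file-like object for an s3 path (the bucket name here is made up; this assumes s3fs is installed and AWS credentials are available):

```python
import json

import fsspec

# Open a private catalog JSON over s3 and read it like an ordinary local file.
with fsspec.open("s3://my-private-bucket/catalog.json", "rt") as f:
    catalog_dict = json.load(f)

print(catalog_dict["id"])
```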
I've recently run into this problem with outputs from Earthdata Harmony as well, as referenced in the recent mention at https://github.com/podaac/AGU-2020/issues/13. More details:
import intake
stac_root = 'https://harmony.earthdata.nasa.gov/stac/{jobID}/{item}'
stac_cat = intake.open_stac_catalog(stac_root.format(jobID=job,item=''),name='Harmony output')
display(list(stac_cat))

item_0 = f'{job}_0'
item = stac_cat[item_0]
assets = list(item)
asset = item[assets[0]]
da = asset.to_dask()
This gives me the following error:
ValueError: open_local can only be used on a filesystem which has attribute local_file=True
Hi @asteiker - thanks for reporting and for sharing the link to Earthdata Harmony, it looks very neat! It's hard to tell from your example code what format the data is; is it possible to share the full stac_root link?

Opening STAC assets with s3:// prefixes should work, but the code ultimately called depends on the data format. I'm guessing this might be a netCDF dataset. The error you're getting is due to some incompatibilities between intake-xarray and fsspec, which are used behind the scenes for data loading (see https://github.com/intake/intake-xarray/pull/93), so as a workaround until new versions are released you might try installing them from the unreleased master versions:
pip install git+https://github.com/intake/intake-xarray.git@master
pip install git+https://github.com/intake/filesystem_spec.git@master
I am having the same issue:
STACError: http://diwata-missions.s3-website-us-east-1.amazonaws.com/Diwata-2/SMI/stac/catalog.json does not exist locally
even though it is http:// and not s3://.
Is there a way to use a STAC catalog published in a private S3 bucket? How can we navigate the catalog by following links that refer to private resources?
I was recently part of a group trying to use intake-stac to bring some files into Dask from S3. Unfortunately, the data in question was not public, and neither were the catalog files, so I wanted to use s3-style URLs for everything. What I tried was along the lines of the following sketch (the bucket and catalog names are placeholders, since the real ones were private):
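```python
import intake

# Placeholder URL; the real catalog lived in a private bucket.
cat = intake.open_stac_catalog('s3://my-private-bucket/catalog.json')
```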
I got an error back. It looks to me as though STAC thinks this is a file path rather than an S3 URL. Our time was short, and I couldn't figure out whether there was some other way to get STAC to take an S3 URL.
At the same time, we were hoping to put s3 URLs in our item catalog entries, e.g. something like the sketch below (every value is made up):
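```python
# Hypothetical "assets" section of a STAC item pointing at a private s3 object.
assets = {
    "data": {
        "href": "s3://my-private-bucket/granules/example.nc",
        "type": "application/x-netcdf",
        "roles": ["data"],
    }
}
```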
This one may be more of a stretch: I don't know if the STAC spec has s3-style URLs in mind. My two-minute evaluation of the item spec (https://github.com/radiantearth/stac-spec/blob/master/item-spec/json-schema/item.json) was inconclusive.