opendatacube / datacube-ows

Open Data Cube Open Web Services

WMS does not show raster data hosted on private AWS S3 #946

Closed sotosoul closed 12 months ago

sotosoul commented 1 year ago

Description

The WMS service of my datacube-ows instance does not deliver raster data hosted on private AWS S3.

Steps

I've set up a local dev/test instance of datacube-ows and run it with the Flask approach. I've also configured all the environment variables as described in the documentation.

My ows_conf.py file contains the following lines:

"s3_url": "https://bucket-name",
"s3_bucket": "bucket-name",
"s3_aws_zone": "us-west-2"

The following part works fine, indicating that datacube is able to fetch S3 data:

from datacube import Datacube

dc = Datacube()

dss = dc.find_datasets(product='s2_l2a_10m_v1')

# returns the correct xarray Dataset just fine:
data = dc.load(  
    datasets=dss,
    latitude=(55.52, 55.7),
    longitude=(12.6, 12.75),
    output_crs="EPSG:3857",
    resolution=(-100, 100),
)

Environment

conda list returns the following versions:

datacube                  1.8.15             pyhd8ed1ab_0    conda-forge
datacube-ows              1.8.34                   pypi_0    pypi
whatnick commented 1 year ago

What error do you get in the stack trace? There may be a baked-in --no-sign-request somewhere. Have you tried using OWS as a library in the same environment to render some layers and checked the results?

Could you try "false" as the value for no-sign-request and see the results? Reading the tests, it looks like this is what is tested. We should improve the docs to clearly show which string values are interpreted as true and false.
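For reference, the usual truthy/falsy string conventions can be sketched like this (a hypothetical parser for illustration only; the exact set of strings datacube-ows accepts should be confirmed against its docs and tests):

```python
def parse_bool_env(value):
    # Illustrative parser for boolean-like environment variable strings,
    # following the common "y/yes/true/on/1" vs "n/no/false/off/0" convention.
    v = value.strip().lower()
    if v in ("y", "yes", "t", "true", "on", "1"):
        return True
    if v in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(f"unrecognised boolean string: {value!r}")
```

Under this convention both "false" and "NO" would disable request signing, which is why checking the exact strings the code accepts matters.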

sotosoul commented 1 year ago

If I set the logging level to INFO, I can see that:

[2023-07-24 16:46:50,542] [INFO] S3 access configured with signed requests
[2023-07-24 16:46:50,543] [INFO] Establishing/renewing credentials
[2023-07-24 16:46:50,634] [INFO] Found credentials in environment variables.

indicating that there's no issue with missing environment variables.

I suppose datacube-ows queries datacube, which returns xarrays. If this assumption holds, I could look into what's being returned and debug from there. What do you think?

Btw, changing to 'false' didn't have any effect...

whatnick commented 1 year ago

All of the read handling in OWS is done by datacube, so it's best to capture some intermediate results there and check. It would also be good to hear from the community using authenticated S3 buckets with OWS.

valpesendorfer commented 1 year ago

We're running OWS with a private S3 bucket (disclaimer: we're still running the fairly old version 1.8.18 of datacube-ows).

In the config, the only S3-related item we have set is s3_aws_zone (so we don't specify name / url).

In the environment variables we set:

AWS_NO_SIGN_REQUEST=NO 
AWS_DEFAULT_REGION=eu-central-1

We don't set credentials as these are supplied by the attached role and acquired / refreshed by the datacube-ows app.
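As a quick sanity check in a setup like this, something along these lines can show which AWS-related variables the OWS process actually sees (a hypothetical helper, not part of datacube-ows; note that role-supplied credentials are resolved by boto3 internally and will not appear in the environment):

```python
import os

def aws_env_summary(environ=None):
    # Hypothetical debugging helper: report which AWS-related environment
    # variables are visible to the process, masking secret values.
    # Role-supplied credentials are acquired by boto3 at request time and
    # will NOT show up here.
    environ = os.environ if environ is None else environ
    secrets = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
    keys = ("AWS_NO_SIGN_REQUEST", "AWS_DEFAULT_REGION") + secrets
    return {k: ("<set>" if k in secrets else environ[k]) if k in environ else "<unset>"
            for k in keys}
```

Running this inside the OWS process (rather than in a separate shell) avoids chasing a mismatch between the two environments.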

Can you confirm that

from datacube import Datacube

dc = Datacube()

dss = dc.find_datasets(product='s2_l2a_10m_v1')

# returns the correct xarray Dataset just fine:
data = dc.load(  
    datasets=dss,
    latitude=(55.52, 55.7),
    longitude=(12.6, 12.75),
    output_crs="EPSG:3857",
    resolution=(-100, 100),
)

is actually loading the data into memory (it should) rather than returning a lazy dask array? Also, is that run from the same environment as OWS?
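One quick way to check the eager-vs-lazy question is to inspect whether each data variable is backed by an in-memory numpy array (a hypothetical `is_eager` helper, sketched here for illustration):

```python
import numpy as np

def is_eager(ds):
    # True if every data variable of an xarray-like Dataset is an
    # in-memory numpy array; False if any variable is backed by a
    # lazy array (e.g. dask), which would defer the S3 reads.
    return all(isinstance(v.data, np.ndarray) for v in ds.data_vars.values())
```

If `is_eager(data)` is False, the S3 reads have not actually happened yet and any access failure would only surface on compute.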

sotosoul commented 1 year ago

I confirm that the dc.load call I included returns a properly populated xarray.Dataset that I can visualize with matplotlib, showing the actual raster data from S3. And yes, it's run in the same environment as OWS; that's exactly why I included it: to show that datacube can access my S3 rasters just fine. I'm not sure, but it seems the problem lies somewhere between datacube-core and datacube-ows, if that makes sense...

valpesendorfer commented 1 year ago

I remember it can be tricky to sort out loading issues between these layers. If I had to guess: either the requests are somehow set to unsigned, the credentials are somehow not being picked up, or you are using a different set of credentials that don't have the required permissions.

You probably need to dig into the logs a bit (set the level to DEBUG and so on) to see what's happening behind the scenes with botocore / rasterio etc.

You say that OWS does not deliver raster data. Perhaps you can share an error message or any other hint as to why that is the case.

SpacemanPaul commented 1 year ago

Have you rerun datacube-ows-update --views; datacube-ows-update ?

OWS does not use the same search mechanism as core: it maintains a separate PostGIS index in the database that allows more accurate spatial searches, and this index must be kept up to date as per the documentation here.

SpacemanPaul commented 12 months ago

I'm assuming we can close this now.