locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

Document use of java config params #537

Open vpipkt opened 3 years ago

vpipkt commented 3 years ago

Accessing S3 buckets, even public ones, requires passing some Java options along. We can also use Java options to choose whether to prefer the GDAL reader, among other settings.

Here is a quick example of using unsigned requests. We have to pass AWS_NO_SIGN_REQUEST in so that the geotrellis.raster.gdal.option configuration is set.

We should add a general discussion of this to the docs pages.

import pyrasterframes
from pyrasterframes.utils import create_rf_spark_session

# Forward the GDAL option to the JVM as a system property on the driver.
spark = create_rf_spark_session(**{'spark.driver.extraJavaOptions': '-Dgeotrellis.raster.gdal.option.AWS_NO_SIGN_REQUEST=YES'})

# Read a public JP2 from S3 and force evaluation.
df = spark.read.raster('s3://s22s-test-geotiffs/luray_snp/B11.jp2')
df.count()
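
The same pattern should work for passing other GDAL configuration options: a -Dgeotrellis.raster.gdal.option.NAME=VALUE system property sets the corresponding geotrellis.raster.gdal.option entry. A minimal sketch follows; the options shown (GDAL_DISABLE_READDIR_ON_OPEN, GDAL_CACHEMAX) are standard GDAL settings chosen purely for illustration, and duplicating them on the executors is an assumption about where they may also be needed, not something this issue confirms.

import pyrasterframes
from pyrasterframes.utils import create_rf_spark_session

# Illustrative GDAL options only; not required by RasterFrames.
gdal_opts = ('-Dgeotrellis.raster.gdal.option.GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR '
             '-Dgeotrellis.raster.gdal.option.GDAL_CACHEMAX=512')

spark = create_rf_spark_session(**{
    'spark.driver.extraJavaOptions': gdal_opts,
    # Assumption: executors may need the same options for distributed reads.
    'spark.executor.extraJavaOptions': gdal_opts,
})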
vpipkt commented 3 years ago

FWIW, the exact option given there does not enable anonymous reads. To do that, set os.environ['AWS_NO_SIGN_REQUEST'] = 'YES' before creating the Spark session.
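
For reference, that workaround looks like this (a minimal sketch of the comment above, reusing the same public test bucket):

import os

# Must be set before the Spark session is created so the reader sees it.
os.environ['AWS_NO_SIGN_REQUEST'] = 'YES'

from pyrasterframes.utils import create_rf_spark_session

spark = create_rf_spark_session()

df = spark.read.raster('s3://s22s-test-geotiffs/luray_snp/B11.jp2')
df.count()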

JenniferYingyiWu2020 commented 3 years ago

Hi @vpipkt , I tried to read 's3://s22s-test-geotiffs/luray_snp/B11.tif' in the RasterFrames environment, and the earlier issue was resolved. However, when I read 's3://s22s-test-geotiffs/luray_snp/B11.jp2' and then ran df.count(), I got the error: CPLE_OpenFailed(4) "Open failed." Unable to open EPSG support file gcs.csv. Try setting the GDAL_DATA environment variable to point to the directory containing EPSG csv files.

vpipkt commented 3 years ago

@JenniferYingyiWu2020 in the interest of keeping this issue focused, let's try to resolve this in Gitter if possible. It seems that CPLE_OpenFailed(4) is a common GDAL configuration problem, not specific to RasterFrames itself.
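
For anyone hitting the same message: as the error text itself suggests, a common workaround is to point GDAL_DATA at the directory containing GDAL's EPSG support files before creating the Spark session. This is a sketch only; the path below is an assumption and depends on how GDAL is installed (for a conda environment it is often share/gdal under the environment prefix).

import os

# Hypothetical path; replace with your GDAL installation's data directory.
os.environ['GDAL_DATA'] = '/opt/conda/share/gdal'

from pyrasterframes.utils import create_rf_spark_session
spark = create_rf_spark_session()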