locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

Geometry and date filters don't work in pyrasterframes STAC API #590

Open ngulyaev opened 1 year ago

ngulyaev commented 1 year ago

Result of the query using pystac-client (for comparison with the pyrasterframes query below):

import geopandas
import shapely.geometry
from pystac_client import Client

stac = "https://earth-search.aws.element84.com/v0"
collection = "sentinel-s2-l2a-cogs"

# Build a bounding-box polygon (EPSG:4326) around the shapefile's extent
shape = geopandas.read_file("zip:///shape.zip")
polygon = shapely.geometry.box(*shape.to_crs(epsg=4326).total_bounds)

SentinelSTAC = Client.open(stac)
result = SentinelSTAC.search(
    intersects=polygon,
    collections=[collection],
    datetime="2020-04-01/2020-04-10"
).matched()
print(result)
16

Using the pyrasterframes STAC API:

from pyrasterframes.utils import create_rf_spark_session

spark = create_rf_spark_session()
# Same search parameters as above, via the pyrasterframes STAC API reader
scenes = spark.read.stacapi(stac, {
    'collections': [collection],
    'intersects': polygon.__geo_interface__,
    'datetime': '2020-04-01/2020-04-10'
}).limit(100)
print(scenes.count())
100

I set the limit to 100 on purpose because the request takes a long time to complete. I also noticed that pystac-client is much faster than the pyrasterframes STAC API.

pomadchin commented 1 year ago

Hey @ngulyaev, that definitely looks like a bug.

I think that's been addressed by https://github.com/azavea/stac4s/pull/496 and https://github.com/azavea/stac4s/pull/502 (see https://github.com/azavea/stac4s/issues/495 for the bug description). If you're on develop, it may be enough to bump the stac4s dependency up to 0.8.1. I forgot to create a PR with the dependency upgrade.
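
For concreteness, a minimal sketch of what that bump might look like in the build (treating the stac4s client as a plain libraryDependencies entry is an assumption about the build layout; the group and module names match the jar path shown below):

// build.sbt sketch: pin the stac4s client to the release containing the fix
libraryDependencies += "com.azavea.stac4s" %% "client" % "0.8.1"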

ngulyaev commented 1 year ago

@pomadchin Thanks a lot. I tried to override the jars through the Spark settings, but unfortunately it didn't help:

from pyrasterframes.utils import create_rf_spark_session, find_pyrasterframes_assembly

# Put the patched stac4s client on the classpath alongside the assembly
jars = [find_pyrasterframes_assembly(), '/home/ngulyaev/.m2/repository/com/azavea/stac4s/client_2.12/0.8.1/client_2.12-0.8.1.jar']
spark = create_rf_spark_session(**{
    # 'spark.jars.packages': 'com.azavea.stac4s:client_2.12:0.8.1',
    # 'spark.jars.repositories': 'https://repository.mulesoft.org/nexus/content/repositories/public/'
    'spark.jars': ','.join(jars),
    'spark.driver.extraClassPath': 'client_2.12-0.8.1.jar',
    'spark.executor.extraClassPath': 'client_2.12-0.8.1.jar'
})

Maybe I'm doing it the wrong way? It seems that these filter parameters are simply ignored; the query doesn't fail even if I pass a string instead of a geometry to 'intersects'.

pomadchin commented 1 year ago

@ngulyaev Yeah, I don't think it will work this way; Spark will still prefer the transitive dependency over the extra jar, I think. Publishing a local version based on the current develop branch is the only real way to check it.
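
Roughly, that local check might look like the following (the version string and artifact coordinate are illustrative assumptions, not the project's actual values):

// In a RasterFrames checkout on develop, with stac4s bumped as above,
// give the build a local marker version and publish to the local ivy repo.
// build.sbt override (illustrative version):
ThisBuild / version := "0.10.3-SNAPSHOT"
// then from the shell: sbt publishLocal
// Finally, start the PySpark session against the locally published artifact,
// e.g. via spark.jars.packages with an assumed coordinate such as
// org.locationtech.rasterframes:pyrasterframes_2.12:0.10.3-SNAPSHOT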