locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

PySpark version requirements #576

Closed · gmweaver closed 2 years ago

gmweaver commented 2 years ago

Is there a specific dependency on the currently pinned PySpark version, or will this work with older versions as well? (I assume the latest release requires at least 3.x.x.)

https://github.com/locationtech/rasterframes/blob/develop/pyrasterframes/src/main/python/setup.py#L143

If it does work with prior versions, would it be possible to update the requirement to use >= rather than pinning an exact version? This would be very helpful for automated build tools that depend on correct dependency resolution.
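For illustration, here is the difference as it would appear in `install_requires` (the pinned version shown is a placeholder, not necessarily what setup.py actually declares):

```python
# Hypothetical illustration; the exact pinned version in pyrasterframes'
# setup.py may differ from what is shown here.

# An exact pin blocks resolution whenever anything else in the
# environment needs a different patch release:
pinned = ["pyspark==3.1.2"]

# A bounded range stays within one minor line while still letting
# resolvers find a compatible patch release:
ranged = ["pyspark>=3.1,<3.2"]
```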

pomadchin commented 2 years ago

Hey @gmweaver, realistically I think 3.1.x versions only. Spark tends to break binary compatibility (in spark-sql) between minor versions.
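If you want to guard against mismatches at runtime, a minimal check like this (my own sketch, not something RasterFrames ships) fails fast outside the supported line:

```python
# Minimal runtime guard (an assumption, not part of RasterFrames itself):
# fail fast when the installed PySpark is outside the supported 3.1.x line.
import pyspark

major, minor = pyspark.__version__.split(".")[:2]
if (major, minor) != ("3", "1"):
    raise RuntimeError(
        f"Expected PySpark 3.1.x for this RasterFrames build; "
        f"found {pyspark.__version__}"
    )
```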

metasim commented 2 years ago

@gmweaver Yeah, unfortunately, if you want Spark 2.x you should use RasterFrames < 0.10. Fortunately, there have been no major API changes between RasterFrames 0.9.x and 0.10.x, so your code should work against both.
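For example, a minimal pipeline like the sketch below should read the same under both versions; the GeoTIFF URL is a placeholder:

```python
# Minimal usage sketch; the API is identical on RasterFrames 0.9.x
# (Spark 2.x) and 0.10.x (Spark 3.1.x). The URL below is a placeholder.
from pyspark.sql import SparkSession
import pyrasterframes  # adds withRasterFrames() to SparkSession
from pyrasterframes.rasterfunctions import rf_tile_mean

spark = (
    SparkSession.builder
    .appName("rf-compat-check")
    .getOrCreate()
    .withRasterFrames()
)

df = spark.read.raster("https://example.com/scene.tif")  # placeholder URL
df.select(rf_tile_mean(df.proj_raster)).show()
```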

gmweaver commented 2 years ago

Got it, thanks for the feedback!

For future releases, would it be possible to make the requirements flexible within a minor version (e.g., a single release that works with both 3.0.0 and 3.0.1)? Also, is there a release for Spark 3.0.x? These questions are mostly driven by constraints in my own dev environment, where we must stay on a given Spark version until an upgrade is approved and tested, so I understand this may not be a high priority.
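For reference, PEP 440's compatible-release operator would express exactly that (illustrative, not what setup.py currently declares):

```python
# A compatible-release specifier covers every patch release in one
# minor line: ~=3.0.0 is equivalent to >=3.0.0, <3.1.
install_requires = ["pyspark~=3.0.0"]
```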

pomadchin commented 2 years ago

Hey @gmweaver, it is theoretically possible, but it would considerably complicate the release process. It is not just a small constraint on the Python side; there is also the Spark JAR dependency to keep in sync. In frameless we resolved this by releasing separate artifacts: not only cross-Scala-version releases but cross-Spark ones as well.
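Roughly, applied to the Python side that approach would look like the sketch below; none of these artifact names exist today, it only illustrates why each supported Spark line becomes its own release:

```python
# Hypothetical sketch of per-Spark-line artifacts (frameless-style),
# applied to the Python package. These package names do not exist.
from setuptools import setup

SPARK_LINE = "3.1"  # fixed at release time; one build per supported line

setup(
    name=f"pyrasterframes-spark{SPARK_LINE.replace('.', '')}",  # e.g. pyrasterframes-spark31
    version="0.10.0",
    install_requires=[f"pyspark~={SPARK_LINE}.0"],  # any 3.1.x patch release
)
```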