locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io

Spark 3 & Hadoop 3 Support #3218

Closed echeipesh closed 3 years ago

echeipesh commented 4 years ago

https://spark.apache.org/news/spark-3.0.0-preview.html

Looks like it might be an easy upgrade. At this point this is a placeholder issue, leading up to pushing a SNAPSHOT release with a Spark 3.0.0-preview2 dependency.

pomadchin commented 4 years ago

This task is unblocked by the EMR 6.1 release. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html

pomadchin commented 4 years ago

The GeoTrellis upgrade doesn't seem very complicated: it requires a single dependency upgrade (plus upgrades of some other libraries). EMR 6.1 lets us move to newer libraries and drop Scala 2.11 support.
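As a rough illustration of what such a dependency bump looks like in an sbt build, here is a minimal sketch. It is not the actual GeoTrellis build definition; the version numbers and the choice of modules are assumptions picked for the example, and the right values depend on the target EMR release.

```scala
// build.sbt (illustrative sketch only, not the GeoTrellis build)
// Spark 3.0.x is published for Scala 2.12, so Scala 2.11 is dropped here.
ThisBuild / scalaVersion := "2.12.12"

// Assumed versions for the example; adjust to the cluster you deploy to.
val sparkVersion  = "3.0.0"
val hadoopVersion = "3.2.1"

libraryDependencies ++= Seq(
  // "Provided" because the cluster supplies Spark and Hadoop at runtime.
  "org.apache.spark"  %% "spark-core"    % sparkVersion  % Provided,
  "org.apache.spark"  %% "spark-sql"     % sparkVersion  % Provided,
  "org.apache.hadoop"  % "hadoop-client" % hadoopVersion % Provided
)
```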

However, the Vectorpipe project would require some effort to bump the dependency version. Things get more complicated given that Vectorpipe depends on both GeoTrellis and GeoMesa:

  1. GeoMesa needs to be upgraded to Scala 2.12 (we can help work on 2.11 / 2.12 cross-compilation for the GeoMesa project)
  2. GeoMesa still needs Scala 2.11, which can make it hard to produce a Scala 2.12 release with Spark 3 (needed for the geotrellis-geomesa project); however, it may still be possible.

=> Vectorpipe depends on GT & GM and requires:

  1. geomesa-spark-jts to be updated to Spark 3 and Scala 2.12
  2. geotrellis-geomesa to be updated to Spark 3 and Scala 2.12

RasterFrames depends on GeoTrellis and GM and also on Spark 2, and it relies heavily on internal spark-sql logic, which is fragile and can break even across minor releases. If GeoTrellis shifts to Spark 3 support, RasterFrames would also need to be upgraded to Scala 2.12 and Spark 3, which may be considerably more complicated than in the GM case.

Small Conclusion

GeoTrellis itself is not that hard to upgrade. However, things get a bit more complicated with GeoMesa, RasterFrames and Vectorpipe. The GM and RF projects are still interested in Scala 2.11 support. To keep the locationtech ecosystem reasonably in sync and up to date, we would need to maintain both GM and RF as cross builds for Scala 2.12 / 2.11 and Spark 3 / 2, which can be non-trivial.
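To make the cross-build idea concrete, one possible shape for it in sbt is sketched below. This is an assumption about how a downstream project could set it up, not how GM or RF actually build; `sparkVersionFor` is a hypothetical helper and the version numbers are illustrative.

```scala
// build.sbt (illustrative sketch only)
// Build against both Scala lines; `sbt +compile` runs once per listed version.
scalaVersion       := "2.12.12"
crossScalaVersions := Seq("2.11.12", "2.12.12")

// Hypothetical helper: pair each Scala binary version with a Spark line.
// Spark 3.0.x has no Scala 2.11 artifacts, so 2.11 stays on Spark 2.4.x.
def sparkVersionFor(scalaBinary: String): String = scalaBinary match {
  case "2.11" => "2.4.7"
  case _      => "3.0.0"
}

libraryDependencies +=
  "org.apache.spark" %% "spark-sql" % sparkVersionFor(scalaBinaryVersion.value) % Provided
```

Even with a setup like this, source code that touches Spark internals may still need version-specific source directories, which is part of what makes the cross build non-trivial.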

cc @jnh5y @echeipesh @elahrvivaz @metasim