locationtech-labs / geopyspark

GeoTrellis for PySpark

Remove dependence on GEOPYSPARK_JARS_PATH env var #688

Open: jpolchlo opened this issue 5 years ago

jpolchlo commented 5 years ago

At present, GPS relies on an environment variable (GEOPYSPARK_JARS_PATH) to tell it where to load the jar resources from. This is unnecessary, and it prevents loading jar resources from Maven or another repository. The variable should be abandoned in favor of the --jars or --packages switch to pyspark, letting Spark manage the dependencies on its own according to the user's preferences. This would remove the need to maintain an S3 repository of jars, and it would remove some fiddly code from the package init.
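For illustration, a minimal sketch of what "let Spark manage the dependencies" could look like on the configuration side, assuming the user supplies either a jar location or Maven coordinates. The keys spark.jars and spark.jars.packages are Spark's configuration equivalents of the --jars and --packages switches; the function name spark_dependency_conf and the coordinate used below are hypothetical and not part of GPS.

```python
from typing import Dict, Optional


def spark_dependency_conf(jars: Optional[str] = None,
                          packages: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: translate the user's choice into Spark config keys.

    spark.jars          corresponds to --jars     (comma-separated jar paths/URLs)
    spark.jars.packages corresponds to --packages (comma-separated Maven coordinates)
    No environment variable is consulted; Spark resolves the dependencies itself.
    """
    if not jars and not packages:
        raise ValueError("supply a jar location (--jars) or Maven coordinates (--packages)")
    conf: Dict[str, str] = {}
    if jars:
        conf["spark.jars"] = jars
    if packages:
        conf["spark.jars.packages"] = packages
    return conf


# Hypothetical coordinate, for illustration only.
example = spark_dependency_conf(packages="com.example:geopyspark-backend_2.11:0.0.0")
```

With pyspark installed, these pairs could be passed to SparkConf().setAll(...) before the first SparkContext is created in the process; the package init would then have nothing to look up.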

Connects #672. Connects #669.

jpolchlo commented 5 years ago

Worth mentioning that this change would not prevent the use of a fat jar (possibly still published on S3); it would simply give the user the flexibility to choose either a fat jar (--jars switch) or a published version (--packages switch).
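A small sketch of the two launch options described above, assuming the user picks exactly one of them. The helper name pyspark_launch_args, the jar path, and the Maven coordinate are hypothetical placeholders; only the --jars and --packages switches come from pyspark itself.

```python
from typing import List, Optional


def pyspark_launch_args(fat_jar: Optional[str] = None,
                        maven_coordinate: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build a pyspark command line for one of the two options."""
    # Exactly one source of the backend classes should be chosen.
    if (fat_jar is None) == (maven_coordinate is None):
        raise ValueError("choose exactly one of fat_jar or maven_coordinate")
    if fat_jar is not None:
        # Option 1: a fat jar (local path or URL, e.g. one still hosted on S3).
        return ["pyspark", "--jars", fat_jar]
    # Option 2: a published artifact that Spark resolves from a repository.
    return ["pyspark", "--packages", maven_coordinate]


fat = pyspark_launch_args(fat_jar="/path/to/backend-assembly.jar")
published = pyspark_launch_args(maven_coordinate="com.example:geopyspark-backend_2.11:0.0.0")
```

Either list could be handed to subprocess.run, or joined into a shell command; the choice stays with the user rather than with the package init.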

[In the latter case, the jai_core Maven repository problems would require manually downloading that jar from a known good location and passing it with a --files switch, followed by --exclude-packages javax.media:jai_core, to make it work. But it does work.]
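The workaround above, sketched as a command line. This simply mirrors the steps as the comment describes them: it assumes the jai_core jar has already been downloaded manually to a local path, and the helper name, that path, and the Maven coordinate are hypothetical. The --files and --exclude-packages switches are standard pyspark/spark-submit options.

```python
from typing import List


def jai_core_workaround_args(maven_coordinate: str, jai_core_jar: str) -> List[str]:
    """Hypothetical helper: --packages launch with the jai_core workaround applied."""
    return [
        "pyspark",
        # Resolve the published artifact (and its transitive dependencies) via Maven.
        "--packages", maven_coordinate,
        # Ship the manually downloaded jai_core jar alongside the job.
        "--files", jai_core_jar,
        # Skip the problematic transitive artifact during --packages resolution.
        "--exclude-packages", "javax.media:jai_core",
    ]


args = jai_core_workaround_args(
    "com.example:geopyspark-backend_2.11:0.0.0",  # hypothetical coordinate
    "/path/to/jai_core.jar",                      # hypothetical local download
)
```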