locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

Setting the --py-files flag and sending pyrasterframes_2.11-0.9.0-python.zip makes pyspark search Maven and take a long time to resolve dependencies #523

Closed yurigba closed 3 years ago

yurigba commented 3 years ago

Hi,

As pointed out in https://rasterframes.io/getting-started.html#using-pyspark-shell, using the pyspark shell in cluster mode requires passing the file pyrasterframes_2.11-0.9.0-python.zip via the --py-files flag and setting the dependencies properly. However, when I do this, pyspark hangs: there is no direct connectivity to Maven from our environment, so after about 1 hour it stops and reports a lot of unmet dependencies. For now I am downloading these dependencies from Maven directly and putting the files on the Spark client, because I don't yet know where to change the repository address so that pyspark finds our local Maven mirror (each change of address takes 1 hour to test and validate).

My question is:

Should this happen every time we set up a RasterFrames application? Can it be configured so that this process does not have to run every time pyspark is called? Taking 1 hour to start pyspark is not feasible in practical applications.
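For reference, here is a minimal sketch of how the pyspark launch command could be assembled so that dependency resolution also looks at a local mirror and reuses a fixed Ivy cache directory. `--py-files`, `--packages`, `--repositories` and the `spark.jars.ivy` setting are standard Spark options; the mirror URL, the cache path, the package coordinate and the helper name `build_pyspark_command` are assumptions made up for illustration, not values taken from the RasterFrames documentation. Note that `--repositories` adds a repository rather than replacing Maven Central, so a fully offline setup may additionally need an Ivy settings file supplied through `spark.jars.ivySettings`.

```python
import shlex


def build_pyspark_command(py_zip, packages, mirror_url, ivy_dir):
    """Return the argv list for a pyspark launch that uses a local Maven mirror.

    Hypothetical helper: it only assembles arguments, it does not start Spark.
    """
    return [
        "pyspark",
        # Python sources shipped to the executors.
        "--py-files", py_zip,
        # Maven coordinates that Spark resolves at startup.
        "--packages", ",".join(packages),
        # Additional repository searched for the coordinates above.
        "--repositories", mirror_url,
        # Ivy user directory; artifacts already cached here are not downloaded again.
        "--conf", f"spark.jars.ivy={ivy_dir}",
    ]


cmd = build_pyspark_command(
    py_zip="pyrasterframes_2.11-0.9.0-python.zip",
    # Assumed coordinate, for illustration only.
    packages=["org.locationtech.rasterframes:pyrasterframes_2.11:0.9.0"],
    # Placeholder mirror URL and cache path (assumptions).
    mirror_url="https://maven.example.internal/repository/maven-public/",
    ivy_dir="/opt/spark/ivy-cache",
)

# Print a shell-ready command line that can be pasted into a terminal.
print(shlex.join(cmd))
```

Once the artifacts have been resolved into the Ivy directory, later launches with the same settings should mostly hit the cache instead of going through the full download again.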

yurigba commented 3 years ago

I succeeded in setting the Maven repo and am now installing the dependencies; I will give feedback when it has finished.

yurigba commented 3 years ago

Closing the issue because it was a problem with Maven.