Setting the --py-files flag with pyrasterframes_2.11-0.9.0-python.zip makes pyspark search Maven and take a long time to resolve dependencies #523
As pointed out in https://rasterframes.io/getting-started.html#using-pyspark-shell, to use the pyspark shell in cluster mode you need to pass pyrasterframes_2.11-0.9.0-python.zip via the --py-files flag and configure the dependencies accordingly. However, when I do this the shell hangs: since the cluster has no direct connectivity to Maven, it gives up after ~1 hour and reports a long list of unresolved dependencies. As a workaround I am fetching the dependencies from Maven myself and placing the jars on the Spark client, because I don't yet know where to point PySpark at our local Maven mirror (and each attempt takes ~1 hour to test and validate).
My question is:
Should this resolution happen every time we set up a RasterFrames application? Can it be configured so that the process does not repeat every time pyspark is launched? One hour to start pyspark is unfeasible in practical applications...
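For reference, this is roughly the invocation the getting-started page suggests, plus two hedged sketches for avoiding the remote resolution: pointing Spark's resolver at a local mirror with --repositories, or skipping Ivy resolution entirely by shipping pre-downloaded jars with --jars. The mirror URL, jar paths, and exact Maven coordinates below are placeholders/assumptions on my part, and I have not been able to verify the --repositories route in our environment:

```shell
# Invocation per the getting-started docs (this is where it hangs
# while Ivy tries to reach Maven Central):
pyspark \
  --py-files pyrasterframes_2.11-0.9.0-python.zip \
  --packages org.locationtech.rasterframes:pyrasterframes_2.11:0.9.0

# Sketch 1: add a local Maven mirror to dependency resolution
# (URL is a placeholder; --repositories is a standard spark-submit flag):
pyspark \
  --py-files pyrasterframes_2.11-0.9.0-python.zip \
  --repositories http://maven-mirror.internal/repository/maven-public \
  --packages org.locationtech.rasterframes:pyrasterframes_2.11:0.9.0

# Sketch 2: skip online resolution entirely by downloading the jars
# once and passing them directly (paths are placeholders):
pyspark \
  --py-files pyrasterframes_2.11-0.9.0-python.zip \
  --jars /opt/spark/extra-jars/pyrasterframes_2.11-0.9.0.jar,/opt/spark/extra-jars/rasterframes_2.11-0.9.0.jar
```

Note that --repositories only adds repositories to the search path; if the mirror must fully replace Maven Central, the documented spark.jars.ivySettings option with a custom ivysettings.xml may be needed instead.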