broxtronix closed this issue 9 years ago
@broxtronix Thanks for the detailed report! First, a question: to get here, how did you launch Thunder? Did you use the thunder executable (inside thunder/python/bin/), or did you launch pyspark and then import the relevant Thunder code?
It's true that PySpark requires independent imports to be available on the workers, but rather than installing code on the workers, this can be handled within PySpark itself by shipping egg files (this seems to be the preferred strategy among developers). Currently, the executable does this for you automatically: it builds an egg file and ships it across the cluster via the ADD_FILE environment variable (setting this is equivalent to passing the egg to sc.addPyFile).
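As a minimal sketch of the mechanism sc.addPyFile relies on: an egg is essentially a zip archive of Python source, and Python can import directly from such an archive once it is on sys.path. The module name `mymod` and the temporary paths below are illustrative stand-ins, not part of Thunder:

```python
import os
import sys
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
egg_path = os.path.join(tmpdir, "mymod-0.1.egg")

# An egg is just a zip file containing Python source.
with zipfile.ZipFile(egg_path, "w") as zf:
    zf.writestr("mymod.py", "def greet():\n    return 'hello from the egg'\n")

# Putting the archive on sys.path is what addPyFile arranges on each worker.
sys.path.insert(0, egg_path)
import mymod

print(mymod.greet())  # -> hello from the egg
```

This is why shipping a single egg is enough: no per-worker installation is needed, only that the archive lands on each worker's import path.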
I just confirmed that if I launch Thunder via the executable, the following runs fine (identical to yours, except for the location of the data):
rdd_vols = tsc.loadExample('fish-images').cache()
num_vols = rdd_vols.count()
rdd_series = rdd_vols.toSeries().cache()
num_series = rdd_series.count()
If I launch via pyspark, the above gives an error like yours, but it is fixed by first adding the egg using sc.addPyFile. In other words, if I run this as soon as the shell starts,
sc.addPyFile('/root/thunder/python/dist/thunder_python-0.5.0_dev-py2.6.egg')
from thunder import ThunderContext
tsc = ThunderContext(sc)
then the above works fine as well.
Does this help explain what you were seeing? If so, we can definitely improve the documentation to clarify some of this.
Aha! Your explanation is right on the money. I had been running ipython notebook and then importing Spark and Thunder, and that explains why the egg files weren't automatically shipped to the slave nodes. I can confirm that this all works fine for me now if I run thunder instead of ipython notebook, or if I use
sc.addPyFile('/root/thunder/python/dist/thunder_python-0.5.0_dev-py2.6.egg')
from thunder import ThunderContext
tsc = ThunderContext(sc)
Looking at the documentation, I think it's actually pretty clear already that people should be running thunder on the command line in order to set up their environment correctly. Perhaps the one tip we could squeeze in there is to mention that you can still run Thunder from pyspark or a vanilla ipython notebook, as long as you call the addPyFile() method first!
Thanks for the quick response on this. Cleared up my issue right away! :)
I am running Thunder (master branch, updated today) on EC2, and I have been encountering a problem wherein my EC2 slave instances are having trouble importing and running Thunder code. I've included one simple example that triggers this behavior below.
Poking around a bit in the thunder/python/thunder/utils/ec2.py script, it looks like Thunder is installed on the master node, but not on the slave nodes. Only '/root/thunder/python/thunder/utils/data/' gets mirrored over to the slave nodes.
I'm guessing this is because Spark will often pickle Thunder code in the master's Python process and send it over to the slave processes, but that does not seem to be working correctly here. My understanding is that this pickling only covers symbols that have been imported into the master node's Python process; if the pickled code then executes an import statement of its own, that import runs on the slave and looks in the slave's local Python installation. That would explain why it is failing here, since Thunder is not installed locally on the slaves!
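A hedged illustration of that import behavior, using plain pickle as an analogy (PySpark's serializer behaves similarly for functions that live in an installed library): pickling a library function stores only a reference (module name plus qualified name), not the code itself, so unpickling must re-import the module on the receiving side, which fails if the library is absent there.

```python
import math
import pickle

# Pickling math.sqrt records a reference to it, not its implementation.
payload = pickle.dumps(math.sqrt)

# The serialized bytes embed the module name...
assert b"math" in payload

# ...and loading it triggers `import math` on the receiving process.
fn = pickle.loads(payload)
print(fn(4.0))  # -> 2.0
```

On a Spark worker, the "receiving process" is the worker's Python interpreter, so any module the shipped code imports must be resolvable from the worker's own import path.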
As a workaround for now, I have installed Thunder on the slave nodes using the following commands:
~/spark-ec2/copy-dir /root/thunder
pssh -h /root/spark-ec2/slaves "echo /root/thunder/python >> /usr/lib/python2.6/site-packages/paths.pth"
(The second command ensures that thunder is on the PYTHONPATH for the slaves.)
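For reference, a minimal sketch of why writing a line into a .pth file works: at startup (and via site.addsitedir), Python appends each path listed in a .pth file found in a site directory to sys.path. The temporary directories below stand in for /usr/lib/python2.6/site-packages and /root/thunder/python:

```python
import os
import site
import sys
import tempfile

site_dir = tempfile.mkdtemp()   # stands in for site-packages
extra_dir = tempfile.mkdtemp()  # stands in for /root/thunder/python

# Each line of a .pth file in a site directory names a path to add.
with open(os.path.join(site_dir, "paths.pth"), "w") as fh:
    fh.write(extra_dir + "\n")

# addsitedir processes .pth files, just as interpreter startup does.
site.addsitedir(site_dir)
print(extra_dir in sys.path)  # -> True
```

This is why appending to paths.pth on each slave makes the Thunder source importable there without a real install.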
Let me know if I'm missing anything, or if there is any trick to getting the pickling / import to work with the code residing solely on the master node. Thanks!!
Test case
This produces the following error: