T31 upgrade pyspark Fixes #31

Instructions:

On the primary node's VM, do the following:

docker rmi ts_worker ts_spark-master
docker-compose build --no-cache

On the secondary VM, do the following:

docker rmi ts_spark-worker
docker-compose build --no-cache

Then run docker-compose up -d on both VMs.
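For anyone who would rather script the steps above, here is a rough sketch. The function name rebuild_cmds and the primary/secondary role argument are made up for illustration; the image names and docker-compose commands are copied from the instructions above. It only prints the commands, so nothing runs until you pipe the output to a shell.

```shell
# Hypothetical helper (not part of TweetSets): prints the rebuild steps for one node.
# Role names "primary" and "secondary" are assumptions for this sketch.
rebuild_cmds() {
    role=$1
    case "$role" in
        primary)   echo "docker rmi ts_worker ts_spark-master" ;;
        secondary) echo "docker rmi ts_spark-worker" ;;
        *)         echo "unknown role: $role" >&2; return 1 ;;
    esac
    # The remaining steps are the same on both VMs.
    echo "docker-compose build --no-cache"
    echo "docker-compose up -d"
}
```

Run rebuild_cmds primary (or secondary) to review the commands, then pipe the output to sh -e to execute them. With -e the run stops at the first failure, for example if docker rmi cannot find one of the images.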

Please test loading with the spark-loader. Open a shell in the loader container, then run spark-submit from inside it:

docker-compose -f loader.docker-compose.yml run --rm loader /bin/bash

spark-submit \
 --jars elasticsearch-hadoop.jar \
 --master spark://$SPARK_MASTER_HOST:7101 \
 --py-files dist/TweetSets-2.0-py3.6.egg,dependencies.zip \
 --conf spark.driver.bindAddress=0.0.0.0 \
 --conf spark.driver.host=$SPARK_DRIVER_HOST \
 tweetset_loader.py spark-create /dataset/path/to/files
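The spark-submit command above depends on SPARK_MASTER_HOST and SPARK_DRIVER_HOST being set in the loader container. A quick pre-flight check could look like the sketch below; the function name check_spark_env is hypothetical, and only the two variable names come from the command itself.

```shell
# Hypothetical pre-flight check: fail early if a variable used by spark-submit
# is unset or empty.
check_spark_env() {
    for name in SPARK_MASTER_HOST SPARK_DRIVER_HOST; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            return 1
        fi
    done
    echo "ok"
}
```

Calling check_spark_env before spark-submit prints ok when both variables are set, and otherwise names the first missing one on stderr and returns a non-zero status.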

Note: I had to comment out the image line and uncomment the build instructions in loader.docker-compose.yml. Those changes are part of this commit, but we'll want to revert them once the image is updated in Docker.

gwu-libraries/TweetSets #68