Add Hive

piotr-kalanski commented 6 years ago

Option 1 - drop Hive and use the metastore that Spark creates itself instead. Consider pinning it to a fixed directory with a bind mount, so data can easily be added from the host machine: https://stackoverflow.com/questions/45819568/why-there-are-many-spark-warehouse-folders-got-created
Option 2 - use the Hive image https://hub.docker.com/r/bde2020/hive/ and copy hive-site.xml into the Spark container
- Challenge: keeping hive-site.xml identical in both containers
Option 3 - build a custom Docker image containing both Spark and Hive, so they share the same hive-site.xml
- Drawback: significant effort to develop and maintain the image
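A possible sketch for Option 1, under stated assumptions. By default Spark resolves `spark-warehouse` (and, with Hive support, the embedded Derby `metastore_db`) relative to the current working directory, which is why several such folders can appear. The sketch below pins both to one absolute directory and renders the settings as `spark-defaults.conf` lines. It assumes the host directory is bind-mounted at `/data` inside the Spark container; that path and the helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict


def shared_warehouse_conf(base_dir: str) -> Dict[str, str]:
    """Spark settings pinning warehouse data and the embedded Derby metastore
    to one absolute directory (hypothetical helper; base_dir is assumed to be
    a bind mount from the host)."""
    base = PurePosixPath(base_dir)
    if not base.is_absolute():
        # A relative path would again be resolved against the working directory.
        raise ValueError(f"base_dir must be absolute, got {base_dir!r}")
    return {
        # Table data lands here instead of ./spark-warehouse.
        "spark.sql.warehouse.dir": str(base / "spark-warehouse"),
        # Use Spark's built-in Hive catalog (the setting enableHiveSupport() selects).
        "spark.sql.catalogImplementation": "hive",
        # Derby resolves the relative metastore_db name against derby.system.home.
        "spark.driver.extraJavaOptions": f"-Dderby.system.home={base}",
    }


def render_spark_defaults(conf: Dict[str, str]) -> str:
    """Format settings as spark-defaults.conf lines ("key value")."""
    return "".join(f"{key} {value}\n" for key, value in sorted(conf.items()))
```

Writing `render_spark_defaults(shared_warehouse_conf("/data"))` into the container's `conf/spark-defaults.conf` is preferable to setting these on the session builder, because in client mode `spark.driver.extraJavaOptions` must be in place before the driver JVM starts. One caveat of this option: embedded Derby lets only one JVM open `metastore_db` at a time, so only one Spark driver can use the metastore concurrently.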
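For the Option 2 challenge (and equally for Option 3), one way to keep both containers on the same hive-site.xml is to generate it once on the host and bind-mount that single file read-only into both, i.e. into Spark's `conf/` directory and Hive's `conf/` directory. The sketch below assumes the metastore is reachable as `hive-metastore` (e.g. a docker-compose service name) on 9083, Hive's default metastore Thrift port. The host name and helper names are hypothetical, and only the metastore URI is shown.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict


def shared_hive_props(metastore_host: str, port: int = 9083) -> Dict[str, str]:
    """Properties both containers must agree on (hypothetical helper)."""
    return {
        # Spark and Hive both connect to the same remote metastore service.
        "hive.metastore.uris": f"thrift://{metastore_host}:{port}",
        # Further shared properties would be added here.
    }


def render_hive_site(props: Dict[str, str]) -> str:
    """Render properties in the Hadoop <configuration>/<property> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_hive_site(target_dir: str, props: Dict[str, str]) -> Path:
    """Write the single hive-site.xml that both containers then mount."""
    path = Path(target_dir) / "hive-site.xml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_hive_site(props), encoding="utf-8")
    return path
```

Because both containers read the same file from the host, there is only one copy to keep up to date, rather than two that can drift apart.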

piotr-kalanski / big-data-dev-environment-docker