sequenceiq / hadoop-docker

Hadoop docker image
https://registry.hub.docker.com/u/sequenceiq/hadoop-docker/
Apache License 2.0

Data volumes for persistence and connecting to Hive #68

Open darrenhaken opened 7 years ago

darrenhaken commented 7 years ago

I'm new to the Hadoop stack so forgive me if I'm missing something obvious.

I have two requirements I'm trying to work out with this Docker image:

1. How to persist HDFS to a data volume (is HDFS running?)
2. How to connect another container running another part of the Hadoop stack, e.g. Hive.

Can anyone help?
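
For the persistence question, here is a minimal sketch. The data directory location is an assumption based on Hadoop's default `hadoop.tmp.dir` of `/tmp/hadoop-${user.name}`; verify it inside the container first:

```sh
# Inside the container: print where the DataNode is configured to write
# its blocks (this confirms or corrects the /tmp/hadoop-root assumption):
$HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.datanode.data.dir

# On the host: mount a named volume over that path so the HDFS data
# outlives the container:
docker volume create hdfs-data
docker run -v hdfs-data:/tmp/hadoop-root -it \
    sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
```

(Connecting a Hive container is sketched in the reply below.)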

patrickneubauer commented 6 years ago

Dear darrenhaken,

We face similar requirements and wonder whether you were able to resolve yours.

If so, could you please point us to the steps you took to configure the Hive container to use the HDFS running within this Docker image?

Cheers, Patrick
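
One way to wire a separate Hive container to the HDFS in this image (a sketch, not verified against any particular Hive image; `my-hive-image` is a hypothetical name, and port 9000 assumes the image's default `fs.defaultFS`):

```sh
# Put both containers on a user-defined network so they can resolve
# each other by container name:
docker network create hadoop-net
docker run -d --network hadoop-net --name hadoop \
    sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -d

# In the Hive container, point fs.defaultFS (core-site.xml) at the
# NameNode, e.g. hdfs://hadoop:9000, then start Hive as usual:
docker run -it --network hadoop-net my-hive-image bash  # my-hive-image is hypothetical
```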

Mehran91z commented 4 years ago

Hi, I have the same problem using a volume for my HDFS input/output.

I want to create a directory with `$HADOOP_PREFIX/bin/hadoop fs -mkdir mytest`, put files into `mytest/input`, run something like wordcount on them, and persist the input and output data across each `docker run`.

How is this possible?
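
For reference, the workflow itself would look roughly like this (a sketch; the examples-jar path assumes the Hadoop 2.7.1 build this image ships with):

```sh
# Create the input directory in HDFS and upload local files:
$HADOOP_PREFIX/bin/hadoop fs -mkdir -p mytest/input
$HADOOP_PREFIX/bin/hadoop fs -put some-local-file.txt mytest/input/

# Run the bundled wordcount example (the output directory must not
# already exist):
$HADOOP_PREFIX/bin/hadoop jar \
    $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
    wordcount mytest/input mytest/output

# Inspect the result:
$HADOOP_PREFIX/bin/hadoop fs -cat mytest/output/part-r-00000
```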

What I have done so far:

  1. Added this property to hdfs-site.xml:

     ```xml
     <property>
       <name>dfs.datanode.data.dir</name>
       <value>file:///home/app/hdfs/datanode</value>
       <description>DataNode directory</description>
     </property>
     ```

  2. Created a Docker volume named `myvol`.

  3. Used `-v` when running the image: `docker run -v myvol:/home/app -it c29b621ba74a /etc/bootstrap.sh -bash`

But the /home/app directory contains only the files I created with vi and another folder named 'hdfs'; the HDFS input/output data is not persisted.
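
A likely cause (an assumption, not verified against this exact image): only `dfs.datanode.data.dir` was moved under the volume, so the DataNode blocks persist, but the NameNode metadata (`dfs.namenode.name.dir`) still lives outside `/home/app` and is recreated empty on each run, leaving the restored blocks orphaned. A sketch of keeping both under the mounted path:

```sh
# In hdfs-site.xml, keep NameNode metadata and DataNode blocks together
# under the mounted volume (paths are illustrative):
#   dfs.namenode.name.dir = file:///home/app/hdfs/namenode
#   dfs.datanode.data.dir = file:///home/app/hdfs/datanode

docker volume create myvol

# First run only: format the NameNode into the new, empty location,
# then boot. Re-formatting on a later run would wipe the metadata.
docker run -v myvol:/home/app -it c29b621ba74a \
    bash -c '$HADOOP_PREFIX/bin/hdfs namenode -format && /etc/bootstrap.sh -bash'

# Subsequent runs reuse the volume without formatting:
docker run -v myvol:/home/app -it c29b621ba74a /etc/bootstrap.sh -bash
```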