Petastorm requires hadoop on client?

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

Apache License 2.0

1.78k stars, 285 forks

I am very new to database programming and not very sure of myself in this area. I have HDFS running on two server nodes, and I am trying to create a reader for a Parquet dataset stored there. When I run the following code on my AI machine, I get an error. Does the AI server need to have Hadoop installed as well?

from petastorm import make_reader
# URL of the Parquet dataset on HDFS
cmd = 'hdfs://192.168.0.32:9000/TrainingData/34611012/2048_1536/tif/parquet/'
make_reader(cmd)
------------
Unable to populate a sensible HadoopConfiguration for namenode resolution!
Path of last environment var (None) tried [None]. Please set up your Hadoop and 
define environment variable HADOOP_HOME to point to your Hadoop installation path.
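
For reference, here is a minimal sketch of the check that the error message asks for: whether `HADOOP_HOME` is defined on the client and points at an existing directory. The helper name `describe_hadoop_home` is hypothetical and not part of Petastorm. It only inspects the environment and does not by itself settle whether a full Hadoop installation is required on the client.

```python
import os


def describe_hadoop_home(env=None):
    """Report whether HADOOP_HOME is set and points at an existing directory.

    `env` is a mapping of environment variables; it defaults to os.environ.
    This is a hypothetical diagnostic helper, not a Petastorm API.
    """
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return "HADOOP_HOME is not set"
    if not os.path.isdir(hadoop_home):
        return "HADOOP_HOME is set to %s, but that directory does not exist" % hadoop_home
    return "HADOOP_HOME points to %s" % hadoop_home


# Run this on the machine that calls make_reader (the "AI machine" above).
print(describe_hadoop_home())
```

If this reports that the variable is not set, that would be consistent with the `(None)` and `[None]` values shown in the error above.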

Petastorm requires hadoop on client? #607