The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
I am very new to database programming and not very sure about this area.
I have HDFS running on 2 server nodes, and I tried creating a Petastorm reader for a Parquet table.
When I run the following code on my AI machine, I get the error below.
Does the AI server need to have Hadoop installed as well?
from petastorm import make_reader

dataset_url = 'hdfs://192.168.0.32:9000/TrainingData/34611012/2048_1536/tif/parquet/'
reader = make_reader(dataset_url)
------------
Unable to populate a sensible HadoopConfiguration for namenode resolution!
Path of last environment var (None) tried [None]. Please set up your Hadoop and
define environment variable HADOOP_HOME to point to your Hadoop installation path.
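The error comes from Petastorm's namenode resolution step, which looks for a Hadoop client configuration on the machine that calls make_reader. A hedged sketch of what the message asks for, assuming the Hadoop client files are unpacked at /opt/hadoop (a hypothetical path; adjust to your installation):

```shell
# Hypothetical install location -- point this at the actual Hadoop client directory.
export HADOOP_HOME=/opt/hadoop
# Directory containing the cluster's core-site.xml and hdfs-site.xml,
# which the resolver uses to map the namenode address.
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```

These would need to be set in the environment of the process running make_reader (for example, in the shell profile or before launching Python) on the AI machine.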