Closed. 704572066 closed this issue 2 years ago.
Hi, your code has not reached Analytics Zoo yet; it looks like TensorFlow is not set up to read from HDFS.
Could you follow the steps here and make sure TensorFlow can access HDFS? https://github.com/tensorflow/docs/blob/r1.11/site/en/deploy/hadoop.md
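For reference, the setup in that guide comes down to a handful of environment variables. Below is a minimal sketch, not the guide's own script: `build_hdfs_env` and `hadoop_classpath` are hypothetical helper names, and the `jre/lib/amd64/server` location of `libjvm.so` is an assumption that holds for typical JDK 8 layouts (newer JDKs usually use `lib/server`). The variables are built into a dict for a child process because `LD_LIBRARY_PATH` is generally read when a process starts, so changing it inside an already running Python session may have no effect.

```python
import os
import subprocess


def hadoop_classpath(hadoop_home):
    """Ask the Hadoop CLI for the expanded classpath (`hadoop classpath --glob`)."""
    hadoop_bin = os.path.join(hadoop_home, "bin", "hadoop")
    result = subprocess.run(
        [hadoop_bin, "classpath", "--glob"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_hdfs_env(java_home, hadoop_home, classpath, base_env=None):
    """Return a copy of the environment with the variables TensorFlow needs for HDFS."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    env["HADOOP_HDFS_HOME"] = hadoop_home
    # Assumption: libjvm.so lives here (JDK 8 style layout); adjust for your JDK.
    jvm_dir = os.path.join(java_home, "jre", "lib", "amd64", "server")
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = jvm_dir + (os.pathsep + existing if existing else "")
    env["CLASSPATH"] = classpath
    return env


# Usage (paths and script name are placeholders, not taken from this thread):
#   env = build_hdfs_env("/usr/java/default", "/opt/hadoop",
#                        hadoop_classpath("/opt/hadoop"))
#   subprocess.run(["python", "train.py"], env=env, check=True)
```

On a Spark cluster the same variables presumably also need to be visible to the executors, not only to the driver.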
After setting the environment variables, it can download the data, but I ran into another problem:
Py4JJavaError: An error occurred while calling o69.estimatorTrainMiniBatch. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 20, hadoop102, executor 2): org.tensorflow.TensorFlowException: hdfs:///user/root/mnist/3.0.0/mnist-train.tfrecord-00000-of-00001; Unknown error 255
Attached: Untitled5.md
@704572066 This may be caused by an HDFS configuration issue. Could you also check whether your PySpark program can read the HDFS files (using Spark only, without TensorFlow)? Here is also a link on troubleshooting common causes of "Unknown error 255": https://www.ibm.com/support/pages/datastage-bdfs-stage-gets-error-255-connecting-remote-hadoophdfs-server
In the meantime, we'll try to reproduce this on our side.
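A Spark-only check could look roughly like the sketch below. `can_spark_read` is a hypothetical helper; it relies only on `SparkContext.binaryFiles`, which loads each matching file whole, so it is suited to a single modest-sized file such as the TFRecord from the error message. Because the read runs in a Spark task, it also exercises HDFS access from an executor rather than only from the driver.

```python
def can_spark_read(sc, path):
    """Try to read `path` through Spark alone.

    Returns (True, number_of_bytes) on success, or (False, error_message).
    `sc` is expected to be a pyspark SparkContext.
    """
    try:
        # binaryFiles yields (file_path, file_bytes) pairs; take(1) fetches one.
        first = sc.binaryFiles(path).take(1)
    except Exception as exc:  # Spark surfaces Java-side failures as Python exceptions
        return False, str(exc)
    if not first:
        return False, "no files matched " + path
    _name, content = first[0]
    return True, len(content)


# Usage inside a PySpark session:
#   from pyspark import SparkContext
#   sc = SparkContext.getOrCreate()
#   print(can_spark_read(
#       sc, "hdfs:///user/root/mnist/3.0.0/mnist-train.tfrecord-00000-of-00001"))
```

If this succeeds while TensorFlow still fails, the problem is more likely in TensorFlow's native HDFS setup than in the cluster's HDFS configuration.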
Thank you, I have resolved the problem. It was caused by libhdfs.so: I had been using the copy of the file from the directory "/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib64/". I then tried the file in the directory "/etc/hadoop/conf/lib/native", and it works!
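For anyone comparing several copies of libhdfs.so in the same way, a quick first check is whether each copy loads at all. The sketch below is only that: `first_loadable` is a hypothetical helper, and a library that loads via `ctypes` is not guaranteed to work with TensorFlow (loading can also fail simply because `libjvm.so` is not on the loader path).

```python
import ctypes


def first_loadable(candidates):
    """Return the first path in `candidates` that the dynamic loader accepts, else None."""
    for path in candidates:
        try:
            ctypes.CDLL(path)  # raises OSError if the file is missing or cannot be loaded
        except OSError as exc:
            print("cannot load", path, "->", exc)
            continue
        return path
    return None


# Usage with the two directories mentioned in this thread:
#   first_loadable([
#       "/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib64/libhdfs.so",
#       "/etc/hadoop/conf/lib/native/libhdfs.so",
#   ])
```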