wilsonpenha opened this issue 1 year ago:
Failed to get file system for path: hdfs://hadoopcluster/sin/ers/warehouse/tablespace/external/hive/hive_data.db/T_ERS_EVENT_PERF/metadata/00003-f70dd253-791a-499e-9ebd-7a739a461960.metadata.json
Since you are using the HDFS file system, you can check whether any Hadoop configuration needs to be set. You can pass it with the --source-catalog-hadoop-conf CLI option.
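For example (illustrative values only, and this assumes the option accepts comma-separated key=value Hadoop properties in the same style as --source-catalog-properties):

```
# hypothetical values -- substitute your cluster's actual HDFS settings
--source-catalog-hadoop-conf fs.defaultFS=hdfs://hadoopcluster,dfs.nameservices=hadoopcluster
```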
Well, this is my first time using it, so I don't know what to provide for this option. Both HADOOP_CONF_DIR and HIVE_CONF_DIR are already set, so please tell me what I should set.
Pasting the solution from the Zulip chat discussion:
Adding this line to cli/build.gradle.kts: implementation(libs.hadoop.hdfs)
Changing the Hadoop version in libs.versions.toml to hadoop = "3.3.1" to match the Iceberg repo (a sketch of both edits is shown after the command below), then running: export HADOOP_CONF_DIR=/usr/lib/spark/conf and export HIVE_CONF_DIR=/usr/lib/spark/conf
Building a new jar (iceberg-catalog-migrator-cli-0.2.1-SNAPSHOT.jar), then running:
java -Djavax.net.ssl.trustStore=/etc/security/clientKeys/client-truststore.jks \
-Djavax.net.ssl.trustStorePassword=admin1234 \
-Dhadoop.configuration.addResources=$HADOOP_CONF_DIR/core-site.xml \
-Dhadoop.configuration.addResources=$HADOOP_CONF_DIR/hdfs-site.xml \
-Dhadoop.configuration.addResources=$HADOOP_CONF_DIR/hive-site.xml \
-jar iceberg-catalog-migrator-cli-0.2.1-SNAPSHOT.jar \
register \
--source-catalog-type HIVE \
--source-catalog-properties warehouse=hdfs://hadoopcluster/sin/ers/warehouse/tablespace/external/hive,uri=thrift://hadoopoozie1:9083 \
--identifiers hive_data.t_ers_event_perf,hive_data.T_KWH_MATCH_RECORD_PERF \
--target-catalog-type NESSIE \
--target-catalog-properties uri=http://localhost:19120/api/v1,ref=main,warehouse=hdfs://hadoopcluster/sin/ers/warehouse/tablespace/external/hive
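For clarity, the two build-file edits mentioned above look roughly like this (a sketch assuming the repo's standard Gradle version catalog at gradle/libs.versions.toml and a hadoop-hdfs library alias declared in it; add the alias if it is missing):

```toml
# gradle/libs.versions.toml -- bump the Hadoop version used by the build
hadoop = "3.3.1"
```

```kotlin
// cli/build.gradle.kts -- add the HDFS client so hdfs:// paths can be resolved
dependencies {
    implementation(libs.hadoop.hdfs)
}
```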
Hey, I forgot to mention the solution here. Does it mean someone should change the code base here, or do you guys want to test it more first?
I think we can test it more by manually supplying the jar on the classpath and using Hadoop 2.7.3.
The problem with Hadoop 2.7.3 is that it uses sun.nio.ch.DirectBuffer.cleaner() from Java 1.8, which was removed from Java 11, causing an exception at the final stage of FileInputStream. I looked into the Java code and verified that it won't work with Java 11; you can see the stack trace in the Zulip chat. Another thing: we use Hadoop 3.3.1, so we could try Hadoop 3.0.0. Anyway, there could be different builds, one for Hadoop 2 and another for Hadoop 3, like Spark does. What do you think?
Copying from the Zulip chat: Awesome :+1: We can have a PR to add the hadoop-hdfs dependency (or the user can manually add the jar to the class path), but I am not sure about changing the Hadoop version to 3.3.1, because Iceberg expects to work with Hadoop 2.7.3 and that's why the Iceberg repo also keeps that version.
Using Hadoop 2.7.3 requires a Java 1.8 runtime, because the sun.misc.Cleaner-returning sun.nio.ch.DirectBuffer.cleaner() it calls is no longer available after Java 9; see the full stack trace above: Exception in thread "main" java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()' at org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:41)
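For illustration only (this is not the Hadoop code, and the class name CleanerProbe is made up), here is a tiny probe that prints what cleaner() returns on the running JDK; on Java 8 the descriptor matches what Hadoop 2.7.3 links against, while on Java 11 it does not, which is exactly the NoSuchMethodError above:

```java
import java.lang.reflect.Method;

// Prints the return type of sun.nio.ch.DirectBuffer.cleaner() on the current JDK.
// JDK 8 prints sun.misc.Cleaner (the descriptor Hadoop 2.7.3 was compiled against);
// JDK 11 prints a different type, so the old descriptor fails to link in CryptoStreamUtils.freeDB.
public class CleanerProbe {
    public static void main(String[] args) throws Exception {
        Method cleaner = Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner");
        System.out.println("cleaner() returns: " + cleaner.getReturnType().getName());
    }
}
```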
My environment: Hadoop 3.3.1, Hive 3.1.0, Iceberg 1.2.1, Spark 3.2.1, and Nessie server 0.59.0.
I get the same error over and over.