stackabletech / demos

This repo contains SDP stacks and demos
https://docs.stackable.tech/home/stable/demos/
Apache License 2.0

HBase Spark Demo #19

Open Jimvin opened 2 years ago

Jimvin commented 2 years ago

Loading data into HBase is not trivial. We want the demo to show how this can be done and to provide some guidance and best practice.

Aims

Tasks

Learning Points and Challenges

snocke commented 2 years ago

Choose your Java version first. As of October 2022, the connector only compiles and passes its tests with Java 8. However, our images depend on Java 11.

mvn -Dspark.version=3.3.0 -Dscala.version=2.12.14 -Dhadoop-three.version=3.3.2 -Dscala.binary.version=2.12 -Dhbase.version=2.4.12 -DrecompileMode=all clean package

The .jar files can be found in Nexus.
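
Since the build is sensitive to the JDK in use, it may help to check the Java version before invoking Maven. The following is a minimal sketch, not part of the connector build: the function names `java_major_version` and `require_java8` are hypothetical, and it assumes version strings in the usual `1.8.0_345` or `11.0.16` form.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the hbase-connectors build).

# Print the Java major version for a version string such as
# "1.8.0_345" (Java 8) or "11.0.16" (Java 11).
java_major_version() {
  local version="$1"
  local first="${version%%.*}"
  if [ "$first" = "1" ]; then
    # Legacy scheme: 1.<major>.<minor>
    local rest="${version#1.}"
    echo "${rest%%.*}"
  else
    # Modern scheme: <major>.<minor>.<patch>
    echo "$first"
  fi
}

# Return 0 only if the given version string belongs to Java 8.
require_java8() {
  local version="$1"
  if [ "$(java_major_version "$version")" != "8" ]; then
    echo "Expected Java 8 but found version $version" >&2
    return 1
  fi
}

# Possible wiring (not executed here):
#   version="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
#   require_java8 "$version" && mvn ... clean package
```

Failing early this way only guards the build step. It does not address the mismatch with the Java 11 images.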

snocke commented 2 years ago

This article shows how to access HBase from the Spark shell: https://kontext.tech/article/628/spark-connect-to-hbase

snocke commented 2 years ago

Hi @Jimvin, in case you want to continue the hbase-spark-connector test while I am on holiday, you will find the current state on branch 87 in stackablectl.

snocke commented 1 year ago

After updating the HBase connector repo, my Maven build fails:

[INFO] --- gmaven-plugin:1.5:execute (default) @ hbase-spark ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache HBase - Spark 1.0.1-SNAPSHOT:
[INFO]
[INFO] Apache HBase - Spark ............................... SUCCESS [  3.120 s]
[INFO] Apache HBase - Spark Protocol ...................... SUCCESS [  3.778 s]
[INFO] Apache HBase - Spark Protocol (Shaded) ............. SUCCESS [  1.922 s]
[INFO] Apache HBase - Spark Connector ..................... FAILURE [  4.405 s]
[INFO] Apache HBase - Spark Integration Tests ............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  13.905 s
[INFO] Finished at: 2022-09-27T22:34:54+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.gmaven:gmaven-plugin:1.5:execute (default) on project hbase-spark: Execution default of goal org.codehaus.gmaven:gmaven-plugin:1.5:execute failed: An API incompatibility was encountered while executing org.codehaus.gmaven:gmaven-plugin:1.5:execute: java.lang.ExceptionInInitializerError: null
[ERROR] -----------------------------------------------------
[ERROR] realm =    plugin>org.codehaus.gmaven:gmaven-plugin:1.5
[ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
[ERROR] urls[0] = file:/Users/Simon/.m2/repository/org/codehaus/gmaven/gmaven-plugin/1.5/gmaven-plugin-1.5.jar
[ERROR] urls[1] = file:/Users/Simon/.m2/repository/org/codehaus/gmaven/runtime/gmaven-runtime-api/1.5/gmaven-runtime-api-1.5.jar
[ERROR] urls[2] = file:/Users/Simon/.m2/repository/org/codehaus/gmaven/feature/gmaven-feature-api/1.5/gmaven-feature-api-1.5.jar
[ERROR] urls[3] = file:/Users/Simon/.m2/repository/org/codehaus/gmaven/runtime/gmaven-runtime-loader/1.5/gmaven-runtime-loader-1.5.jar
[ERROR] urls[4] = file:/Users/Simon/.m2/repository/org/codehaus/gmaven/feature/gmaven-feature-support/1.5/gmaven-feature-support-1.5.jar
[ERROR] urls[5] = file:/Users/Simon/.m2/repository/org/codehaus/gmaven/runtime/gmaven-runtime-support/1.5/gmaven-runtime-support-1.5.jar
[ERROR] urls[6] = file:/Users/Simon/.m2/repository/org/sonatype/gshell/gshell-io/2.4/gshell-io-2.4.jar
[ERROR] urls[7] = file:/Users/Simon/.m2/repository/org/codehaus/plexus/plexus-utils/3.0/plexus-utils-3.0.jar
[ERROR] urls[8] = file:/Users/Simon/.m2/repository/com/thoughtworks/qdox/qdox/1.12/qdox-1.12.jar
[ERROR] urls[9] = file:/Users/Simon/.m2/repository/org/apache/maven/shared/file-management/1.2.1/file-management-1.2.1.jar
[ERROR] urls[10] = file:/Users/Simon/.m2/repository/org/apache/maven/shared/maven-shared-io/1.1/maven-shared-io-1.1.jar
[ERROR] urls[11] = file:/Users/Simon/.m2/repository/org/apache/xbean/xbean-reflect/3.4/xbean-reflect-3.4.jar
[ERROR] urls[12] = file:/Users/Simon/.m2/repository/log4j/log4j/1.2.12/log4j-1.2.12.jar
[ERROR] urls[13] = file:/Users/Simon/.m2/repository/commons-logging/commons-logging-api/1.1/commons-logging-api-1.1.jar
[ERROR] urls[14] = file:/Users/Simon/.m2/repository/com/google/collections/google-collections/1.0/google-collections-1.0.jar
[ERROR] urls[15] = file:/Users/Simon/.m2/repository/org/apache/maven/reporting/maven-reporting-impl/2.0.4.1/maven-reporting-impl-2.0.4.1.jar
[ERROR] urls[16] = file:/Users/Simon/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.1/plexus-interpolation-1.1.jar
[ERROR] urls[17] = file:/Users/Simon/.m2/repository/commons-validator/commons-validator/1.2.0/commons-validator-1.2.0.jar
[ERROR] urls[18] = file:/Users/Simon/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar
[ERROR] urls[19] = file:/Users/Simon/.m2/repository/commons-digester/commons-digester/1.6/commons-digester-1.6.jar
[ERROR] urls[20] = file:/Users/Simon/.m2/repository/commons-logging/commons-logging/1.0.4/commons-logging-1.0.4.jar
[ERROR] urls[21] = file:/Users/Simon/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar
[ERROR] urls[22] = file:/Users/Simon/.m2/repository/xml-apis/xml-apis/1.0.b2/xml-apis-1.0.b2.jar
[ERROR] urls[23] = file:/Users/Simon/.m2/repository/org/apache/maven/doxia/doxia-core/1.0-alpha-10/doxia-core-1.0-alpha-10.jar
[ERROR] urls[24] = file:/Users/Simon/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.0-alpha-10/doxia-sink-api-1.0-alpha-10.jar
[ERROR] urls[25] = file:/Users/Simon/.m2/repository/org/apache/maven/reporting/maven-reporting-api/2.0.4/maven-reporting-api-2.0.4.jar
[ERROR] urls[26] = file:/Users/Simon/.m2/repository/org/apache/maven/doxia/doxia-site-renderer/1.0-alpha-10/doxia-site-renderer-1.0-alpha-10.jar
[ERROR] urls[27] = file:/Users/Simon/.m2/repository/org/codehaus/plexus/plexus-i18n/1.0-beta-7/plexus-i18n-1.0-beta-7.jar
[ERROR] urls[28] = file:/Users/Simon/.m2/repository/org/codehaus/plexus/plexus-velocity/1.1.7/plexus-velocity-1.1.7.jar
[ERROR] urls[29] = file:/Users/Simon/.m2/repository/org/apache/velocity/velocity/1.5/velocity-1.5.jar
[ERROR] urls[30] = file:/Users/Simon/.m2/repository/org/apache/maven/doxia/doxia-decoration-model/1.0-alpha-10/doxia-decoration-model-1.0-alpha-10.jar
[ERROR] urls[31] = file:/Users/Simon/.m2/repository/commons-collections/commons-collections/3.2/commons-collections-3.2.jar
[ERROR] urls[32] = file:/Users/Simon/.m2/repository/org/apache/maven/doxia/doxia-module-apt/1.0-alpha-10/doxia-module-apt-1.0-alpha-10.jar
[ERROR] urls[33] = file:/Users/Simon/.m2/repository/org/apache/maven/doxia/doxia-module-fml/1.0-alpha-10/doxia-module-fml-1.0-alpha-10.jar
[ERROR] urls[34] = file:/Users/Simon/.m2/repository/org/apache/maven/doxia/doxia-module-xdoc/1.0-alpha-10/doxia-module-xdoc-1.0-alpha-10.jar
[ERROR] urls[35] = file:/Users/Simon/.m2/repository/org/apache/maven/doxia/doxia-module-xhtml/1.0-alpha-10/doxia-module-xhtml-1.0-alpha-10.jar
[ERROR] urls[36] = file:/Users/Simon/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
[ERROR] urls[37] = file:/Users/Simon/.m2/repository/org/sonatype/gossip/gossip/1.2/gossip-1.2.jar
[ERROR] Number of foreign imports: 1
[ERROR] import: Entry[import  from realm ClassRealm[project>org.apache.hbase.connectors:spark:1.0.1-SNAPSHOT, parent: ClassRealm[maven.api, parent: null]]]

snocke commented 1 year ago

When executing the spark-k8s application, I currently receive an error. It looks like a platform issue (ARM vs. x86):

++ id -u
+ myuid=1000
++ id -g
+ mygid=0
+ set +e
++ getent passwd 1000
+ uidentry=stackable:x:1000:1000::/stackable:/bin/bash
+ set -e
+ '[' -z stackable:x:1000:1000::/stackable:/bin/bash ']'
+ '[' -z /usr/lib/jvm/jre-11 ']'
+ SPARK_CLASSPATH=':/stackable/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/stackable/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /stackable/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.1.38 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class tech.stackable.demo.spark local:////Users/Simon/Repo/stackable/stackablectl/demos/hbase-hdfs-load-cycling-data/sparkHbaseAccess/target/sparkHbaseAccess-1.0-SNAPSHOT.jar --hbaseSite /arguments/hbase-site.xml --tableName cycling-tripdata
qemu-x86_64: Could not open '/lib64/ld-linux-x86-64.so.2': No such file or directory
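
The `qemu-x86_64` prefix in the last line suggests that an x86-64 binary is being run under emulation, and `/lib64/ld-linux-x86-64.so.2` is the x86-64 dynamic loader, which would fit the ARM/x86 suspicion. A minimal sketch for comparing the host architecture with an image's architecture follows. The function names `normalize_arch` and `arch_matches` are hypothetical, and `IMAGE` is a placeholder for whichever image is in use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting a host/image architecture mismatch.

# Map `uname -m` style names to the names used by container images.
normalize_arch() {
  case "$1" in
    x86_64|amd64) echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "$1" ;;
  esac
}

# Return 0 if both architectures refer to the same platform, 1 otherwise.
arch_matches() {
  [ "$(normalize_arch "$1")" = "$(normalize_arch "$2")" ]
}

# Possible wiring (not executed here; IMAGE is a placeholder):
#   host="$(uname -m)"
#   image="$(docker image inspect --format '{{.Architecture}}' IMAGE)"
#   arch_matches "$host" "$image" || echo "architecture mismatch: $host vs $image"
```

A mismatch reported here would only confirm the suspicion. It does not say which binary inside the image triggers the emulation error.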

snocke commented 1 year ago

This ticket is on hold. We need a strategy to get the hbase-spark-connector working with Java 11. The current status is saved on this branch.