hortonworks-spark / shc

The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
Apache License 2.0

SHC for Cloudera (CDH 5.8.4) #179

Open pdendek opened 7 years ago

pdendek commented 7 years ago

Dear all,

I found that it is possible to use your library with the regular Hadoop and Spark dependencies together with HBase from CDH. Please consider amending the shc/core/pom.xml file to make this generally possible. Details are below.

As of CDH 5.8.4, the Hadoop version is 2.6.0, which differs from the version pulled in transitively by Phoenix-core. For now, I work around this by manually adding dependencies on the Hadoop libraries (this can probably be done in a briefer way), namely:

hadoop.version=2.6.0
org.apache.hadoop:hadoop-annotations:jar:${hadoop.version}
org.apache.hadoop:hadoop-auth:jar:${hadoop.version}
org.apache.hadoop:hadoop-client:jar:${hadoop.version}
org.apache.hadoop:hadoop-common:jar:${hadoop.version}
org.apache.hadoop:hadoop-hdfs:jar:${hadoop.version}
org.apache.hadoop:hadoop-mapreduce-client-app:jar:${hadoop.version}
org.apache.hadoop:hadoop-mapreduce-client-common:jar:${hadoop.version}
org.apache.hadoop:hadoop-mapreduce-client-core:jar:${hadoop.version}
org.apache.hadoop:hadoop-mapreduce-client-hs:jar:${hadoop.version}
org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:${hadoop.version}
org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:${hadoop.version}
org.apache.hadoop:hadoop-minicluster:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-api:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-client:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-common:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-server-applicationhistoryservice:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-server-common:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-server-nodemanager:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-server-tests:jar:${hadoop.version}
org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:${hadoop.version}
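The post notes that this can probably be done more briefly. One possibly shorter alternative, which is not what the original poster did, is to pin the versions in a `<dependencyManagement>` section instead of declaring each artifact as a direct dependency. In Maven, versions listed there take precedence over the versions that transitive dependencies such as Phoenix-core would otherwise bring in. The fragment below is a hedged sketch. It assumes the `hadoop.version` property is declared in the project's `<properties>` section (where exactly, in shc/core/pom.xml or a parent POM, is an assumption), and it shows only a few of the artifacts listed above.

```xml
<!-- Sketch only: pin Hadoop artifacts to the version used by CDH 5.8.4 -->
<properties>
  <hadoop.version>2.6.0</hadoop.version>
</properties>

<dependencyManagement>
  <dependencies>
    <!-- Entries here override the versions resolved transitively
         (e.g. via Phoenix-core); they do not by themselves add
         the artifacts to the classpath -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <!-- Repeat for the remaining org.apache.hadoop artifacts listed above -->
  </dependencies>
</dependencyManagement>
```

Whether version pinning alone is enough, or the artifacts also have to be declared directly as in the post, would need to be confirmed by a build against the CDH cluster.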

With the final addition of com.fasterxml.jackson.core:jackson-annotations:2.6.5, the solution is ready to go. Any equivalent approach should result in a working solution.
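Expressed as a POM entry, the jackson-annotations addition could look like the following. This is a sketch that assumes it is placed in the existing `<dependencies>` section of the POM, as a direct dependency in the same manual style the post describes. The exact placement in shc/core/pom.xml is an assumption.

```xml
<!-- Sketch only: add jackson-annotations at the version mentioned above -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
  <version>2.6.5</version>
</dependency>
```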

Should you have any questions, do not hesitate to contact me.

Kind regards.

eric-maynard commented 7 years ago

After adding your dependencies, what did your POM file look like? It would be helpful if you could attach it to a PR or a personal fork.