Closed: samuelsayag closed this issue 7 years ago
Hello,
I noticed something strange (I may be mistaken, but it looks odd). After having done:
$ mkdir test
$ cp shc-core-1.0.1-1.6-s_2.10.jar test
$ cd test
$ unzip shc-core-1.0.1-1.6-s_2.10.jar
I see this strange listing:
$ ls -lah
total 556K
drwxr-xr-x 6 spark hadoop 4.0K Jan  3 13:48 .
drwxr-xr-x 3 spark spark  4.0K Jan  3 13:48 ..
drwxr-xr-x 2 spark hadoop 4.0K Dec 13 09:53 index
-rw-r--r-- 1 spark hadoop  16K Dec 13 09:53 index.html
-rw-r--r-- 1 spark hadoop 6.0K Dec 13 09:53 index.js
drwxr-xr-x 2 spark hadoop 4.0K Dec 13 09:53 lib
drwxr-xr-x 2 spark hadoop 4.0K Dec 13 09:53 META-INF
drwxr-xr-x 3 spark hadoop 4.0K Dec 13 09:53 org
-rw-r--r-- 1 spark hadoop 3.5K Dec 13 09:53 package.html
-rw-r--r-- 1 spark hadoop 504K Jan  3 13:48 shc-core-1.0.1-1.6-s_2.10.jar
and further:
$ ls -lah org/apache/spark/sql/execution/datasources/hbase/
AvroException.html            HBaseConnectionKey.html    RDDResources.html              SchemaConverters$$SchemaType.html
AvroSedes$.html               HBaseFilter$.html          ReferencedResource.html        SchemaMap.html
Bound.html                    HBaseRelation.html         RegionResource.html            Sedes.html
BoundRange.html               HBaseRelation$.html        Resource.html                  SerializableConfiguration.html
BoundRange$.html              HBaseResources$.html       RowKey.html                    SerializedTypedFilter.html
BoundRanges.html              HBaseTableCatalog.html     ScanRange.html                 SparkHBaseConf$.html
DoubleSedes.html              HBaseTableCatalog$.html    ScanRange$.html                TableResource.html
Field.html                    HRF.html                   ScanResource.html              TypedFilter.html
FilterType$.html              HRF$.html                  SchemaConversionException.html TypedFilter$.html
GetResource.html              package.html               SchemaConverters$.html         Utils$.html
...unless this jar is meant to contain documentation, this is quite strange...
=> This explains why I had to add a self-compiled jar of the project at tag v1.0.1-1.6 to the spark-shell command line: otherwise it could not find the class on its classpath.
I compiled the shc project myself as follows:
$ git clone https://github.com/hortonworks-spark/shc.git
$ git checkout v1.0.1-1.6
$ mvn clean compile package -P scala-2.10 -DskipTests
=> Is it possible that this build produces a version of the jar that causes the error from my first post?
java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
Thanks for helping
Thanks, @samouille666. I am looking into it. It should not be "**$.html". I will check Hortonworks repo.
(1) Is it possible to insert an org.apache.spark.sql.DataFrame[org.apache.spark.sql.Row] using shc and a catalog?
=> Yes.
(2) Given my current catalog, is it supposed to work?
=> Yes.
(3) Is it possible that this build produces a version of the jar that causes the error from my first post?
java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
=> You may want to check the Spark version you were using. v1.0.1-1.6 of SHC is for Spark 1.6.*.
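A quick way to check which Scala version a given spark-shell actually runs on is to print the Scala library version from the REPL (a sketch; the object wrapper below is only there to make it runnable outside the shell — in spark-shell you would just paste the println line). Since the REPL runs on the Scala library Spark was built against, the s_2.10 artifact can only work if this reports 2.10.x:

```scala
// Prints the Scala library version found on the classpath.
// On a Spark 1.6 build for Scala 2.10 this should print something like "2.10.5".
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionNumberString)
  }
}
```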
Hello,
Many thanks for your answer. I am using Spark 1.6.2 (on HDP 2.5 I do export SPARK_MAJOR_VERSION=1, and my log displays "SPARK_MAJOR_VERSION is set to 1, using Spark"). This is what I see in the console:
[spark@cluster1-node10 ~]$ export SPARK_MAJOR_VERSION=1
[spark@cluster1-node10 ~]$ spark-shell --version
SPARK_MAJOR_VERSION is set to 1, using Spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/
Type --help for more information.
But a search on the Internet reveals that the IntRef.create method changed between Scala versions 2.10 and 2.11. Can you confirm that:
$ git checkout v1.0.1-1.6
$ mvn clean compile package -P scala-2.10 -DskipTests
is the correct way to compile against Scala 2.10.5?
Many thanks
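To make the mismatch visible, one can probe for the method by reflection (a sketch written for this thread; IntRefProbe is not part of shc). scala.runtime.IntRef.create(int) exists from Scala 2.11 onward but not in 2.10, which is exactly why bytecode compiled against a 2.11 library throws this NoSuchMethodError when run on a 2.10 runtime:

```scala
// Probe, via reflection, whether the Scala library on the classpath
// provides scala.runtime.IntRef.create(int).
// Present in Scala 2.11+; absent in 2.10 -- so a jar compiled against 2.11
// fails with NoSuchMethodError on a Scala 2.10 runtime.
object IntRefProbe {
  def hasCreate: Boolean =
    try {
      Class.forName("scala.runtime.IntRef").getMethod("create", classOf[Int])
      true
    } catch {
      case _: NoSuchMethodException => false
    }

  def main(args: Array[String]): Unit =
    println(s"IntRef.create available: $hasCreate " +
      s"(Scala ${scala.util.Properties.versionNumberString})")
}
```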
Yes, it is correct, but you can simply use: mvn clean -Pscala-2.10 -DskipTests package.
The jars in Hortonworks repo (http://repo.hortonworks.com/content/groups/public/com/hortonworks/) work well now, you can use them directly.
Hello,
The command line given is from my spark-shell invocation:
spark-shell --master yarn \
--deploy-mode client \
--name "hive2hbase" \
--repositories "http://repo.hortonworks.com/content/groups/public/" \
--packages "com.hortonworks:shc:1.0.1-1.6-s_2.10" \
--jars "shc-core-1.0.1-1.6-s_2.10.jar" \
--files "/usr/hdp/current/hive-client/conf/hive-site.xml" \
--driver-memory 1G \
--executor-memory 1500m \
--num-executors 6 2> ./spark-shell.log
I have a simple DataFrame with a count of 5:
scala> newDf
res5: org.apache.spark.sql.DataFrame = [offer_id: int, offer_label: string, universe: string, category: string, sub_category: string, sub_label: string]
It is made of elements of type Row:
scala> newDf.take(1)
res6: Array[org.apache.spark.sql.Row] = Array([28896458,Etui de protection bleu pour li...liseuse Cybook Muse Light liseuse Cybook Muse Light liseuse Cybook Muse HD Etui de protection bleu pour lis... Etui de protection noir pour lis... Etui de protection rose pour lis... Etui de protection orange liseus...,null,null,null,null])
I try to insert this with the following catalog:
scala> cat
res0: String = {
  "table":{"namespace":"default", "name":"offDen3m"},
  "rowkey":"key",
  "columns":{
    "offer_id":{"cf":"rowkey", "col":"key", "type":"int"},
    "offer_label":{"cf":"cf1", "col":"col1", "type":"string"},
    "universe":{"cf":"cf2", "col":"col2", "type":"string"},
    "category":{"cf":"cf3", "col":"col3", "type":"string"},
    "sub_category":{"cf":"cf4", "col":"col4", "type":"string"},
    "sub_label":{"cf":"cf5", "col":"col5", "type":"string"}
  }
}
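For reference, the same catalog can be kept as a Scala multi-line string, which is the usual way to feed it to HBaseTableCatalog.tableCatalog (a sketch; the object name CatalogDef is mine, and the JSON simply mirrors the catalog shown above):

```scala
// The catalog as a Scala triple-quoted string, ready to pass as
// HBaseTableCatalog.tableCatalog -> CatalogDef.cat in the write options.
object CatalogDef {
  val cat: String =
    """{
      |  "table":{"namespace":"default", "name":"offDen3m"},
      |  "rowkey":"key",
      |  "columns":{
      |    "offer_id":{"cf":"rowkey", "col":"key", "type":"int"},
      |    "offer_label":{"cf":"cf1", "col":"col1", "type":"string"},
      |    "universe":{"cf":"cf2", "col":"col2", "type":"string"},
      |    "category":{"cf":"cf3", "col":"col3", "type":"string"},
      |    "sub_category":{"cf":"cf4", "col":"col4", "type":"string"},
      |    "sub_label":{"cf":"cf5", "col":"col5", "type":"string"}
      |  }
      |}""".stripMargin

  def main(args: Array[String]): Unit = println(cat)
}
```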
When I try to insert with the following code:
newDf.write
  .options(Map(HBaseTableCatalog.tableCatalog -> cat, HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
I obtain the following stack:
17/01/03 10:36:42 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 149.202.161.158:37691 in memory (size: 6.4 KB, free: 511.1 MB)
java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:142)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:163)
  at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
  at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
My question is twofold:
Thank you very much for helping