hortonworks-spark / shc

The Apache Spark - Apache HBase Connector is a library that supports Spark in accessing HBase tables as an external data source or sink.
Apache License 2.0

java.lang.IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 6 #334

Closed: Saimukunth closed this issue 4 years ago

Saimukunth commented 4 years ago

Hi,

I'm also facing the same issue. I used shc to save data into HBase. When reading it back, I debugged and found that the exception occurs only on the columns with double values (both positive and negative). I can iterate over the rest of the columns using show(). I checked the HBase data for the problematic column, and it contains only double values, nothing else. I'm using shc version "com.hortonworks" % "shc-core" % "1.1.1-2.1-s_2.11".

A sample of the source code and the HBase data is given below.

HBase Write,

```scala
def customer_catalog = s"""{
    |"table":{"namespace":"default", "name":"card_customer"},
    |"rowkey":"cc_num",
    |"columns":{
    |"cc_num":{"cf":"rowkey", "col":"cc_num", "type":"long"},
    |"first":{"cf":"cust", "col":"first", "type":"string"},
    |"last":{"cf":"cust", "col":"last", "type":"string"},
    |"gender":{"cf":"cust", "col":"gender", "type":"string"},
    |"street":{"cf":"cust", "col":"street", "type":"string"},
    |"city":{"cf":"cust", "col":"city", "type":"string"},
    |"state":{"cf":"cust", "col":"state", "type":"string"},
    |"zip":{"cf":"cust", "col":"zip", "type":"int"},
    |"lat":{"cf":"cust", "col":"lat", "type":"double"},
    |"long":{"cf":"cust", "col":"long", "type":"double"},
    |"job":{"cf":"cust", "col":"job", "type":"string"},
    |"dob":{"cf":"cust", "col":"dob", "type":"string"}
    |}
    |}""".stripMargin

val customers_df = spark.read.format("csv").option("inferSchema", true).option("header", true).load(args(0))

customers_df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> customer_catalog, HBaseTableCatalog.newTable -> "4"))
  .format(defaultFormat)
  .save()
```

HBase Read,

```scala
val customer_df = withCatalog(customer_catalog)

customer_df.printSchema()
```

```
root
 |-- cc_num: long (nullable = true)
 |-- first: string (nullable = true)
 |-- last: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- street: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- zip: integer (nullable = true)
 |-- lat: double (nullable = true)
 |-- long: double (nullable = true)
 |-- job: string (nullable = true)
 |-- dob: string (nullable = true)
```
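withCatalog is not shown in the snippet; presumably it follows the helper from the shc README. A minimal sketch, assuming an active SparkSession named spark and that defaultFormat above refers to the same data source class name:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Sketch of a withCatalog helper in the style of the shc README example.
def withCatalog(cat: String): DataFrame =
  spark.sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
```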

HBase Customer 'lat' column complete data,

```
column=cust:lat, timestamp=1589126119532, value=38.104
column=cust:lat, timestamp=1589126119532, value=39.0204
column=cust:lat, timestamp=1589126119532, value=42.7382
column=cust:lat, timestamp=1589126119532, value=43.6221
column=cust:lat, timestamp=1589126119532, value=43.0934
column=cust:lat, timestamp=1589126119532, value=42.291
column=cust:lat, timestamp=1589126119532, value=28.8652
column=cust:lat, timestamp=1589126119532, value=40.2841
column=cust:lat, timestamp=1589126119532, value=39.471
column=cust:lat, timestamp=1589126119532, value=35.0552
column=cust:lat, timestamp=1589126119532, value=32.4593
column=cust:lat, timestamp=1589126119532, value=37.4527
column=cust:lat, timestamp=1589126119532, value=38.1919
column=cust:lat, timestamp=1589126119532, value=40.9252
column=cust:lat, timestamp=1589126119532, value=41.6849
column=cust:lat, timestamp=1589126119532, value=41.8798
column=cust:lat, timestamp=1589126119532, value=37.8344
column=cust:lat, timestamp=1589126119532, value=41.0541
column=cust:lat, timestamp=1589126119532, value=39.9001
column=cust:lat, timestamp=1589126119532, value=40.6174
column=cust:lat, timestamp=1589126119532, value=40.0685
column=cust:lat, timestamp=1589126119532, value=41.8858
column=cust:lat, timestamp=1589126119532, value=40.9419
column=cust:lat, timestamp=1589126119532, value=45.9344
column=cust:lat, timestamp=1589126119532, value=41.9399
column=cust:lat, timestamp=1589126119532, value=47.2689
column=cust:lat, timestamp=1589126119532, value=41.6838
column=cust:lat, timestamp=1589126119532, value=40.1765
column=cust:lat, timestamp=1589126119532, value=36.9576
column=cust:lat, timestamp=1589126119532, value=41.5583
column=cust:lat, timestamp=1589126119532, value=42.1265
column=cust:lat, timestamp=1589126119532, value=42.1588
column=cust:lat, timestamp=1589126119532, value=34.8825
column=cust:lat, timestamp=1589126119532, value=38.6203
column=cust:lat, timestamp=1589126119532, value=41.7461
column=cust:lat, timestamp=1589126119532, value=34.156
column=cust:lat, timestamp=1589126119532, value=35.3039
column=cust:lat, timestamp=1589126119532, value=27.4295
column=cust:lat, timestamp=1589126119532, value=33.8011
column=cust:lat, timestamp=1589126119532, value=42.8135
column=cust:lat, timestamp=1589126119532, value=40.1879
column=cust:lat, timestamp=1589126119532, value=30.7766
column=cust:lat, timestamp=1589126119532, value=28.5163
column=cust:lat, timestamp=1589126119532, value=37.7917
column=cust:lat, timestamp=1589126119532, value=39.2502
column=cust:lat, timestamp=1589126119532, value=36.6876
column=cust:lat, timestamp=1589126119532, value=40.6321
column=cust:lat, timestamp=1589126119532, value=44.769
column=cust:lat, timestamp=1589126119532, value=44.9039
column=cust:lat, timestamp=1589126119532, value=44.9715
```

No issue when,

```scala
customer_df.select("first","last","gender","street","city","state","zip","job","dob").show()
```

```
+---------+---------+------+--------------------+--------------+-----+---------+--------------------+-------------------+
|    first|     last|gender|              street|          city|state|      zip|                 job|                dob|
+---------+---------+------+--------------------+--------------+-----+---------+--------------------+-------------------+
|  Melissa|    James|     F|     537 Bryant Mall|     Salt Lick|   KY|875574071|Psychologist, for...|1956-07-19 14:30:00|
|     John|  Holland|     M|630 Christina Harbor|   Zephyr Cove|   NV|943273012|Geophysical data ...|1949-12-28 13:30:00|
|    James|Rodriguez|     M| 95514 Andrew Street|     Elk Point|   SD|892809266|        Chiropractor|1953-07-28 14:30:00|
|  Maurice|    Simon|     M|031 Jessica Harbo...|     Caledonia|   MN|892680498|Hydrographic surv...|1974-11-03 13:30:00|
|    Kevin|   Martin|     M|40514 Diana Expre...|      Savannah|   NY|825438516|Scientist, physio...|1973-07-07 14:30:00|
|    Debra|    Davis|     F|       566 Reed Well|        Canton|   MI|876097848|Teacher, special ...|1998-03-01 13:30:00|
|     John|    Brown|     M|79481 Potter Vill...|     Francitas|   TX|926366006|Engineer, civil (...|1973-08-13 14:30:00|
|   Monica|    Brown|     F|   39422 Chloe Court|    Myers Flat|   CA|959788341|Designer, fashion...|1948-01-13 13:30:00|
```

Issue when adding the lat or long column,

```scala
customer_df.select("first","last","gender","street","city","state","zip","job","dob","lat").show()
```

```
java.lang.IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 6
  at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:631)
  at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:605)
  at org.apache.hadoop.hbase.util.Bytes.toDouble(Bytes.java:729)
  at org.apache.hadoop.hbase.util.Bytes.toDouble(Bytes.java:720)
  at org.apache.spark.sql.execution.datasources.hbase.types.PrimitiveType.fromBytes(PrimitiveType.scala:33)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$4.apply(HBaseTableScan.scala:107)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$4.apply(HBaseTableScan.scala:99)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.buildRow(HBaseTableScan.scala:99)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anon$3.next(HBaseTableScan.scala:189)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anon$3.next(HBaseTableScan.scala:170)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
```
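For context, the message comes from HBase's Bytes utility: Bytes.toDouble always reads 8 bytes, so a 6-byte cell (for example the UTF-8 text "38.104", which matches the scan output above) produces exactly this exception. A minimal sketch to illustrate (not the connector code itself):

```scala
import org.apache.hadoop.hbase.util.Bytes

// A double written as text occupies only 6 bytes here, not the 8 bytes Bytes.toDouble expects.
val asText: Array[Byte] = Bytes.toBytes("38.104")   // 6 bytes
// Bytes.toDouble(asText) throws:
//   java.lang.IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 6

// A double written as a real 8-byte value round-trips cleanly.
val asDouble: Array[Byte] = Bytes.toBytes(38.104)    // 8 bytes
println(Bytes.toDouble(asDouble))                    // 38.104
```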

Kindly suggest if anything is missing or wrong.

Thanks, Venkatesh Raman

Saimukunth commented 4 years ago

Hi,

I was able to figure out the problem in my code: it was because I had wrongly added (or missed) the cast for the double-valued columns. I am now able to read from the HBase table successfully and proceed further. One small glitch I see is that HBase stores the data for the non-string columns as bytes; however, my DataFrames load properly with the actual values. Is there any way to show the actual value rather than the bytes in HBase?
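A minimal sketch of the kind of cast described above, assuming the customers_df and catalog from the first comment (the exact cast applied is not shown, so this is illustrative only):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Ensure lat/long are real doubles (written as 8-byte values) before saving to HBase.
val customers_fixed_df = customers_df
  .withColumn("lat", col("lat").cast(DoubleType))
  .withColumn("long", col("long").cast(DoubleType))

customers_fixed_df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> customer_catalog, HBaseTableCatalog.newTable -> "4"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
```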

Sample data shown below,

180036251237802 column=cust:city, timestamp=1589225194627, value=Salt Lick
180036251237802 column=cust:dob, timestamp=1589225194627, value=1956-07-19 14:30:00
180036251237802 column=cust:first, timestamp=1589225194627, value=Melissa
180036251237802 column=cust:gender, timestamp=1589225194627, value=F
180036251237802 column=cust:job, timestamp=1589225194627, value=Psychologist, forensic
180036251237802 column=cust:last, timestamp=1589225194627, value=James
180036251237802 column=cust:lat, timestamp=1589225194627, value=@C\x0DO\xDF;dZ
180036251237802 column=cust:long, timestamp=1589225194627, value=\xC0T\xE8l"h\x09\xD5
180036251237802 column=cust:state, timestamp=1589225194627, value=KY
180036251237802 column=cust:street, timestamp=1589225194627, value=537 Bryant Mall
180036251237802 column=cust:zip, timestamp=1589225194627, value=40371
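As a side note on the values above: the lat and long cells hold the raw 8-byte IEEE-754 encodings of the doubles, which the HBase shell prints as escape sequences; the bytes decode back to the original numbers. A small illustrative sketch, using the byte values from the cust:lat cell above:

```scala
import org.apache.hadoop.hbase.util.Bytes

// The shell's escaped output @C\x0DO\xDF;dZ corresponds to these eight bytes.
val latBytes = Array(0x40, 0x43, 0x0D, 0x4F, 0xDF, 0x3B, 0x64, 0x5A).map(_.toByte)
println(Bytes.toDouble(latBytes))  // prints the stored latitude, 38.104 for this row
```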

Kindly suggest.

Thanks, Venkatesh Raman