rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.
Other

org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable #56

Open xelllee opened 10 years ago

xelllee commented 10 years ago

Still have this issue with casting.

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:191)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector.get(WritableDoubleObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:454)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1061)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1113)
    ... 14 more

rcongiu commented 10 years ago

Can you send me the table definition, the query, and some sample data?

rcongiu commented 10 years ago

Actually, I think I found the issue. It looks like Hive does not use the standard Hadoop DoubleWritable, but has its own wrapper class with the same name (!).

I created a feature branch: https://github.com/rcongiu/Hive-JSON-Serde/tree/feature/double-cast

Can you try it and see if it fixes the problem?
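The naming clash can be sketched with a minimal, self-contained example. The two nested classes below are hypothetical stand-ins for org.apache.hadoop.io.DoubleWritable and org.apache.hadoop.hive.serde2.io.DoubleWritable, not the real Hadoop/Hive classes:

```java
// Sketch of the failure: two classes share the simple name "DoubleWritable"
// in different packages. Even if Hive's class extended the Hadoop one, an
// instance of the parent cannot be downcast to the subclass the inspector
// expects, so the cast fails at runtime.
public class DoubleCastDemo {
    // Stand-in for org.apache.hadoop.io.DoubleWritable (what the SerDe emitted).
    static class HadoopDoubleWritable {
        final double value;
        HadoopDoubleWritable(double v) { value = v; }
    }

    // Stand-in for org.apache.hadoop.hive.serde2.io.DoubleWritable (what
    // WritableDoubleObjectInspector.get expects to receive).
    static class HiveDoubleWritable extends HadoopDoubleWritable {
        HiveDoubleWritable(double v) { super(v); }
    }

    public static void main(String[] args) {
        Object fromSerde = new HadoopDoubleWritable(2.4);
        try {
            // Mirrors the cast inside the object inspector:
            HiveDoubleWritable w = (HiveDoubleWritable) fromSerde;
            System.out.println("cast ok: " + w.value);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the trace above");
        }
    }
}
```

The feature branch's fix is to make the SerDe emit the serde2 class directly, so the inspector's cast sees the type it expects.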

xelllee commented 10 years ago

Sure, I will test, and keep you posted.


xelllee commented 10 years ago

Looks like it doesn't fix the problem. I am using CDH 4.4.0.

To Reproduce:

create table doublecast (f1 double, f2 double);
create table doublecast3 (f1 double, f2 double);
alter table doublecast SET SERDE 'org.openx.data.jsonserde.JsonSerDe';

Put {"f1":2.4,"f2":2.6} into doublecast, then run:

INSERT OVERWRITE TABLE doublecast3 SELECT DISTINCT f1, f2 FROM doublecast;

You will see the casting error. I guess it is something related to DISTINCT (i.e., the GROUP BY); a SELECT without it passes.

With the fix from #34, though, this issue does not occur.

rcongiu commented 10 years ago

Did you compile it from the feature branch? The fix is not on the develop or master branch; you have to check out the feature branch. I am asking because I tried your sequence and it works for me. I attached the jar I used; can you test with that and let me know if it fixes it? R.

 


xelllee commented 10 years ago

Where is the jar? Sorry, I can't find it.

By the way,

git clone -b feature/double-cast https://github.com/rcongiu/Hive-JSON-Serde.git

is the branch I compiled, and I double-checked that the change you made is there:

-import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.hive.serde2.io.DoubleWritable;

Anyway, could you point me to where the jar is? I will test it again.

rcongiu commented 10 years ago

It was attached to the email! Sending it again.

 


rcongiu commented 10 years ago

Oh, I see: you're reading this through GitHub, and the attachment gets lost. I uploaded it to http://www.congiu.net/json-serde-1.1.9.3-SNAPSHOT-jar-with-dependencies.jar

 


xelllee commented 10 years ago

No, and there is a new cast error. Did I do something wrong?

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:824)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:656)
    ... 9 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaLongObjectInspector.get(JavaLongObjectInspector.java:39)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:622)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:572)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:658)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:854)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:751)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:819)

rcongiu commented 10 years ago

Hmm, this may be a bug in the UDAF evaluation, where it's trying to call merge on the object directly, bypassing the ObjectInspector.
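That failure mode can be sketched in isolation. The getLong method below is an illustrative stand-in for JavaLongObjectInspector.get, not Hive's actual code: the inspector casts blindly, so it only works if the value has the exact runtime type the inspector was built for.

```java
// Sketch: a Java object inspector performs an unchecked cast. If a JSON
// field arrives as a String while the column is declared BIGINT, and the
// value reaches the inspector without conversion, the cast throws the
// "String cannot be cast to Long" error seen in the trace.
public class InspectorDemo {
    // Illustrative stand-in for JavaLongObjectInspector.get(Object).
    static long getLong(Object o) {
        return (Long) o; // throws ClassCastException if o is actually a String
    }

    public static void main(String[] args) {
        System.out.println(getLong(42L)); // the expected type works fine
        try {
            getLong("42"); // a String snuck past the conversion step
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: String cannot be cast to Long");
        }
    }
}
```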

ovlaere commented 9 years ago

I'm facing a similar issue, getting the java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable.

I am comparing double values from a table using JSON serde to other double values computed from percentile_approx, and even though both are double, there is a ClassCastException.

Reading the comment by @rcongiu about bypassing the ObjectInspector, I managed to get the computation to complete by disabling the auto map join (set hive.auto.convert.join=false;). In that case, everything works. So clearly, when optimizing for map joins, things are being bypassed.

Hope this helps in identifying the cause of this exception.

koboldunderlord commented 8 years ago

Hi-

I've figured out what's causing this problem, and it's related to how you're serializing primitives.

I've submitted a pull request that has solved this problem for me. Let me know if you have any questions.

-James

rcongiu commented 8 years ago

That is a sub-optimal fix though, as it will create a lot of extra objects, which will impact performance/garbage collection. So it does fix one particular bug, but then makes things slightly worse all the time.
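The trade-off rcongiu is pointing at can be sketched as follows. MutableDouble is a hypothetical holder class standing in for a Hadoop-style Writable; neither function is the actual SerDe code:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: allocating a fresh wrapper per value (as the proposed fix
// effectively does) versus reusing one mutable holder, which is the usual
// Hadoop Writable pattern. Both produce the same result; the first creates
// per-row garbage that the collector must clean up on large scans.
public class ReuseDemo {
    static class MutableDouble {
        double value;
        void set(double v) { value = v; }
        double get() { return value; }
    }

    // One allocation per row: correct, but adds GC pressure.
    static double sumAllocating(List<Double> rows) {
        double sum = 0;
        for (double d : rows) {
            MutableDouble w = new MutableDouble(); // fresh object each row
            w.set(d);
            sum += w.get();
        }
        return sum;
    }

    // A single holder reused across rows: same result, no per-row garbage.
    static double sumReusing(List<Double> rows) {
        double sum = 0;
        MutableDouble w = new MutableDouble(); // allocated once
        for (double d : rows) {
            w.set(d);
            sum += w.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Double> rows = Arrays.asList(2.4, 2.6);
        System.out.println(sumAllocating(rows));
        System.out.println(sumReusing(rows));
    }
}
```

This is why Writables are designed to be mutable and reused: on a table with billions of rows, one extra allocation per value is a real cost even though each individual object is tiny.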

koboldunderlord commented 8 years ago

Yeah, that's fair. I was mostly just digging through this after encountering an issue today caused by joins with this SerDe, and wanted to throw together a quick fix and share what the problem actually seemed to be.