Open · xelllee opened this issue 10 years ago
Can you send me the table definition, the query, and sample data?
Actually, I think I found the issue. It looks like Hive is not using the standard Hadoop DoubleWritable but a wrapper of its own with the same name (!).
I created a feature branch: https://github.com/rcongiu/Hive-JSON-Serde/tree/feature/double-cast
Can you try it and see if it fixes the problem?
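For context, the failing cast pairs two unrelated classes that happen to share the simple name DoubleWritable (org.apache.hadoop.io.DoubleWritable vs. org.apache.hadoop.hive.serde2.io.DoubleWritable). A minimal self-contained sketch of why such a cast must fail; the class names below are illustrative stand-ins, not the real Hadoop/Hive classes:

```java
// Stand-ins for the two DoubleWritable classes: same idea, same simple name
// in the real code, but no inheritance relationship between them.
class HadoopStyleDoubleWritable {
    private final double value;
    HadoopStyleDoubleWritable(double v) { value = v; }
    double get() { return value; }
}

class HiveStyleDoubleWritable {
    private final double value;
    HiveStyleDoubleWritable(double v) { value = v; }
    double get() { return value; }
}

public class CastDemo {
    public static void main(String[] args) {
        Object fromSerDe = new HadoopStyleDoubleWritable(2.4);
        try {
            // This is effectively what happens when the ObjectInspector is
            // handed the "wrong" DoubleWritable by the SerDe.
            HiveStyleDoubleWritable d = (HiveStyleDoubleWritable) fromSerDe;
            System.out.println(d.get());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The fix on the feature branch swaps which of the two classes the SerDe imports so it matches what Hive's object inspectors expect.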
Sure, I will test, and keep you posted.
Looks like it doesn't fix the problem. I am using CDH 4.4.0
To Reproduce:
create table doublecast (f1 double, f2 double);
create table doublecast3 (f1 double, f2 double);
alter table doublecast SET SERDE 'org.openx.data.jsonserde.JsonSerDe';
Put {"f1":2.4,"f2":2.6} in doublecast, then run:
INSERT OVERWRITE TABLE doublecast3 SELECT DISTINCT f1, f2 FROM doublecast;
You will see the casting error. I guess it is something related to the DISTINCT (i.e. the GROUP BY); a SELECT without it passes.
But with the fix from #34, this issue does not occur.
Did you compile it from the feature branch? The fix is not on the develop or master branch; you have to check out the feature branch. I am asking because I tried your sequence and it works for me. I attached the jar I used; can you test with that and let me know if it fixes it? R.
"Good judgment comes from experience. Experience comes from bad judgment"
Data Engineer - OpenX.org, Pasadena, CA. Skype: sardodazione, Y! IM: rcongiu
Where is the jar? Sorry, I can't find it.
By the way, this is the branch I compiled:
git clone -b feature/double-cast https://github.com/rcongiu/Hive-JSON-Serde.git
and I double-checked that the change you made is there:
-import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.hive.serde2.io.DoubleWritable;
Anyway, could you point me to where the jar is? I will test it again.
attached in the email! Sending it again.
Oh I see, you're looking at this through GitHub, and the attachment gets lost. I uploaded it to http://www.congiu.net/json-serde-1.1.9.3-SNAPSHOT-jar-with-dependencies.jar
No, and now there is a new cast error. Did I do something wrong?
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:824)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
	at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:656)
	... 9 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaLongObjectInspector.get(JavaLongObjectInspector.java:39)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:622)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:572)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:658)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:854)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:751)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:819)
Hmm, this may be a bug in the UDAF evaluation: it's trying to call merge on the object directly, bypassing the ObjectInspector.
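To make the "bypassing the ObjectInspector" idea concrete, here is a toy version of the pattern. The interface and class names below are illustrative, not Hive's actual API: the inspector knows the concrete runtime type the SerDe produces, so callers that go through it are safe, while callers that cast the raw object directly are not.

```java
// Toy inspector: knows how to read a "long" column out of whatever the
// (toy) SerDe actually stores it as.
interface LongInspector {
    long get(Object raw);
}

// A SerDe may represent a numeric column as a String internally; its
// inspector knows that and converts on access.
class StringBackedLongInspector implements LongInspector {
    public long get(Object raw) {
        return Long.parseLong((String) raw);
    }
}

public class InspectorDemo {
    public static void main(String[] args) {
        Object raw = "42"; // what the toy SerDe actually hands back
        LongInspector oi = new StringBackedLongInspector();

        System.out.println(oi.get(raw)); // mediated access works, prints 42

        try {
            // "Bypassing" the inspector: casting the raw object to the type
            // the caller merely assumes, as in the trace above.
            long v = (Long) raw;
            System.out.println(v);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```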
I'm facing a similar issue, getting java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable.
I am comparing double values from a table using the JSON SerDe to other double values computed with percentile_approx, and even though both are double, there is a ClassCastException.
Reading the comment by @rcongiu about bypassing the ObjectInspector, I managed to get the computation to complete by disabling the auto map-join (set hive.auto.convert.join=false;). In that case, everything works. So clearly, when optimizing for map-joins, things are being bypassed.
Hope this helps in identifying the cause of this exception.
Hi-
I've figured out what's causing this problem, and it's related to how you're serializing primitives.
I've submitted a pull request that has solved this problem for me. Let me know if you have any questions.
-James
That is a sub-optimal fix though, as it will create a lot of extra objects, which will hurt performance and garbage collection. So it does fix one particular bug, but then makes things slightly worse all the time.
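The performance concern here is the classic allocate-per-row vs. reuse-one-instance trade-off for writables. A minimal sketch under assumed, illustrative names (this is not the SerDe's or the PR's actual code):

```java
// DoubleBox stands in for a writable wrapper such as DoubleWritable.
final class DoubleBox {
    double value;
    DoubleBox(double v) { value = v; }
}

// The "quick fix" style: a fresh wrapper per row. Safe everywhere, but it
// creates garbage proportional to the number of rows processed.
class AllocatingConverter {
    DoubleBox convert(double v) { return new DoubleBox(v); }
}

// The reuse style Hadoop writables were designed around: one instance
// mutated in place, at the cost that callers must not hold the reference.
class ReusingConverter {
    private final DoubleBox box = new DoubleBox(0);
    DoubleBox convert(double v) { box.value = v; return box; }
}

public class ReuseDemo {
    public static void main(String[] args) {
        ReusingConverter reusing = new ReusingConverter();
        DoubleBox a = reusing.convert(1.0);
        DoubleBox b = reusing.convert(2.0);
        System.out.println(a == b);   // true: same instance both times
        System.out.println(a.value);  // 2.0: earlier reference was clobbered

        AllocatingConverter allocating = new AllocatingConverter();
        System.out.println(allocating.convert(1.0) == allocating.convert(1.0)); // false
    }
}
```

This is why the per-row allocation fixes the cast bug (every caller gets an object of the expected class) while still being a net cost on the hot path.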
Yeah, that's fair. I was mostly just digging through this after encountering an issue today caused by joins with this, and wanted to throw together a quick fix and share what the problem actually seemed to be.
I still have this issue with casting:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:191)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector.get(WritableDoubleObjectInspector.java:35)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:454)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1061)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1113)
	... 14 more