rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.

org.openx.data.udf.JsonUDF throwing exception #146

Closed shruthb closed 8 years ago

shruthb commented 8 years ago

Steps run on the Hive shell:

> add jar json-udf-1.3.8-jar-with-dependencies.jar;
> create temporary function tjson as 'org.openx.data.udf.JsonUDF';
> insert overwrite table interim_3 select userId, tjson(features) from interim_2;

Info about the tables:

> describe interim_2
OK
userid                  string                                      
features                array<string>                               
Time taken: 0.065 seconds, Fetched: 2 row(s)
> describe interim_3
OK
userid                  string                                      
features               string                               
Time taken: 0.065 seconds, Fetched: 2 row(s)

Error:

Query ID = hadoop_20160518073535_c81311a0-39ec-4c5e-bee3-c0574d6e68fe
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1463552240367_0007, Tracking URL = http://ip-172-31-17-151.ec2.internal:20888/proxy/application_1463552240367_0007/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1463552240367_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-18 07:35:48,819 Stage-1 map = 0%,  reduce = 0%
2016-05-18 07:36:11,640 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1463552240367_0007 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1463552240367_0007_m_000000 (and more) from job job_1463552240367_0007

Task with the most failures(4): 
-----
Task ID:
  task_1463552240367_0007_m_000000

URL:
  http://ip-172-31-17-151.ec2.internal:8088/taskdetails.jsp?jobid=job_1463552240367_0007&tipid=task_1463552240367_0007_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
columnNames (org.openx.data.jsonserde.JsonSerDe)
serde (org.openx.data.udf.JsonUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:426)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:289)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:261)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:489)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:482)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:658)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:170)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
columnNames (org.openx.data.jsonserde.JsonSerDe)
serde (org.openx.data.udf.JsonUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1028)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:936)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:950)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:393)
        ... 13 more
Caused by: java.lang.NullPointerException
        at java.util.Arrays$ArrayList.size(Arrays.java:2847)
        at java.util.AbstractList.add(AbstractList.java:108)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:105)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        ... 43 more
rcongiu commented 8 years ago

Wait, I see Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException

This does not look like a SerDe problem but a Kryo problem. What's the output of DESCRIBE EXTENDED interim_2?

shruthb commented 8 years ago
hive> DESCRIBE EXTENDED interim_2;
OK
userid                  string                                      
features                array<string>                               

Detailed Table Information      Table(tableName:interim_2, dbName:default, owner:hadoop, createTime:1463555677, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:string, comment:null), FieldSchema(name:features, type:array<string>, comment:null)], location:hdfs://ip-172-31-17-151.ec2.internal:8020/user/hive/warehouse/interim_2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, transient_lastDdlTime=1463555908, COLUMN_STATS_ACCURATE=true, totalSize=11347, numRows=30, rawDataSize=11317}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.056 seconds, Fetched: 4 row(s)

Also, the exception occurs only on inserts; a plain select userId, tjson(features) from interim_2; does not throw an exception.

shruthb commented 8 years ago

I'm using json-udf-1.3.7-jar-with-dependencies.jar on EMR 4.4.0 with Hive 1.0.0 and Hadoop Amazon 2.7.1.

rcongiu commented 8 years ago

OK, thanks, let me have a look.

rcongiu commented 8 years ago

OK, I was able to reproduce the issue; looking into it.

rcongiu commented 8 years ago

So, I looked at it, and the exception is not actually in the SerDe. It may be related to this Hive bug: https://issues.apache.org/jira/browse/HIVE-12206. I could reproduce it in Hive 1.2, but that ticket says it's fixed in 1.3, and indeed when I tried Hive 2.0 the bug does not show up. Closing, since it's a Hive bug, not a SerDe bug.
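The serialization trace above shows where this goes wrong: the query plan shipped to each task embeds the JsonUDF instance, its serde field drags the JsonSerDe (and its columnNames list) into the Kryo-serialized plan, and restoring that list throws the NPE. Purely as a hypothetical sketch of the general pattern that sidesteps this kind of failure (not the project's actual code): mark the helper transient, which Kryo's FieldSerializer skips by default, and build it lazily inside the task JVM.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.openx.data.jsonserde.JsonSerDe;

// Hypothetical sketch only: keep non-serializable helpers out of the
// Kryo-serialized query plan by marking them transient and creating
// them lazily on the task side.
public class LazySerDeUDF extends UDF {

    // Kryo's FieldSerializer skips transient fields by default, so the
    // partially initialized JsonSerDe is never written into the plan.
    private transient JsonSerDe serde;

    public String evaluate(String json) {
        if (serde == null) {
            serde = new JsonSerDe(); // constructed fresh in each task JVM
            // initialize the serde with the required properties here
        }
        // use the serde / return a processed value; pass-through in this sketch
        return json;
    }
}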

rcongiu commented 8 years ago

So I guess you should upgrade Hive to 1.3+.

shruthb commented 8 years ago

Okay, the current EMR releases have only Hive 1.0 and I'm not sure how to get 1.3+ on them. Thanks for looking into it. I tried this instead, though:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.codehaus.jackson.map.ObjectMapper;

import java.io.IOException;

/**
 * Serialize an object to a JSON string using Jackson.
 */
public class UDFSerialize extends UDF {

    // ObjectMapper is thread-safe once configured and can be reused
    // across evaluate() calls.
    private final ObjectMapper mapper = new ObjectMapper();

    public String evaluate(Object object) {
        try {
            return mapper.writeValueAsString(object);
        } catch (IOException ex) {
            throw new RuntimeException("error while serializing object", ex);
        }
    }
}
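Registering it follows the same steps as the original query (the jar and function names below are just placeholders for whatever you build):

> add jar udf-serialize-1.0-jar-with-dependencies.jar;
> create temporary function tjson2 as 'UDFSerialize';
> insert overwrite table interim_3 select userId, tjson2(features) from interim_2;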