Wait, I see `Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException`. This does not look like a SerDe problem but a Kryo problem. What's the output of `DESCRIBE EXTENDED interim_2`?
```
hive> DESCRIBE EXTENDED interim_2;
OK
userid      string
features    array<string>

Detailed Table Information  Table(tableName:interim_2, dbName:default, owner:hadoop, createTime:1463555677, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:string, comment:null), FieldSchema(name:features, type:array<string>, comment:null)], location:hdfs://ip-172-31-17-151.ec2.internal:8020/user/hive/warehouse/interim_2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, transient_lastDdlTime=1463555908, COLUMN_STATS_ACCURATE=true, totalSize=11347, numRows=30, rawDataSize=11317}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.056 seconds, Fetched: 4 row(s)
```
Also, the exception occurs only on inserts; a plain

```sql
select userId, tjson(features) from interim_2;
```

does not throw an exception.
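For reference, here is a minimal sketch of the kind of statement that did fail (the target table name `interim_2_json` is hypothetical, not from this thread; any `INSERT ... SELECT` involving the UDF triggered the Kryo exception):

```sql
-- hypothetical target table for the serialized features
CREATE TABLE interim_2_json (userid string, features string);

-- this INSERT ... SELECT is where the KryoException surfaced
INSERT OVERWRITE TABLE interim_2_json
SELECT userid, tjson(features) FROM interim_2;
```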
I'm using json-udf-1.3.7-jar-with-dependencies.jar on EMR 4.4.0 with Hive 1.0.0 and Amazon Hadoop 2.7.1.
OK, thanks, let me have a look.
OK, I was able to reproduce the issue; looking into it.
So, I looked at it, and the exception is not actually in the SerDe. It may be related to this bug: https://issues.apache.org/jira/browse/HIVE-12206. I could reproduce it in Hive 1.2, but that ticket says it's fixed in 1.3, and in fact I tried Hive 2.0 and the bug does not show up there. Closing, as it's a Hive bug, not a SerDe bug.
So I guess you should upgrade Hive to 1.3+.
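If upgrading is not straightforward, one possible stopgap (an assumption on my part, not something verified in this thread) is to switch query-plan serialization away from Kryo, since that is where the exception is thrown. Hive 1.x has a Java XML fallback:

```sql
-- assumed workaround, Hive 1.x only: serialize query plans with
-- javaXML instead of Kryo (this option was removed in Hive 2.x)
SET hive.plan.serialization.format=javaXML;
```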
Okay, the current EMR releases only ship Hive 1.0 and I'm not sure how to get 1.3+ on them. Thanks for looking into it. As a workaround, I tried this UDF instead:
```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.codehaus.jackson.map.ObjectMapper;

import java.io.IOException;

/**
 * Serializes an arbitrary object to a JSON string using Jackson.
 */
public class UDFSerialize extends UDF {
    // Jackson's ObjectMapper is thread-safe once configured, so a single
    // instance can be reused across evaluate() calls.
    private final ObjectMapper mapper = new ObjectMapper();

    public String evaluate(Object object) {
        try {
            return mapper.writeValueAsString(object);
        } catch (IOException ex) {
            throw new RuntimeException("error while serializing object", ex);
        }
    }
}
```
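To use it, the jar and function are registered in the Hive shell and the array is serialized to a JSON string before the insert. A sketch, with placeholder jar path and function name (and reusing the hypothetical `interim_2_json` table from above):

```sql
-- placeholder jar path and function name; UDFSerialize is in the
-- default package, so the class name alone is the fully qualified name
ADD JAR /home/hadoop/udf-serialize.jar;
CREATE TEMPORARY FUNCTION to_json_str AS 'UDFSerialize';

INSERT OVERWRITE TABLE interim_2_json
SELECT userid, to_json_str(features) FROM interim_2;
```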