Fail when using sequence file

obiwan866 commented 10 years ago

Hi, first I would like to thank you for your great work. Here is my problem: I'm using Hive 0.12 on CDH 5.0 and I'm trying to create a table from a sequence file containing JSON.

CREATE EXTERNAL TABLE Test_Json (
  url STRING,
  Ts TIMESTAMP,
  SESSIONID STRING,
  PARAMS STRING,
  CONTEXT STRUCT <USERNAME : STRING, LASTNAME : STRING, FORENAME : STRING, LINE : STRING, STREET : STRING, CITY : STRING, DPT : STRING, REGION : STRING>
)
COMMENT 'foo'
PARTITIONED BY (date STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS SEQUENCEFILE
LOCATION '/data/foo';

Then I add a partition, and when I try to query it I get this error: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text.

If I store my data as a text file there is no issue, and I'm able to select all the fields I want (including those inside the struct field).

Am I doing something wrong?

Thanks in advance.

rcongiu commented 10 years ago

Hi, most people use the serde with text files, since it is usually used to exchange data with external systems that generally do not know about sequence files. For this reason, I don't think I have ever tested it with sequence files. However, it should still work if the sequence file was created correctly. Can you send me a sample of the sequence file you're using? I'll have a look and test it.

obiwan866 commented 10 years ago

Hi Roberto, thank you for your quick answer. I have attached a test file. I hope it will be useful.

Have a good day.

Frédéric

Roguelazer commented 8 years ago

Was this ever resolved? I'm having the same issue.

danfranks commented 8 years ago

I am also having the same issue. I can send the JSON string or the file. Just let me know where.

rcongiu commented 8 years ago

Can you email it to rcongiu@yahoo.com?

obiwan866 commented 8 years ago

Hi, if there has been any progress on this in the serde, I would be interested!

Have a good day.

Frédéric

ChrisPortman commented 8 years ago

I'm getting the exact same thing using a sequence file.

ChrisPortman commented 8 years ago

I got past this with:

PARTITIONED BY (date string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'hdfs://nameservice1/mydata'

ChrisPortman commented 8 years ago

So it seems that SequenceFileAsTextInputFormat avoids the exception, but I get nothing but NULLs. I suspect that the string the serde receives is some sort of byte string and not the JSON itself.
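A possible explanation, offered as an assumption and not a confirmed diagnosis: if the values in the file are BytesWritable, then SequenceFileAsTextInputFormat, which as far as I understand converts each value to text by calling toString() on it, would hand the serde a rendering of the raw bytes as space-separated hex pairs instead of the decoded JSON. The sketch below uses only the JDK to show the difference between the two renderings. The class and method names (HexVsUtf8, hexRendering, utf8Decoding) and the sample payload are made up for illustration and are not part of Hadoop or the serde.

```java
import java.nio.charset.StandardCharsets;

public class HexVsUtf8 {

    // Renders raw bytes as two lowercase hex digits per byte, separated by spaces.
    // Assumption: this imitates what a toString() on a binary value would produce.
    public static String hexRendering(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    // Decodes the same bytes as UTF-8 text, which is what a JSON parser needs.
    public static String utf8Decoding(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical record payload: a small JSON document stored as raw bytes.
        byte[] payload = "{\"url\":\"a\"}".getBytes(StandardCharsets.UTF_8);

        // Not valid JSON, so a JSON serde could not extract any field from it.
        System.out.println(hexRendering(payload));

        // The original JSON text.
        System.out.println(utf8Decoding(payload));
    }
}
```

If this is what happens, the input to the serde would not be parseable JSON, which would be consistent with every column coming back as NULL. Checking the value class recorded in the sequence file header would be one way to confirm or rule this out.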

RameshByndoor commented 6 years ago

Thanks for the library. I have encountered the same issue with Presto when the sequence file value is a BytesWritable. I hope this PR helps fix it. Let me know if there are any concerns.

rcongiu / Hive-JSON-Serde

Fail when using sequence file #94