---------- Forwarded message ----------
From: Russell Jurney russell.jurney@gmail.com
Date: Fri, Jun 22, 2012 at 4:05 PM
Subject: Weird problem in Pig 0.10 with STOR'ing JSON and then LOADing it as PigStorage chararray
To: user@pig.apache.org
The script that has worked in the past is thus:
/* Load Avro emails */
emails = load '/me/tmp/emails_big' using AvroStorage();
emails = filter emails by message_id IS NOT NULL;
/* JSONify the emails for ElasticSearch */
store emails into '/tmp/emails.json' using JsonStorage();
/* LOAD JSON as single field for storage in ElasticSearch with Wonderpig */
json_emails = load '/tmp/emails.json' using PigStorage() AS (json_record:chararray);
store json_emails into 'es://email/email?id=message_id&json=true&size=1000' using ElasticSearch();
Now I get this error:
grunt> json_emails = load '/tmp/emails.json' AS (json_record:chararray);
2012-06-22 15:45:34,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema: left is "json_record:chararray", right is "message_id:chararray,thread_id:chararray,in_reply_to:chararray,subject:chararray,body:chararray,date:chararray,froms:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},ccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},bccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},reply_tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)}"
2012-06-22 15:45:34,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "json_record:chararray", right is "message_id:chararray,thread_id:chararray,in_reply_to:chararray,subject:chararray,body:chararray,date:chararray,froms:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},ccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},bccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},reply_tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)}"
at org.apache.pig.newplan.logical.relational.LogicalSchema.merge(LogicalSchema.java:760)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:114)
at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I tried copying /tmp/emails.json to /tmp/json_emails and loading the copy, but that doesn't work. I tried calling PigStorage() explicitly, and that doesn't work either.
How am I supposed to pull this off?
I figured it out:
grunt> rm /tmp/emails.json/.pig_header
grunt> rm /tmp/emails.json/.pig_schema
Then I can load my JSON as chararray. Interesting problem.
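For what it's worth, the stack trace (LOLoad.getSchema calling LogicalSchema.merge) suggests the loader found the schema that JsonStorage wrote next to the data and tried to merge it with my AS clause. A possible alternative to deleting the hidden files, assuming the '-noschema' option described in the Pig 0.10 PigStorage documentation behaves as documented (I have not verified it against this data), would be something like:

```pig
/* Sketch only: '-noschema' is assumed to make PigStorage ignore a stored .pig_schema on load. */
/* The '\t' delimiter is just the PigStorage default, spelled out because options go in the second argument. */
json_emails = load '/tmp/emails.json' using PigStorage('\t', '-noschema') AS (json_record:chararray);
store json_emails into 'es://email/email?id=message_id&json=true&size=1000' using ElasticSearch();
```

That would leave .pig_header and .pig_schema in place for anything else that reads the directory with its stored schema.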
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com