ogrisel / pignlproc

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Invalid resource schema: bag schema must have tuple as its field #6

Open raymanrt opened 13 years ago

raymanrt commented 13 years ago

I have a problem running the following script:

../../../pig-0.9.0/bin/pig -x local -p PIGNLPROC_JAR=target/pignlproc-0.1.0-SNAPSHOT.jar -p LANG=it -p INPUT=wikipedia-xml-chunks/chunk-0001.xml -p OUTPUT=workspace examples/ner-corpus/01_extract_sentences_with_links.pig

The paths are correct, and the exception generated is:

2011-08-30 17:47:38,867 [main] INFO org.apache.pig.Main - Logging error messages to: /home/rayman/workspace/pignlproc/pignlproc/pig_1314719258864.log
2011-08-30 17:47:39,006 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2011-08-30 17:47:39,376 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-08-30 17:47:39,391 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2218: Invalid resource schema: bag schema must have tuple as its field
Details at logfile: /home/rayman/workspace/pignlproc/pignlproc/pig_1314719258864.log

The log file reports:

Pig Stack Trace

ERROR 2218: Invalid resource schema: bag schema must have tuple as its field

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Invalid resource schema: bag schema must have tuple as its field
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1652)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1597)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:553)
    at org.apache.pig.Main.main(Main.java:108)
Caused by: Failed to parse: Pig script failed to parse: <file examples/ner-corpus/01_extract_sentences_with_links.pig, line 20, column 30> Failed to generate logical plan. Nested exception: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245: <file examples/ner-corpus/01_extract_sentences_with_links.pig, line 15, column 9> Cannot get schema from loadFunc pignlproc.storage.ParsingWikipediaLoader
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1644)
    ... 9 more
Caused by: <file examples/ner-corpus/01_extract_sentences_with_links.pig, line 20, column 30> Failed to generate logical plan. Nested exception: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245: <file examples/ner-corpus/01_extract_sentences_with_links.pig, line 15, column 9> Cannot get schema from loadFunc pignlproc.storage.ParsingWikipediaLoader
    at org.apache.pig.parser.LogicalPlanGenerator.alias_col_ref(LogicalPlanGenerator.java:12992)
    at org.apache.pig.parser.LogicalPlanGenerator.col_ref(LogicalPlanGenerator.java:12854)
    at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:7789)
    at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:7549)
    at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:6959)
    at org.apache.pig.parser.LogicalPlanGenerator.cond(LogicalPlanGenerator.java:5894)
    at org.apache.pig.parser.LogicalPlanGenerator.filter_clause(LogicalPlanGenerator.java:5556)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1062)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:638)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:459)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:357)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
    ... 10 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245: <file examples/ner-corpus/01_extract_sentences_with_links.pig, line 15, column 9> Cannot get schema from loadFunc pignlproc.storage.ParsingWikipediaLoader
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:154)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
    at org.apache.pig.parser.LogicalPlanGenerator.alias_col_ref(LogicalPlanGenerator.java:12990)
    ... 21 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2218: Invalid resource schema: bag schema must have tuple as its field
    at org.apache.pig.ResourceSchema$ResourceFieldSchema.throwInvalidSchemaException(ResourceSchema.java:213)
    at org.apache.pig.impl.logicalLayer.schema.Schema.getPigSchema(Schema.java:1887)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
    ... 23 more

Unfortunately, I don't know anything about Pig scripting, so my question may seem a bit stupid. Any help would be appreciated.

Thanks, Riccardo

ogrisel commented 13 years ago

This is not stupid at all and is probably a real bug. I have not had the time to run it on Pig 0.9.x yet, and it seems that schema handling changed a bit between 0.8 and 0.9; see for instance this message:

http://mail-archives.apache.org/mod_mbox/pig-user/201106.mbox/%3CBANLkTikzYS16vR=Zosb=OUpkOj5ebAeJhQ@mail.gmail.com%3E

I suggest you try Pig 0.8 in the meantime. Please leave this issue open until I find the time to fix the schema so that it runs on 0.9 as well.
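
For readers unfamiliar with the error message: it suggests that Pig 0.9 rejects a bag whose columns are attached directly to the bag instead of being wrapped in a single tuple. The following is a minimal, self-contained Java sketch of that rule as inferred from the message. It is a toy model only, not Pig's API and not the pignlproc loader: the class `BagSchemaSketch`, the `Field` and `Type` types, and the field names `links`, `link`, `target`, `begin`, and `end` are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BagSchemaSketch {

    // Hypothetical stand-ins for Pig data types (not Pig's DataType constants).
    enum Type { CHARARRAY, INTEGER, TUPLE, BAG }

    // Hypothetical stand-in for a schema field (not Pig's ResourceFieldSchema).
    static final class Field {
        final String name;
        final Type type;
        final List<Field> children;

        Field(String name, Type type, Field... children) {
            this.name = name;
            this.type = type;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    // Assumed rule, inferred from the error text: a bag must have exactly one
    // child field, and that child must be a tuple. Checked recursively.
    static boolean isValid(Field field) {
        if (field.type == Type.BAG
                && (field.children.size() != 1
                    || field.children.get(0).type != Type.TUPLE)) {
            return false;
        }
        for (Field child : field.children) {
            if (!isValid(child)) {
                return false;
            }
        }
        return true;
    }

    // Columns attached directly to the bag: the shape the rule rejects.
    static Field flatBag() {
        return new Field("links", Type.BAG,
                new Field("target", Type.CHARARRAY),
                new Field("begin", Type.INTEGER),
                new Field("end", Type.INTEGER));
    }

    // The same columns wrapped in a single tuple inside the bag.
    static Field wrappedBag() {
        return new Field("links", Type.BAG,
                new Field("link", Type.TUPLE,
                        new Field("target", Type.CHARARRAY),
                        new Field("begin", Type.INTEGER),
                        new Field("end", Type.INTEGER)));
    }

    public static void main(String[] args) {
        System.out.println("flat bag valid: " + isValid(flatBag()));       // prints "flat bag valid: false"
        System.out.println("wrapped bag valid: " + isValid(wrappedBag())); // prints "wrapped bag valid: true"
    }
}
```

If the loader's declared schema has the first shape, wrapping the bag's columns in one tuple would be the likely direction for a fix, though the actual change needed in `ParsingWikipediaLoader` is not confirmed here.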

renaud commented 12 years ago

Subscribing (in the meantime, thanks for the Pig 0.8 workaround).

ogrisel commented 12 years ago

BTW, I would be pleased to merge a pull request if you can make it work on more recent versions of Pig.

ogrisel commented 12 years ago

@renaud @raymanrt could you check whether @maxjakob's fixes (now merged in master) resolve your issues?