Open raymanrt opened 13 years ago
Hum, there do not seem to be any pignlproc-related packages in the stack trace... Is this error random, or can it be reproduced systematically?
Executing the same script on a different machine gives me the following exception:
2011-08-31 14:07:11,305 [Thread-622] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2011-08-31 14:07:11,325 [Thread-622] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2011-08-31 14:07:11,325 [Thread-622] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2011-08-31 14:07:11,326 [Thread-622] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 14:07:11,326 [Thread-622] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-08-31 14:07:11,327 [Thread-622] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-08-31 14:07:11,736 [Thread-622] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.io.IOException: Illegal partition for Null: false index: 0 (http://it.wikipedia.org/wiki/Eccitone,Scintillatore,15) (1)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2011-08-31 14:07:14,917 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0003 has failed! Stop running all dependent jobs
2011-08-31 14:07:14,919 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-08-31 14:07:14,919 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-08-31 14:07:14,919 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-08-31 14:07:14,922 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.1 brainaetic 2011-08-31 13:41:30 2011-08-31 14:07:14 ORDER_BY,FILTER
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0001 noredirect,parsed,sentences,stored MAP_ONLY
job_local_0002 ordered SAMPLER
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0003 ordered ORDER_BY Message: Job failed! file:///home/brainaetic/rayman/ner-training-itwiki/workspace/it/sentences_with_links,
Input(s): Successfully read records from: "file:///home/brainaetic/rayman/itwiki-latest-pages-articles.xml"
Output(s): Failed to produce result in "file:///home/brainaetic/rayman/ner-training-itwiki/workspace/it/sentences_with_links"
Job DAG: job_local_0001 -> job_local_0002, job_local_0002 -> job_local_0003, job_local_0003
2011-08-31 14:07:14,922 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 14:07:14,924 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 14:07:14,924 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2011-08-31 14:07:14,927 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 14:07:14,932 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/brainaetic/rayman/ner-training-itwiki/pig_1314790889277.log
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:500)
at org.apache.pig.Main.main(Main.java:107)
Unfortunately, I have no idea what's happening. The best way to proceed would be to isolate the few Wikipedia articles that trigger the failure (assuming they are always the same ones) in a unit test, so that we can use the debugger and trace the origin of the issue.
Third execution went wrong with:
java.io.IOException: Illegal partition for Null: false index: 0 (http://it.wikipedia.org/wiki/Regno_di_Sardegna,Santa Margherita di Staffora,13) (3)
Let's try one more run, but so far the failure has been on a different page every time...
And again:
java.io.IOException: Illegal partition for Null: false index: 0 (http://it.wikipedia.org/wiki/Repubblica_Socialista_Federale_di_Jugoslavia,Luciano Sušanj,2) (3)
same here:
2012-02-28 19:31:51,008 [Thread-1469] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.io.IOException: Illegal partition for Null: false index: 0 (http://fr.wikipedia.org/wiki/Casimiro_Nay,Projet:Football/Index/C,1) (1)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.1 richarde 2012-02-28 18:24:06 2012-02-28 19:31:54 ORDER_BY,FILTER
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0003 ordered ORDER_BY Message: Job failed! file:///pignlproc/output/wiki_dump_parsed/fr/sentences_with_links,
turning off sorting for now
diff --git a/examples/ner-corpus/01_extract_sentences_with_links.pig b/examples/ner-corpus/01_extract_sentences_with_links.pig
index ead569e..2767b39 100644
--- a/examples/ner-corpus/01_extract_sentences_with_links.pig
+++ b/examples/ner-corpus/01_extract_sentences_with_links.pig
@@ -28,6 +28,8 @@ sentences = FOREACH projected
stored = FOREACH sentences
GENERATE title, sentenceOrder, linkTarget, linkBegin, linkEnd, sentence;
+STORE stored INTO '$OUTPUT/$LANG/sentences_with_links_unordered';
+
-- Ensure ordering for fast merge with type info later
-ordered = ORDER stored BY linkTarget ASC, title ASC, sentenceOrder ASC;
-STORE ordered INTO '$OUTPUT/$LANG/sentences_with_links';
+-- ordered = ORDER stored BY linkTarget ASC, title ASC, sentenceOrder ASC;
+-- STORE ordered INTO '$OUTPUT/$LANG/sentences_with_links';
For the record, changing to hadoop-0.20.2 (I had previously tried hadoop-0.20.205.0 and hadoop-0.23.1) and switching to a single-node setup (instead of local mode) worked for me.
Hum, so this might be a Pig / Hadoop versioning bug?
I would assume...
Setting default_parallel to 2 in that Pig file should fix the bug for the local test.
Bests, Mohammed Qwaider
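For readers who want to try the default_parallel suggestion, here is a minimal sketch of what it could look like in 01_extract_sentences_with_links.pig. The placement of the SET statement near the top of the script is an assumption, and the PARALLEL variant is an alternative that was not proposed in this thread; it has the same intent but is scoped to the ORDER statement only.

```pig
-- Assumption: placed near the top of the script, before any other statement.
-- Sets the default number of reducers for every reduce-side job in the script.
SET default_parallel 2;

-- Alternative (not from the thread): request 2 reducers for the ORDER BY only.
ordered = ORDER stored BY linkTarget ASC, title ASC, sentenceOrder ASC PARALLEL 2;
```

Only one of the two is needed. Whether either one avoids the "Illegal partition" error may depend on the Pig and Hadoop versions in use, so treat this as something to test rather than a confirmed fix.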
Hi, the command given is: pig-0.8.1/bin/pig -x local -p PIGNLPROC_JAR=pignlproc/target/pignlproc-0.1.0-SNAPSHOT.jar -p LANG=it -p INPUT=/home/rayman/Scrivania/wiki_dump/itwiki-latest-pages-articles.xml -p OUTPUT=workspace pignlproc/examples/ner-corpus/01_extract_sentences_with_links.pig
With pig-0.8.1 it seems to work well on a single chunk of the dump, so I decided to process the whole dump (I have only one machine, but there's no hurry). After a couple of hours of processing, I get the following error:
2011-08-31 11:45:25,856 [Thread-624] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.io.IOException: Illegal partition for Null: false index: 0 (http://it.wikipedia.org/wiki/Regione_di_Worodougou,Diocesi di Odienné,4) (3)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2011-08-31 11:45:26,970 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0003 has failed! Stop running all dependent jobs
2011-08-31 11:45:26,972 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-08-31 11:45:26,973 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-08-31 11:45:26,973 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-08-31 11:45:26,975 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.1 rayman 2011-08-31 11:09:15 2011-08-31 11:45:26 ORDER_BY,FILTER
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0001 noredirect,parsed,sentences,stored MAP_ONLY
job_local_0002 ordered SAMPLER
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0003 ordered ORDER_BY Message: Job failed! file:///home/rayman/ner-training-itwiki/workspace/it/sentences_with_links,
Input(s): Successfully read records from: "/home/rayman/Scrivania/wiki_dump/itwiki-latest-pages-articles.xml"
Output(s): Failed to produce result in "file:///home/rayman/ner-training-itwiki/workspace/it/sentences_with_links"
Job DAG: job_local_0001 -> job_local_0002, job_local_0002 -> job_local_0003, job_local_0003
2011-08-31 11:45:26,975 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 11:45:26,977 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 11:45:26,978 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2011-08-31 11:45:26,980 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-08-31 11:45:26,984 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/rayman/ner-training-itwiki/pig_1314781753331.log
And the log file says:
Pig Stack Trace
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:500)
at org.apache.pig.Main.main(Main.java:107)
pig_1314781753331.log (END)
What do you think about it?
Riccardo