mozilla / jydoop

Efficient Hadoop Map-Reduce in Python

Despite this being called the "combiner" branch, this does everything except actually implement a combiner #3

Closed bsmedberg closed 11 years ago

bsmedberg commented 11 years ago

Present in this PR:

This change has been tested against telemetry data on mango-gw and works.

Next steps:

tarasglek commented 11 years ago

Can you explain the 'Implement working reducers' part? I don't see many changes related to reducing.

bsmedberg commented 11 years ago

The point of "working reducers" is just that there were no examples of a working reducer, partly because the types you'd be getting were different in FileDriver and in HBaseDriver.
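As a rough illustration of a reducer that does not depend on the type of the values it receives, here is a minimal sketch. It assumes jydoop-style `map(key, value, context)` and `reduce(key, values, context)` functions that emit through `context.write`; the `ListContext` class and the `run_local` helper are hypothetical stand-ins for a driver and are not part of jydoop.

```python
class ListContext(object):
    """Hypothetical stand-in for the context a driver passes to map/reduce."""

    def __init__(self):
        self.results = []

    def write(self, key, value):
        self.results.append((key, value))


def map(key, value, context):
    # Emit a count of 1 for every word in the input line.
    for word in value.split():
        context.write(word, 1)


def reduce(key, values, context):
    # Iterate exactly once and avoid len() or indexing, so the same code
    # works whether `values` arrives as a list or as a one-shot iterator.
    total = 0
    for v in values:
        total += v
    context.write(key, total)


def run_local(lines):
    # Tiny in-memory driver (assumption, for illustration only):
    # map every line, group by key, then reduce each group.
    mapped = ListContext()
    for i, line in enumerate(lines):
        map(i, line, mapped)

    grouped = {}
    for k, v in mapped.results:
        grouped.setdefault(k, []).append(v)

    reduced = ListContext()
    for k in sorted(grouped):
        # Pass an iterator to mimic a driver that does not hand over a list.
        reduce(k, iter(grouped[k]), reduced)
    return reduced.results
```

Calling `run_local(["a b a", "b c"])` runs both functions without Hadoop, which makes it easy to check a reducer against either kind of input before submitting a job.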

tarasglek commented 11 years ago

Undid this merge due to weirdness:

13/04/01 17:10:23 INFO mapred.JobClient: Task Id : attempt_201303271811_0649_m_000608_0, Status : FAILED
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:347)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
    at org.mozilla.pydoop.TypeWritable.WriteType(TypeWritable.java:97)
    at org.mozilla.pydoop.TypeWritable.write(TypeWritable.java:114)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1057)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.mozilla.pydoop.HBaseDriver$ContextWrapper.write(HBaseDriver.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.Nat
attempt_201303271811_0649_m_000608_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201303271811_0649_m_000608_0: log4j:WARN Please initialize the log4j system properly.

Even though the code eats the exception, this looks like a null-pointer error in DataOutputStream.writeUTF, if the source matches: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/DataOutputStream.java#347

However, I don't see why asString would return null. I added an extra null check after asString() and it didn't help.
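Since the Java-side null check did not help, one way to rule out the script as the source is to guard writes on the Python side. The sketch below assumes the script emits through a context object with a `write(key, value)` method; `CheckedContext` and `RecordingContext` are hypothetical names used only for illustration, and this is a debugging aid rather than a fix for the failure above.

```python
class RecordingContext(object):
    """Hypothetical stand-in for the real context; it just records writes."""

    def __init__(self):
        self.results = []

    def write(self, key, value):
        self.results.append((key, value))


class CheckedContext(object):
    """Hypothetical wrapper that keeps None from reaching the real context."""

    def __init__(self, inner):
        self.inner = inner
        self.rejected = []

    def write(self, key, value):
        # Record and skip any pair containing None instead of passing it
        # on to the serializer, so offending records can be inspected later.
        if key is None or value is None:
            self.rejected.append((key, value))
            return
        self.inner.write(key, value)


def demo():
    inner = RecordingContext()
    checked = CheckedContext(inner)
    checked.write("a", 1)
    checked.write(None, 2)
    checked.write("b", None)
    return inner.results, checked.rejected
```

If `rejected` stays empty for a run that still fails, the null is more likely introduced on the Java side than by the script.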

tarasglek commented 11 years ago

Note that this was from running the anr.py script.