mozilla / jydoop

Efficient Hadoop Map-Reduce in Python

Java object in output file #17

Closed darchons closed 11 years ago

darchons commented 11 years ago

I ran some ANR report jobs on Mango. For example,

make hadoop ARGS="scripts/anr.py anr-20130403-20130403 20130403 20130403"

In the HDFS output directory, I see mostly 90-byte files:

-rw-r--r--   3 nchen users         90 2013-04-09 07:19 /user/nchen/anr-20130403-20130403/part-m-00900
-rw-r--r--   3 nchen users         90 2013-04-09 07:19 /user/nchen/anr-20130403-20130403/part-m-00901
-rw-r--r--   3 nchen users      39244 2013-04-09 07:19 /user/nchen/anr-20130403-20130403/part-m-00902
-rw-r--r--   3 nchen users         90 2013-04-09 07:17 /user/nchen/anr-20130403-20130403/part-m-00903
-rw-r--r--   3 nchen users         90 2013-04-09 07:19 /user/nchen/anr-20130403-20130403/part-m-00904
-rw-r--r--   3 nchen users         90 2013-04-09 07:19 /user/nchen/anr-20130403-20130403/part-m-00905

The content of a 90-byte file appears to be a serialized Java object:

0000000: 53 45 51 06 1f 6f 72 67 2e 6d 6f 7a 69 6c 6c 61  SEQ..org.mozilla
0000010: 2e 70 79 64 6f 6f 70 2e 54 79 70 65 57 72 69 74  .pydoop.TypeWrit
0000020: 61 62 6c 65 1f 6f 72 67 2e 6d 6f 7a 69 6c 6c 61  able.org.mozilla
0000030: 2e 70 79 64 6f 6f 70 2e 54 79 70 65 57 72 69 74  .pydoop.TypeWrit
0000040: 61 62 6c 65 00 00 00 00 00 00 9c 60 4e fb 04 2f  able.......`N../
0000050: c2 53 91 26 f8 a4 a9 92 7e 7f                    .S.&....~.

Other files longer than 90 bytes also have this header in them.
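
For anyone reading the dump, the leading `SEQ` bytes look like the magic number of a Hadoop SequenceFile, so the header can be decoded by hand. The following is a minimal sketch, assuming the version 6 header layout as I understand it (magic, version byte, key and value class names as length-prefixed strings, two compression flags, a metadata count, and a 16-byte sync marker). The function names are made up for illustration, and it only handles the uncompressed, no-metadata case seen here.

```python
import struct

# Bytes copied from the hex dump above (one complete 90-byte part file).
HEX_DUMP = """
53 45 51 06 1f 6f 72 67 2e 6d 6f 7a 69 6c 6c 61
2e 70 79 64 6f 6f 70 2e 54 79 70 65 57 72 69 74
61 62 6c 65 1f 6f 72 67 2e 6d 6f 7a 69 6c 6c 61
2e 70 79 64 6f 6f 70 2e 54 79 70 65 57 72 69 74
61 62 6c 65 00 00 00 00 00 00 9c 60 4e fb 04 2f
c2 53 91 26 f8 a4 a9 92 7e 7f
"""


def read_short_text(buf, pos):
    """Read a length-prefixed string whose length fits in one byte.

    Simplification: longer strings use a multi-byte length encoding,
    which this sketch does not handle.
    """
    length = buf[pos]
    if length > 127:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    start = pos + 1
    end = start + length
    return buf[start:end].decode("utf-8"), end


def parse_sequence_file_header(buf):
    """Decode an uncompressed SequenceFile header (assumed v6 layout)."""
    if buf[:3] != b"SEQ":
        raise ValueError("missing SEQ magic; not a SequenceFile")
    header = {"version": buf[3]}
    pos = 4
    header["key_class"], pos = read_short_text(buf, pos)
    header["value_class"], pos = read_short_text(buf, pos)
    header["compressed"] = bool(buf[pos])
    header["block_compressed"] = bool(buf[pos + 1])
    pos += 2
    if header["compressed"]:
        raise ValueError("compressed headers carry a codec name; not handled")
    # Metadata entry count: 4-byte big-endian integer.
    (header["metadata_entries"],) = struct.unpack(">i", buf[pos:pos + 4])
    pos += 4
    if header["metadata_entries"] != 0:
        raise ValueError("metadata entries are not handled in this sketch")
    header["sync_marker"] = buf[pos:pos + 16]
    pos += 16
    # Anything left after the header would be record data.
    header["bytes_after_header"] = len(buf) - pos
    return header


data = bytes.fromhex("".join(HEX_DUMP.split()))
header = parse_sequence_file_header(data)
for name, value in header.items():
    print(name, value)
```

Under those assumptions, the header accounts for all 90 bytes of the dump and `bytes_after_header` comes out as 0, which would mean a 90-byte part file is simply an output file with no records in it.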

I also ran into exceptions when running some jobs. These happened only intermittently, so I'm not sure they're related. For example,

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 10 attempts.
Exceptions:
java.lang.NullPointerException
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1290)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1142)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1065)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:559)
    at com.mozilla.hadoop.hbase.mapreduce.MultiScanTableInputFormat$TableRecordReader.restart(MultiScanTableInputFormat.java:232)
    at com.mozilla.hadoop.hbase.mapreduce.MultiScanTa

bsmedberg commented 11 years ago

The part- files are not a bug. jydoop combines these files at the end of the job into a file on your local filesystem.

The regionServer exception is not, as far as I know, a jydoop bug; it's just a temporary problem with the cluster.