twitter / hadoop-lzo

Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20
GNU General Public License v3.0

pig fails to use lzo as compression for temp files #115

Open jefimm opened 8 years ago

jefimm commented 8 years ago

The following setup fails with hadoop 2.7.2 and pig 0.15.0 (Google Cloud Dataproc). The same job completes fine without lzo compression for temp files, and fails once lzo compression for temp files is enabled (pig.tmpfilecompression=true, pig.tmpfilecompression.codec=lzo). Setup performed on all nodes during startup:

sudo apt-get install liblzo2-dev
sudo ln -s /lib/x86_64-linux-gnu/liblzo2.so.2 /usr/lib/hadoop/lib/native/

copied hadoop-lzo-0.4.20-SNAPSHOT.jar to /usr/lib/hadoop-mapreduce/

edited core-site.xml and added

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
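For completeness (this fragment is not part of the original report), the two temp-file compression properties quoted above can also be enabled per script from Pig itself, rather than globally; the property names are the ones given in the report:

```
SET pig.tmpfilecompression true;
SET pig.tmpfilecompression.codec lzo;
```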

Error: java.lang.RuntimeException: java.io.IOException: Not a valid BCFile.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.init(WeightedRangePartitioner.java:155)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:75)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:58)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:135)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:281)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Not a valid BCFile.
    at org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
    at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:628)
    at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
    at org.apache.pig.impl.io.TFileRecordReader.initialize(TFileRecordReader.java:64)
    at org.apache.pig.impl.io.ReadToEndLoader.initializeReader(ReadToEndLoader.java:212)
    at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:250)
    at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.init(WeightedRangePartitioner.java:129)
    ... 17 more
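The "Not a valid BCFile" error means Pig's TFile reader was handed bytes whose magic it does not recognize, for example a raw lzop stream instead of a TFile container. As a quick diagnostic (a sketch, not part of the original report; the file path is hypothetical, and the magic constant is the documented lzop stream header), one could inspect the leading bytes of a failing temp file after copying it out of HDFS:

```python
# Sketch: check whether a Pig temp file begins with the lzop stream magic.
# LZOP_MAGIC is the 9-byte header defined by the lzop file format.
LZOP_MAGIC = b"\x89LZO\x00\x0d\x0a\x1a\x0a"

def looks_like_lzop(path):
    """Return True if the file at `path` starts with the lzop magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(LZOP_MAGIC)) == LZOP_MAGIC

# Hypothetical usage, after `hadoop fs -get` of a failing temp file:
# print(looks_like_lzop("part-r-00000.lzo"))
```

If the file starts with the lzop magic, the writer produced a bare compressed stream rather than a TFile wrapping compressed blocks, which would explain the reader failing at BCFile$Magic.readAndVerify.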

jrottinghuis commented 8 years ago

Is that a Pig issue, or a problem with hadoop-lzo?


On May 30, 2016, at 10:28 AM, Jefim Matskin notifications@github.com wrote:


jefimm commented 8 years ago

I really don't know; the problem is that enabling lzo compression for Pig temp files does not work.