matteobertozzi / Hadoop

Hadoop (Utilities, Patches and Examples)
http://th30z.blogspot.com

Does the SequenceFile.Reader support LzoCodec? #12

Open ekta1007 opened 9 years ago

ekta1007 commented 9 years ago

I have a sequence file compressed with LzoCodec that I am unable to read with this module.

from hadoop.io import SequenceFile
fh = '/home/ekta/my_file'
reader = SequenceFile.Reader(fh)

The first few lines of the file I am trying to read:

SEQ org.apache.hadoop.io.Text com.bloomreach.proto.PwfPixelLog #com.hadoop.compression.lzo.LzoCodec [binary data follows]
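For reference, the readable part of that dump is the SequenceFile header, which names the key class, the value class, and the compression codec. Below is a minimal, standalone Python 3 sketch of reading those fields. It does not use python-hadoop. It assumes a header of version 5 or later (magic, version byte, two length-prefixed class names, two compression flags, then the codec name), and the sample bytes are synthetic, built only to mirror the names shown above. The function names are made up for illustration.

```python
import io
import struct


def read_vint(stream):
    """Decode a Hadoop variable-length integer (as written by WritableUtils)."""
    first = struct.unpack("b", stream.read(1))[0]  # first byte, signed
    if first >= -112:
        return first  # small values fit in a single byte
    negative = first < -120
    extra = (-first - 120) if negative else (-first - 112)
    value = 0
    for byte in stream.read(extra):
        value = (value << 8) | byte
    return ~value if negative else value


def read_text(stream):
    """Read a length-prefixed UTF-8 string."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def read_header(stream):
    """Read the leading header fields (assumes header version >= 5)."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile")
    version = stream.read(1)[0]
    key_class = read_text(stream)
    value_class = read_text(stream)
    compressed = stream.read(1) != b"\x00"
    block_compressed = stream.read(1) != b"\x00"
    codec = read_text(stream) if compressed else None
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


def hadoop_string(text):
    """Encode a short string the way the header stores it (length < 128)."""
    data = text.encode("utf-8")
    assert len(data) < 128
    return bytes([len(data)]) + data


# Synthetic header; the version byte and flag values here are assumptions.
sample = (
    b"SEQ\x06"
    + hadoop_string("org.apache.hadoop.io.Text")
    + hadoop_string("com.bloomreach.proto.PwfPixelLog")
    + b"\x01\x00"
    + hadoop_string("com.hadoop.compression.lzo.LzoCodec")
)
header = read_header(io.BytesIO(sample))
print(header["codec"])
```

The codec string read this way is the Java class name that the reader then has to map to a decompressor. The `#` in front of the codec name in the dump is likely just its length prefix (35) rendered as an ASCII character.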

The module seems to be searching for a decompressor but is unable to find one. If this is supported, what am I doing wrong? Also, I installed hadoop-lzo from https://github.com/twitter/hadoop-lzo - though I see that the com.hadoop.compression.lzo

Traceback (most recent call last):
  File "/home/ekta/CUSTOM_WORK/protobuf.py", line 3, in <module>
    reader = SequenceFile.Reader(fh)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/SequenceFile.py", line 288, in __init__
    self._initialize(path, start, length)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/SequenceFile.py", line 478, in _initialize
    self._codec = CodecPool().getDecompressor(codec_class)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/compress/CodecPool.py", line 34, in getDecompressor
    codec_class = ReflectionUtils.hadoopClassFromName(class_path)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/util/ReflectionUtils.py", line 24, in hadoopClassFromName
    return classFromName(class_path)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/util/ReflectionUtils.py", line 44, in classFromName
    module = __import__(module_name, globals(), locals(), [str(class_name)], -1)
ImportError: No module named com.hadoop.compression.lzo
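The last frames of the traceback suggest that the codec's class name is split into a module path and a class name, and the module path is then imported as a Python module. Below is a simplified Python 3 sketch of that kind of lookup. It is not the library's actual code, and `class_from_name` is a name made up here. It shows why a Java package name such as com.hadoop.compression.lzo ends in an ImportError unless a Python module with that dotted path exists.

```python
import importlib


def class_from_name(class_path):
    """Resolve 'pkg.module.ClassName' by importing pkg.module as Python."""
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A dotted path that exists as a Python module resolves normally.
print(class_from_name("collections.OrderedDict"))

# A Java class name has no matching Python module, so the import fails.
try:
    class_from_name("com.hadoop.compression.lzo.LzoCodec")
except ImportError as exc:
    print("lookup failed:", exc)
```

If the lookup does work this way, installing the Java hadoop-lzo sources would not help, because `.java` files cannot be imported by the Python interpreter.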

In the hadoop-lzo package I do see "com.hadoop.compression.lzo". Is the program unable to find this class in hadoop-lzo? In dist-packages I have Hadoop-0.1.4-py2.7.egg, _lzo.so*, lzo.py, and python_lzo-1.0.egg-info.

I believe that com.hadoop.compression.lzo.LzoCodec.java might be needed to read the file above?

:~/Downloads/hadoop-lzo$ tree [..more ]

| | |-- com
| | | | |-- hadoop
| | | | | |-- compression
| | | | | | `-- lzo
| | | | | | |-- CChecksum.java
| | | | | | |-- DChecksum.java
| | | | | | |-- DistributedLzoIndexer.java
| | | | | | |-- GPLNativeCodeLoader.java
| | | | | | |-- LzoCodec.java
| | | | | | |-- LzoCompressor.java
| | | | | | |-- LzoDecompressor.java
| | | | | | |-- LzoIndex.java
| | | | | | |-- LzoIndexer.java
| | | | | | |-- LzoInputFormatCommon.java
| | | | | | |-- LzopCodec.java
| | | | | | |-- LzopDecompressor.java
| | | | | | |-- LzopInputStream.java
| | | | | | |-- LzopOutputStream.java

talglobus commented 5 years ago

Running into this same issue