I have a sequence file compressed with LzoCodec that I am unable to read through the python-hadoop module.
from hadoop.io import SequenceFile
fh='/home/ekta/my_file'
reader = SequenceFile.Reader(fh)
The first few lines of the file I am trying to read:
SEQ org.apache.hadoop.io.Text com.bloomreach.proto.PwfPixelLog #com.hadoop.compression.lzo.LzoCodecF��7�u�v �W Y�d����F��7�u�v �W Y�du u
'
'd`
+8 � ` $
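For reference, the readable part of the dump above is the SequenceFile header, which names the key class, the value class and the codec. Below is a minimal sketch of reading just those fields, assuming a header version of 5 or later and class names shorter than 128 bytes (so each length prefix is a single byte). The helper names `read_text`, `read_bool` and `read_header` are hypothetical and are not part of python-hadoop.

```python
import io
import struct


def read_text(stream):
    """Read one length-prefixed UTF-8 string from the header.

    Assumption: the length prefix is a single byte (0..127). Longer
    strings use a multi-byte prefix that this sketch does not decode.
    """
    (length,) = struct.unpack("b", stream.read(1))
    if length < 0:
        raise ValueError("multi-byte length prefix not handled in this sketch")
    return stream.read(length).decode("utf-8")


def read_bool(stream):
    """Read a one-byte boolean flag (zero means False)."""
    return stream.read(1) != b"\x00"


def read_header(stream):
    """Return the leading header fields of a SequenceFile as a dict."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic")
    (version,) = struct.unpack("B", stream.read(1))
    if version < 5:
        # Assumption: older versions lay out the header differently.
        raise ValueError("header version %d not handled in this sketch" % version)
    header = {"version": version}
    header["key_class"] = read_text(stream)
    header["value_class"] = read_text(stream)
    header["compressed"] = read_bool(stream)
    header["block_compressed"] = read_bool(stream)
    # The codec class name is only present when compression is enabled.
    header["codec_class"] = read_text(stream) if header["compressed"] else None
    return header
```

Opening the file in binary mode (`open(path, 'rb')`) and passing the handle to `read_header` would be expected to report `com.hadoop.compression.lzo.LzoCodec` as the codec class, which is the name the reader then tries to resolve.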
It seems to be searching for a decompressor but is unable to find one. If this is supported, what am I doing wrong?
Also, I installed hadoop-lzo from https://github.com/twitter/hadoop-lzo, though I see that the
com.hadoop.compression.lzo
Traceback (most recent call last):
  File "/home/ekta/CUSTOM_WORK/protobuf.py", line 3, in <module>
    reader = SequenceFile.Reader(fh)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/SequenceFile.py", line 288, in __init__
    self._initialize(path, start, length)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/SequenceFile.py", line 478, in _initialize
    self._codec = CodecPool().getDecompressor(codec_class)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/compress/CodecPool.py", line 34, in getDecompressor
    codec_class = ReflectionUtils.hadoopClassFromName(class_path)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/util/ReflectionUtils.py", line 24, in hadoopClassFromName
    return classFromName(class_path)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/util/ReflectionUtils.py", line 44, in classFromName
    module = __import__(module_name, globals(), locals(), [str(class_name)], -1)
ImportError: No module named com.hadoop.compression.lzo
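Judging from the traceback, the codec class name from the file header is split into a module path and a class name, and the module path is then imported as a Python module. The sketch below imitates that lookup with `importlib` as a stand-in. It is a simplified illustration, not python-hadoop's actual code, and the names `class_from_name` and `describe_lookup` are hypothetical.

```python
import importlib


def class_from_name(class_path):
    """Treat 'pkg.module.Class' as a Python path and import the class."""
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def describe_lookup(class_path):
    """Report whether the dotted name resolves to an importable Python class."""
    try:
        return "found %r" % class_from_name(class_path)
    except ImportError as exc:
        # Raised when no Python package with that dotted name is importable,
        # for example when the name refers to a Java package instead.
        return "import failed: %s" % exc


print(describe_lookup("collections.OrderedDict"))
print(describe_lookup("com.hadoop.compression.lzo.LzoCodec"))
```

Under this reading, the lookup only succeeds if a Python package named `com.hadoop.compression.lzo` is on the import path. Java sources with the same package name, such as those in hadoop-lzo, would not satisfy it.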
In the hadoop-lzo package I do see "com.hadoop.compression.lzo". Is the program unable to find this class in hadoop-lzo? In my dist-packages I have Hadoop-0.1.4-py2.7.egg, _lzo.so*, lzo.py, and python_lzo-1.0.egg-info.
I believe that com.hadoop.compression.lzo.LzoCodec.java might be needed to read my file as above?
:~/Downloads/hadoop-lzo$ tree [..more ]
| | |-- com
| | | | |-- hadoop
| | | | | |-- compression
| | | | | | `-- lzo
| | | | | | |-- CChecksum.java
| | | | | | |-- DChecksum.java
| | | | | | |-- DistributedLzoIndexer.java
| | | | | | |-- GPLNativeCodeLoader.java
| | | | | | |-- LzoCodec.java
| | | | | | |-- LzoCompressor.java
| | | | | | |-- LzoDecompressor.java
| | | | | | |-- LzoIndex.java
| | | | | | |-- LzoIndexer.java
| | | | | | |-- LzoInputFormatCommon.java
| | | | | | |-- LzopCodec.java
| | | | | | |-- LzopDecompressor.java
| | | | | | |-- LzopInputStream.java
| | | | | | |-- LzopOutputStream.java