Closed: naoko closed this issue 7 years ago
IMO, the contents of the snappy file are probably not in the expected format.
OutOfMemoryError was raised at this line. Java tried to allocate memory whose size was len, the compressed length of a chunk of raw compressed data. That length is usually 256k at most; here, however, it was so large that the allocation failed.
I see two possibilities: (1) the snappy format used in Hadoop is different from what I expected, or (2) the file in hadoop is not a snappy file at all but merely has a .snappy suffix.
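For illustration, here is a minimal sketch of how a hadoop-snappy stream is framed (this is not Hadoop's or snzip's actual code, and it assumes the python-snappy package is installed). If the file is not really in this format, the 4-byte length fields read below are arbitrary bytes, which is how a bogus, huge len can reach the allocator.
import struct
import snappy  # assumption: the python-snappy package (pip install python-snappy)

def read_hadoop_snappy(path):
    """Decode a hadoop-snappy framed file: a sequence of blocks, each being a
    4-byte big-endian uncompressed length followed by one or more chunks of
    4-byte big-endian compressed length + compressed bytes."""
    out = bytearray()
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # end of file
            uncompressed_len = struct.unpack(">I", header)[0]
            got = 0
            while got < uncompressed_len:
                # On a file that is not really hadoop-snappy, this field is
                # arbitrary bytes and can be absurdly large.
                chunk_len = struct.unpack(">I", f.read(4))[0]
                chunk = snappy.uncompress(f.read(chunk_len))
                out.extend(chunk)
                got += len(chunk)
    return bytes(out)

if __name__ == "__main__":
    import sys
    sys.stdout.buffer.write(read_hadoop_snappy(sys.argv[1]))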
Thank you for your response, Kubo.
I double-checked by downloading the .snappy file and was able to uncompress it with the snzip command, so the remaining possibility is the length.
So that means I should find the value of io.compression.codec.snappy.buffersize and then run the command with the -b flag? Did I understand correctly?
-b won't fix this case. If -b is too large, hadoop prints: Could not decompress data. Buffer length is too small.
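For a rough picture of why a large -b hurts rather than helps, here is a sketch under assumptions, not Hadoop's actual code: the Hadoop reader presumably decompresses each block into a fixed-size buffer sized by io.compression.codec.snappy.buffersize, so a block written with a larger snzip -b value is rejected with the error quoted above.
# A sketch only, not Hadoop's implementation.
HADOOP_BUFFER_SIZE = 256 * 1024  # assumed value of io.compression.codec.snappy.buffersize

def check_block(block_len: int) -> None:
    # A block larger than the reader-side buffer cannot fit, hence the error above.
    if block_len > HADOOP_BUFFER_SIZE:
        raise IOError("Could not decompress data. Buffer length is too small.")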
I had not used hadoop before, so I set up hadoop hdfs today and checked whether a file compressed by snzip could be retrieved via hadoop fs -text. As far as I checked, it worked.
Could you run the following commands?
$ echo Hello World > hello.txt
$ snzip -t hadoop-snappy hello.txt
$ hadoop fs -put hello.txt.snappy
$ hadoop fs -text hello.txt.snappy
17/01/29 17:58:31 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
Hello World
What OS do you use?
What version of hadoop do you use?
$ hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /home/kubo/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
What version of snappy does hadoop use? If you use Linux:
$ strace -f -o strace.log -e trace=open hadoop fs -text <snappy_file>
$ grep libsnappy strace.log | grep -v '= -1'
If the output is 29484 open("/usr/lib/x86_64-linux-gnu/libsnappy.so.1", O_RDONLY|O_CLOEXEC) = 199, the snappy library used by hadoop is /usr/lib/x86_64-linux-gnu/libsnappy.so.1.
$ ls -l /usr/lib/x86_64-linux-gnu/libsnappy.so.1
lrwxrwxrwx 1 root root 18 Oct 6 2015 /usr/lib/x86_64-linux-gnu/libsnappy.so.1 -> libsnappy.so.1.3.0
The snappy version is 1.3.0 because the real file name is libsnappy.so.1.3.0.
What version of snzip do you use?
$ snzip -h
snzip 1.0.4
Usage: snzip [option ...] [file ...]
...
What version of snappy does snzip use? If you use Linux:
$ env LD_TRACE_LOADED_OBJECTS=1 snzip
linux-vdso.so.1 => (0x00007fff1f95d000)
libsnappy.so.1 => /usr/lib/x86_64-linux-gnu/libsnappy.so.1 (0x00007fdf4d170000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fdf4cdc9000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fdf4cbb3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdf4c7ea000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fdf4c4e1000)
/lib64/ld-linux-x86-64.so.2 (0x000055fa78904000)
snzip uses /usr/lib/x86_64-linux-gnu/libsnappy.so.1.
$ ls -l /usr/lib/x86_64-linux-gnu/libsnappy.so.1
lrwxrwxrwx 1 root root 18 Oct 6 2015 /usr/lib/x86_64-linux-gnu/libsnappy.so.1 -> libsnappy.so.1.3.0
The snappy version is 1.3.0 because the real file name is libsnappy.so.1.3.0.
@kubo, thank you very much for taking the time. I followed your instructions and was able to uncompress with no issue. So I scratched my head and went back to my original file, and this time there is no error... it uncompressed just fine. I am utterly confused and feel so bad and ashamed :( I'm terribly sorry for this ticket, and thank you again for your time. If I ever find the reason why this works now, I will report back. I am closing this ticket now. Thank you very much for providing this great library.
No problem. It was a good chance for me to try installing hadoop.
snzip -t hadoop-snappy hello.txt
Nice !!!
Hello! So... with
snzip -t hadoop-snappy <file_to_compress>
I can compress, and I can decompress with snzip -d <snappy_file> just fine. I moved the file to the hadoop cluster and ran:
hadoop fs -text <snappy_file>
and got the following error. I'm not sure where to go from here and would like to have your advice, please. I was able to run
hadoop fs -text <much-bigger-snappy>
for a much bigger file with no problem, so Memory Error is misleading... Please let me know if there is anything I can provide.