pmqs / IO-Compress

IO-Compress - Perl5 module to read/write compressed data in multiple formats

silently stops reading into buffer after exactly 64MB #54

Closed XSven closed 9 months ago

XSven commented 9 months ago

I have a large gz file:

gzip -cd ~/tmp/file.gz  | wc -c
 452058742

I am processing the file with the t/bigfile.t test script

#<<<
use strict; use warnings;
#>>>

use Test::More import => [ qw( BAIL_OUT is note use_ok ) ], tests => 4;

use Config     qw( %Config );
use File::Temp qw( tempdir tempfile );

my $module;

BEGIN {
  # Assign to the file-scoped $module; a second "my" here would shadow it.
  $module = 'IO::Uncompress::Gunzip';
  use_ok $module, 2.206, '$GunzipError' or BAIL_OUT "Cannot load module '$module'!";
}

note "Perl $] at $^X";
note "OS $Config{ osname } $Config{ osvers }";

my $input = shift;
note 'Input filename:      ', $input;
note 'Input size in bytes: ', ( stat( $input ) )[ 7 ];

my $z = IO::Uncompress::Gunzip->new( $input, { Append => 1 } )
  or BAIL_OUT "Cannot create $module object: $GunzipError!";

my $buffer;
my $status;
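# Note: ">" binds tighter than "=", so $status holds the boolean result of the comparison, not the byte count.
1 while $status = $z->read( $buffer ) > 0;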

is $status,      '', '$status is empty should be 0 (eof expected)';
is $GunzipError, '', '$GunzipError is empty';

my ( $fh, $output ) = tempfile( DIR => tempdir( CLEANUP => 1 ) );
note 'Output filename: ', $output;

print $fh $buffer;

my $size = ( stat( $fh ) )[ 7 ];

is $size / 1024 / 1024, 64, 'Output size is 64MB';

The output is:

prove  --verbose t/bigfile.t :: ~/tmp/file.gz
t/bigfile.t ..
1..4
ok 1 - use IO::Uncompress::Gunzip;
# Perl 5.014004 at /opt/perlbrew/perls/perl-5.14.4/bin/perl
# OS aix 7.2.0.0
# Input filename:      /home/micsw/tmp/file.gz
# Input size in bytes: 7324449
ok 2 - $status is empty should be 0 (eof expected)
ok 3 - $GunzipError is empty
# Output filename: /tmp/DMXW8Uh9rG/p4EPsfa2vT
ok 4 - Output size is 64MB
ok
All tests successful.
Files=1, Tests=4,  1 wallclock secs ( 0.02 usr  0.01 sys +  0.13 cusr  0.07 csys =  0.23 CPU)
Result: PASS

My expectation is that at least one of the two assertions

ok 2 - $status is empty should be 0 (eof expected)
ok 3 - $GunzipError is empty

should fail because the unzip action stops prematurely after 64MB without raising a warning or an exception.

I have used PerlIO::gzip as an alternative method to uncompress. The phenomenon is the same. Reading and uncompressing the data stops after 64MB. I do not understand why!
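A side note on the read loop in the script above: because `>` binds tighter than `=`, `$status` ends up holding the result of the comparison rather than the value returned by `read()`. The following is a minimal sketch of the loop with explicit parentheses, assuming made-up in-memory sample data instead of the real file. It relies on the documented behaviour that `read()` returns the number of bytes read, 0 at end of stream, and a negative number on error.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw( gzip $GzipError );
use IO::Uncompress::Gunzip qw( $GunzipError );

# Hypothetical sample data, so the sketch needs no input file.
my $plain = 'x' x 100_000;
my $compressed;
gzip \$plain => \$compressed
  or die "gzip failed: $GzipError";

my $z = IO::Uncompress::Gunzip->new( \$compressed, Append => 1 )
  or die "Cannot create gunzip object: $GunzipError";

my $buffer = '';
my $status;

# The parentheses make $status hold the return value of read():
# > 0 bytes read, 0 at end of stream, < 0 on error.
1 while ( $status = $z->read( $buffer ) ) > 0;

die "read failed: $GunzipError" if $status < 0;
printf "status=%d, bytes=%d\n", $status, length $buffer;
```

With this form, `$status` can be compared against 0 directly, and a negative value can be told apart from a normal end of stream.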

pmqs commented 9 months ago

Hey @XSven

Thanks for the report. It sounds like there is something unusual with the file you are uncompressing. Is it available for me to test?

Can you see what gunzip thinks about the file by running this

gunzip -t  ~/tmp/file.gz

One other thing to try is to add the option MultiStream => 1 when creating the gunzip object

my $z = IO::Uncompress::Gunzip->new( $input,  Append => 1, MultiStream => 1 )

XSven commented 9 months ago

I have checked the integrity first and the file is proper. I was also able to uncompress it manually using gzip. The MultiStream => 1 option has solved my problem (Thx)! I have found more information here: Dealing with concatenated gzip files

From my perspective a warning should be raised if IO::Uncompress::Gunzip detects multiple data streams and the MultiStream option is off. Could this be implemented?
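To illustrate the behaviour discussed here, the sketch below builds two gzip members in memory, concatenates them, and uncompresses the result with and without `MultiStream`. The sample strings and variable names are made up for illustration; only the one-shot `gzip` and `gunzip` functions and the `MultiStream` option come from the IO-Compress distribution. Whether the file in this report really consists of several members is an assumption based on `MultiStream => 1` having fixed the problem.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw( gzip $GzipError );
use IO::Uncompress::Gunzip qw( gunzip $GunzipError );

# Two independent gzip members, concatenated the way
# `cat a.gz b.gz > both.gz` would do it. Sample data is hypothetical.
my ( $first, $second ) = ( "first member\n", "second member\n" );
my ( $gz1, $gz2 );
gzip \$first  => \$gz1 or die "gzip failed: $GzipError";
gzip \$second => \$gz2 or die "gzip failed: $GzipError";
my $concatenated = $gz1 . $gz2;

# Default: uncompression stops after the first member, without an error.
my $single;
gunzip \$concatenated => \$single
  or die "gunzip failed: $GunzipError";

# MultiStream => 1: continue with the following members as well.
my $multi;
gunzip \$concatenated => \$multi, MultiStream => 1
  or die "gunzip failed: $GunzipError";

printf "without MultiStream: %d bytes\n", length $single;
printf "with MultiStream:    %d bytes\n", length $multi;
```

In this sketch only the `MultiStream` call yields the data of both members, while the default call succeeds silently with the first member alone.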

pmqs commented 9 months ago

> I have checked the integrity first and the file is proper. I was also able to uncompress it manually using gzip. The MultiStream => 1 option has solved my problem (Thx)! I have found more information here: Dealing with concatenated gzip files

Excellent!

> From my perspective a warning should be raised if IO::Uncompress::Gunzip detects multiple data streams and the MultiStream option is off. Could this be implemented?

Need to research that a bit more to understand the implications. Added to my TODO list.
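Until something like this exists in the module, a caller could approximate the check. The sketch below is not the module's implementation and not a planned one; it assumes the input is an in-memory buffer, because the documentation says `trailingData` is not useful when the input is a filename. After reading the first stream to its end, it inspects the leftover bytes for the gzip magic number `1f 8b`. The sample data and the warning text are made up.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw( gzip $GzipError );
use IO::Uncompress::Gunzip qw( $GunzipError );

# Hypothetical sample: two gzip members back to back in one buffer.
my ( $first, $second ) = ( "first member\n", "second member\n" );
my ( $gz1, $gz2 );
gzip \$first  => \$gz1 or die "gzip failed: $GzipError";
gzip \$second => \$gz2 or die "gzip failed: $GzipError";
my $concatenated = $gz1 . $gz2;

# MultiStream is deliberately left off here.
my $z = IO::Uncompress::Gunzip->new( \$concatenated, Append => 1 )
  or die "Cannot create gunzip object: $GunzipError";

my $buffer = '';
my $status;
1 while ( $status = $z->read( $buffer ) ) > 0;
die "read failed: $GunzipError" if $status < 0;

# trailingData() returns what follows the first stream in the input buffer.
my $trailing = $z->trailingData();
$trailing = '' unless defined $trailing;

# A following gzip member starts with the magic bytes 0x1f 0x8b.
my $more_members
  = length( $trailing ) >= 2 && substr( $trailing, 0, 2 ) eq "\x1f\x8b";

warn "More gzip data follows the first stream; consider MultiStream => 1\n"
  if $more_members;
```

A check like this only looks at the first two trailing bytes, so it cannot tell a genuine further member from unrelated data that happens to start with the same bytes.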

pmqs commented 9 months ago

Closing - issue added to TODO list.