ssimms / pdfapi2

Create, modify, and examine PDF files in Perl

PDF::API2->open() holds a filesystem lock until the returned object goes out of scope #34

Closed chrispitude closed 3 years ago

chrispitude commented 3 years ago

This is a strange bug that I see on Windows (Strawberry Perl) but not on Linux (native Perl).

When I open a PDF file with PDF::API2->open(), somehow there is a lingering filesystem lock that prevents me from deleting the input file until the returned PDF object goes out of scope.

For example, this code:

my $pdf = PDF::API2->open('test.pdf');
unlink 'test.pdf' or warn "Could not unlink 'test.pdf': $!";

results in the following error:

Could not unlink 'test.pdf': Permission denied at bad.pl line 12.

But if I undefine $pdf (or force it to go out of scope):

my $pdf = PDF::API2->open('test.pdf');
$pdf = undef;
unlink 'test.pdf' or warn "Could not unlink 'test.pdf': $!";

then the unlink operation on the input file succeeds.

I tried the following versions, and the bug occurs with all of them:

Tiny testcase at:

testcase.tar.gz

ssimms commented 3 years ago

I think that might be working as intended. Since 2.039, PDF::API2 no longer reads the entire file into memory, which lowers memory usage on large files. Instead, it keeps the file open so that it can read parts of the file as needed.

If you need to work around this, you can stringify the PDF and open the string instead:

    sub memory_is_not_a_problem {
        my $file = shift();
        my $pdf = PDF::API2->open($file);
        return PDF::API2->open_scalar($pdf->stringify());
    }
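
A variant of the same idea, as a sketch only: read the file's bytes with core Perl, close the handle explicitly, and pass the resulting string to open_scalar(). Here slurp_bytes is a hypothetical helper name, not part of PDF::API2, and the approach assumes the whole file fits comfortably in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::API2): read a whole file as raw
# bytes and close the handle before returning, so nothing keeps it open.
sub slurp_bytes {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open '$path': $!";
    local $/;                      # slurp mode: read to end of file
    my $bytes = <$fh>;
    close($fh) or die "Cannot close '$path': $!";
    return $bytes;
}

# Usage sketch, assuming PDF::API2 is installed and 'test.pdf' exists:
#   my $pdf = PDF::API2->open_scalar(slurp_bytes('test.pdf'));
#   unlink 'test.pdf' or warn "Could not unlink 'test.pdf': $!";
```

Because the handle is closed before open_scalar() is called, the PDF object should have no reason to keep the input file open.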

As for why it works on Linux but not Windows: I know that Linux filesystems generally let a program keep accessing a deleted file as long as it already had the file open. Perhaps Windows filesystems prefer to ensure that a file is actually deleted as soon as the delete call happens, failing if that's not possible.

chrispitude commented 3 years ago

@ssimms - Thank you! I confirmed this behavior with the following test script:

#!/usr/bin/perl
use warnings;
use strict;

# write a line of text to 'testfile'
open(FHWRITE, '>', 'testfile') or die $!;
print FHWRITE "This is some text.\n";
close(FHWRITE);

# open 'testfile' for read
open(FH, '<', 'testfile') or die $!;
sleep(1);

# delete 'testfile'
unlink 'testfile' or warn "Could not unlink 'test.pdf': $!";
sleep(1);

# does file exist, can we read from it?
print "Does deleted file exist? ".(-f 'testfile' ? 'yes' : 'no')."\n";
print "Text read from deleted file: ".(<FH> =~ s!\n$!!r)."\n";

close(FH);

On Linux, a file that is merely open for reading can still be deleted (and still read from):

$ x.pl
Does deleted file exist? no
Text read from deleted file: This is some text.

On Windows, having the file open for reading prevents it from being deleted:

C:\pdf\testcase>perl x.pl
Could not unlink 'test.pdf': Permission denied at x.pl line 13.
Does deleted file exist? yes
Text read from deleted file: This is some text.

I also confirmed that a simple undef of the variable containing the PDF object is sufficient to destroy the object and release the lock on Windows.
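
The same scoping effect can be seen with a plain lexical filehandle, without PDF::API2. The following is a minimal sketch using only core Perl: first_line_then_release is a hypothetical name, and the temporary file stands in for the input PDF. The handle lives only inside the inner block, so Perl closes it when the block ends and the later unlink has nothing holding the file open.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read one line through a handle that exists only
# inside the inner block, so it is closed before the sub returns.
sub first_line_then_release {
    my ($path) = @_;
    my $line;
    {
        open(my $fh, '<', $path) or die "Cannot open '$path': $!";
        $line = <$fh>;
        # $fh goes out of scope at the end of this block and is closed
    }
    return $line;
}

# Create a small scratch file to stand in for the input file.
my ($tmp_fh, $path) = tempfile();
print {$tmp_fh} "This is some text.\n";
close($tmp_fh) or die "Cannot close '$path': $!";

my $line    = first_line_then_release($path);
my $deleted = unlink($path);
warn "Could not unlink '$path': $!" unless $deleted;
```

Wrapping the PDF::API2->open() call in a block or sub in the same way should have the same effect as the explicit undef, provided no other reference to the object survives.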

In the PDF::API2->open() description, it might be worth mentioning something like:

A read-only filesystem lock is held on $pdf_file until the PDF object goes out of scope. This allows large PDF files to be accessed sparsely for efficiency.

which should be enough to raise awareness without having to write Windows-specific disclaimers.