jmcnamara / excel-writer-xlsx

Perl module to create Excel XLSX files.
https://metacpan.org/pod/distribution/Excel-Writer-XLSX/lib/Excel/Writer/XLSX.pm

Too many open files when using set_optimization() #175

Open sskopnik opened 7 years ago

sskopnik commented 7 years ago

I'm creating a lot of XLSX files, so to speed things up a bit I use set_optimization(). I do a proper $workbook->close() after creating every file. Still, a lot of temp files are created in my TEMP dir (Windows 7 machine). These files seem to be kept open for the whole process (I can't access them in an editor) even though I call $workbook->close() after each file. After some time (hours...) my script hits the magic limit of 2048 open temp files and crashes ;-( Any help? Maybe "Don't use set_optimization()"?

vabbasi commented 6 years ago

Hi - any update on this issue/question? I believe I am running into the same problem, or something similar, which occurs when trying to save a file with a large number of worksheets (101), each with about eight image inserts and two charts. See the error message below, which occurs at the final close() of the Excel file. set_optimization() has been set and we are running Excel::Writer::XLSX 0.94.

    (in cleanup) Couldn't import C:....jpg: Too many open files at .pl line 1826.
    2017-12-09T09:45:48 - FATAL 1 - Error creating Excel: Error in tempfile() using C:\Users\crams\AppData\Local\Temp\XXXXXXXXXX: Could not create temp file C:\Users\crams\AppData\Local\Temp\Z591Q8OOkU: Too many open files at C:/Perl/site/lib/Excel/Writer/XLSX/Worksheet.pm line 220.
    2017-12-09T09:45:48 - Error creating products: Error in tempfile() using C:\Users\crams\AppData\Local\Temp\XXXXXXXXXX: Could not create temp file C:\Users\crams\AppData\Local\Temp\Z591Q8OOkU: Too many open files at C:/Perl/site/lib/Excel/Writer/XLSX/Worksheet.pm line 220. ...propagated at .pl line 1833.

When searching for a solution to this, I found that the Python version seems to have a backdoor to open/close temp files, which might be useful to me. https://github.com/jmcnamara/XlsxWriter/issues/58

Is something similar available in Perl? What would you recommend to avoid the problem?

Finally, are there any requirements or recommendations about when image and chart insertions should be performed when "set_optimization()" is enabled (e.g., relative to the cell writing)? Currently, I perform these at the end of each sheet, after the other parts are written.

Many thanks and thanks for a great package overall.

vabbasi commented 6 years ago

Dug a bit deeper and seem to have solved the problem. Summarizing it here for others who may have a similar experience (and maybe for John, if he'd like to improve temp file management within the module, which I would recommend).

First, the problem wasn't the size of the workbook, but the number of workbooks I was creating within a single run. File::Temp, which Excel::Writer::XLSX uses when "set_optimization()" is enabled, doesn't perform cleanup until the program exits. With enough workbooks created within one run, the temp directory eventually hits 2048 files, after which File::Temp stops working.
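The deferred cleanup can be seen with File::Temp alone, no Excel::Writer::XLSX involved. A minimal, self-contained sketch: tempfile() called without a template defaults to UNLINK => 1, which schedules deletion for program exit, so files created in a long-running loop pile up on disk until then.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# With the default UNLINK => 1, tempfile() registers each file for
# deletion at program *exit*, not when you are done with it. Keeping
# the handles (as a workbook object would per worksheet) also keeps
# the files open.
my @keep;
for ( 1 .. 5 ) {
    my ( $fh, $name ) = tempfile();
    push @keep, [ $fh, $name ];    # handle stays open, file stays on disk
}

my $live = grep { -e $_->[1] } @keep;
print "$live of 5 temp files still on disk before exit\n";    # -> 5 of 5
```

Scale that loop up to a few thousand files and you hit exactly the per-process open-file ceiling described above.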

My solution involves taking the temp file management out of Excel::Writer::XLSX by...

    use File::Temp ();

 # (1) creating my own temporary directory
    my $ExcelTempDir = File::Temp::tempdir( CLEANUP => 1 );

 # (2) telling Excel::Writer::XLSX to use it
    my $workbook = Excel::Writer::XLSX->new( $ExcelFileName );
    $workbook->set_tempdir( $ExcelTempDir );
    $workbook->set_optimization();
    ...

 # (3) when I'm done, closing the workbook and cleaning up the temporary directory
    $workbook->close() or die "Error closing : $!";
    File::Temp::cleanup();

This solution could probably still hit a limit if someone's requirement/data set ends up needing to create 2048 temp files under the directory in question... but in my case, I should be okay to stop here.
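For that remaining case, one variation (a sketch of my own, not something from the module's docs) is to give each workbook a private temp directory and remove it immediately after close(), so the file count resets per workbook instead of growing across the run. with_private_tempdir() and its callback are hypothetical names; File::Path::remove_tree is core Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(remove_tree);

# Run a workbook-building callback with its own temp directory, then
# delete the whole directory as soon as the workbook is closed.
sub with_private_tempdir {
    my ($build_workbook) = @_;
    my $dir = tempdir( CLEANUP => 0 );   # we remove it ourselves, right away
    $build_workbook->($dir);             # e.g. set_tempdir($dir), write, close()
    remove_tree($dir);                   # reclaim the temp files immediately
    return !-d $dir;
}

print with_private_tempdir( sub { } ) ? "tempdir removed\n"
                                      : "tempdir left behind\n";
```

The callback must have called $workbook->close() before it returns, since the module's temp files under $dir are deleted the moment it does.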

Thanks and good luck!

vabbasi commented 6 years ago

It turns out that my problem is not fully solved... just pushed down a bit further. With the above code, all the workbooks were written properly, including the largest one, which ended up at 30 MB with 309 worksheets, each containing about 6 images. That part was excellent; it had failed consistently before I refactored everything to use "set_optimization()".

However... when this large workbook goes out of scope, Perl crashes with an "Out of Memory" error. I moved the temp-file management outside the subroutine to make sure it wasn't a contributing factor, so the last thing that happens in the routine is the $workbook->close(). As mentioned, the program creates the workbook properly but doesn't proceed past the loop, due to the "Out of Memory" crash. The close() returns properly; I've printed lines within the routine to confirm this. The "Out of Memory" problem occurs when we exit the subroutine and the workbook goes out of scope. Ironically, I moved to "set_optimization()" because of "Out of Memory" errors... but they seem to have eventually returned. The problem occurs with ActiveState Perl 5.16 on a Windows 7 box with 4 GB RAM. It doesn't occur on Windows 2008 Server machines with the same RAM and the same Perl version. It's not really feasible to upgrade Perl versions.

Any suggestions would be greatly appreciated. Overall, the code generally looks like below.

main routine

    print STDERR "Creating temp files directory\n";
    my $ExcelTempDir = File::Temp::tempdir( CLEANUP => 1 );
    &CreateExcel( $ExcelFileName, $ExcelTempDir );
    print STDERR "Wrote Excel; cleaning up temp files\n";
    File::Temp::cleanup();

subroutine

    sub CreateExcel {
        my ( $ExcelFileName, $ExcelTempDir ) = @_;
        my $workbook = Excel::Writer::XLSX->new( $ExcelFileName );
        $workbook->set_tempdir( $ExcelTempDir );
        $workbook->set_optimization();
        ...

        # create data + add images for each worksheet, which in the
        # "out of memory" case creates 309 sheets
        ...

        # note: "return EXPR or die ..." never reaches the die, because
        # return binds first -- so close and die first, then return
        $workbook->close() or die "Error closing : $!";
        return 1;
    }
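One diagnostic worth trying (a general Perl technique, not a fix documented by the module): an object's DESTROY runs when its last reference disappears, which is normally at scope exit. An explicit `undef $workbook` right after close() moves that work to a point you choose, which at least pinpoints where the memory spike happens. A self-contained illustration of the timing, using a hypothetical Tracker class:

```perl
use strict;
use warnings;

# Tracker records which "stage" the program was in when its DESTROY ran.
package Tracker;
sub new     { return bless {}, shift }
sub DESTROY { $main::destroyed_at = $main::stage }

package main;
our ( $stage, $destroyed_at );

sub demo {
    my $obj = Tracker->new;
    $stage = 'inside sub';
    undef $obj;              # last reference dropped: DESTROY runs here
    $stage = 'after undef';
}
demo();
$stage = 'after sub';
print "DESTROY ran: $destroyed_at\n";    # -> DESTROY ran: inside sub
```

If the crash moves to the `undef $workbook` line, it confirms the memory is being exhausted during the workbook's destruction rather than in close(), which narrows where to look next.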