iliaal / php_excel

PHP Extension interface to the Excel writing/reading library
http://ilia.ws
534 stars 131 forks source link

xlsx to xls convertion #237

Closed velykanov closed 3 years ago

velykanov commented 6 years ago

Hello everyone!

I'm using this lib to parse xls/xlsx files into csv files. I've got xlsx file with 3 sheets and ~90k rows on each sheet. I'm reading this file using such a piece of code:

for ($row_num = $first_row; $row_num < $last_row; ++$row_num) {
    $row = $sheet->readRow($row_num, 0, -1, false);
}

Unfortunately, only the first page takes over 30 minutes of just reading (not a word about processing). Experiment shows such results:

0 rows - 0.0006718635559082
1000 rows - 0.06129789352417
2000 rows - 0.22869801521301
3000 rows - 0.51398682594299
4000 rows - 0.97719883918762
5000 rows - 1.5958619117737
6000 rows - 2.4356389045715
7000 rows - 3.3425178527832
8000 rows - 4.3792669773102
9000 rows - 5.5893278121948
10000 rows - 7.0040788650513
11000 rows - 8.5760028362274
12000 rows - 10.347718954086
13000 rows - 12.434111833572
14000 rows - 14.980521917343
15000 rows - 17.680223941803
16000 rows - 20.915424823761
17000 rows - 24.699790000916
18000 rows - 29.087457895279
19000 rows - 34.212054014206
20000 rows - 40.219815969467
...
92000 rows - 2069.2912549973
93000 rows - 2116.0395789146
94000 rows - 2163.0849909782

But I've tried to find a way to decrease execution time. After resaving this xlsx file into just xls format (using libreoffice calc) I've noticed that reading is much faster (execution takes only 3 seconds for whole file). But there was one sad thing: in xls files only ~65k rows can be stored. So I went further and realised, that I can convert xlsx files into xls files using this lib (even if I'll lose some rows) and then read all rows, that where cut, from original file (it would be much faster: 3 seconds of reading xls file + ~ 200 seconds for reading ~30k rows from tile of xlsx file). But if you'll try to use this code:

$workbook = new \ExcelBook($libxl_name, $libxl_code, 'xlsx' == $extension);
$res = $workbook->loadFile($from);
if ('xlsx' == $extension) {
    $workbook->save('some_file.xls');
    unset($workbook);
    $workbook = new \ExcelBook($libxl_name, $libxl_code, false);
    $res = $workbook->loadFile('some_file.xls);
}

then $res variable will be false. And if you'll try to open some_file.xls using program (e.g. libreoffice calc) it won't be opened 'cause it's broken for reading.

Probably, I'm using wrong way to manipulate files, but I was thinking it should work.