nuovo / spreadsheet-reader

A PHP spreadsheet reader (Excel XLS and XLSX, OpenOffice ODS, and variously separated text files) with a singular goal of getting the data out, efficiently
http://www.nuovo.lv/
Other
674 stars 498 forks source link

Fail to find sheets in XLSX made using LibreOffice + fix #59

Open cmanley opened 10 years ago

cmanley commented 10 years ago

This is my test script:

<?php
require_once('spreadsheet-reader/SpreadsheetReader.php');
$reader = new SpreadsheetReader('test2.xlsx');
foreach ($reader as $row) {
    print_r($row);
    break;
}

These are the warnings I get, and no row is printed:

php t.php 
PHP Warning:  XMLReader::open(): Empty string supplied as input in spreadsheet-reader/SpreadsheetReader_XLSX.php on line 928
PHP Warning:  XMLReader::read(): Load Data before trying to read in spreadsheet-reader/SpreadsheetReader_XLSX.php on line 965

My XLSX file contains 1 sheet with only the first 2 fields filled in. Perhaps it's related to LibreOffice, since I created the XLSX file with LibreOffice.

I debugged the problem using a lot of error_logging. It would make life easier if the code was written in a more paranoid style using exceptions. The problem was that SpreadsheetReader_XLSX was trying to open the non-existant sheet2.xml instead of sheet1.xml in the temporary directory.

I don't know much about the XLSX structure and why r:id was being parsed to get the sheet number, but this is my quick fix to SpreadsheetReader_XLSX.php that works for me (the 'fix' block is my fix, the next block is the original):

    public function Sheets()
    {
        if ($this -> Sheets === false)
        {
            $this -> Sheets = array();
            foreach ($this -> WorkbookXML -> sheets -> sheet as $Index => $Sheet)
            {
                if ('fix') {
                    //Parse this: <sheet name="Sheet1" sheetId="1" state="visible" r:id="rId2"/>
                    $Attributes = $Sheet -> attributes();
                    foreach ($Attributes as $Name => $Value)
                    {
                        if ($Name == 'sheetId') {
                            $SheetID = (int)$Value;
                            break;
                        }
                    }
                    $this -> Sheets[$SheetID] = (string)$Sheet['name'];
                }
                else {
                    $Attributes = $Sheet -> attributes('r', true);
                    foreach ($Attributes as $Name => $Value)
                    {
                        if ($Name == 'id')
                        {
                            $SheetID = (int)str_replace('rId', '', (string)$Value);
                            break;
                        }
                    }
                    $this -> Sheets[$SheetID] = (string)$Sheet['name'];
                }
            }
            ksort($this -> Sheets);
        }
        return array_values($this -> Sheets);
jonascarlbaum commented 10 years ago

Thumbs up for this "fix"!!!

This fix makes it work on my Mac too. I have the same issue with the original (21000+ rows) XSLX — saved from MS Office on a Windows machine — as I have with my 300 rows copy, saved as XSLX with LibreOffice on my Mac.

XMLReader::open(): Empty string supplied as input

So, on my Mac this fix solves XSLX-parsing...

I would like this fix included in this repo so that I can use this repo in a Laravel project I'm working on. Cloning this repo and inserting this fix isn't an option that I would prefer, I would like to receive future fixes etc. without having to maintain it synced.

Is this repo still active? (last commit seven months ago if I recall correctly?) Is there plans to fix existing bugs?

Update

I see this seems to be fixed in 0.6.0-pre-alpha — I'm I correct? Is there any hopes for this fix coming into the master branch in any near future?

jonascarlbaum commented 10 years ago

I was using cmanley:s "fix" as a ground for making both standard XLSX and LibreOffice saved ones both work with this library.

I actually did my own interpretation of when to use sheetId and when to use the rId-varaiable (attributes with prefix boolean set).

The logic that I presumed working when I elaborated with it was that LibreOffice saved XLSX:s has rId higher than sheetId, so in that case we select the lowest one (sheetId), if not we used the standard rId, so min($rId, $sheetId) did the trick...

I found it working with both LibreOffice saved XSLX:s and original Windows/MS Office/Excel saved ones when I did like this:

public function Sheets()
{
  if ($this -> Sheets === false)
  {
    $this -> Sheets = array();
    foreach ($this -> WorkbookXML -> sheets -> sheet as $Index => $Sheet)
    {
      $AttributesWithPrefix = $Sheet -> attributes('r', true);
      $Attributes = $Sheet -> attributes();

      $rId = 0;
      $sheetId = 0;

      foreach ($AttributesWithPrefix as $Name => $Value)
      {
        if ($Name == 'id')
        {
          $rId = (int)str_replace('rId', '', (string)$Value);
          break;
        }
      }
      foreach ($Attributes as $Name => $Value)
      {
        if ($Name == 'sheetId') {
          $sheetId = (int)$Value;
          break;
        }
      }

      $this -> Sheets[min($rId, $sheetId)] = (string)$Sheet['name'];
    }
    ksort($this -> Sheets);
  }
  return array_values($this -> Sheets);
}

This is a working symbios for me — works on my machine... ;)

sangar82 commented 9 years ago

@jonascarlbaum you saved my day!! Why not send a PR with this change?

jonascarlbaum commented 9 years ago

Great @sangar82

I see you created this pull request. Good. Hope they'll merge it in, if it proves solid in all cases. It works in production in the system I was working on at the time.

However, in all the Excel-files we parsed we were only interested in the first sheet, so I can't promise it is a success on a many sheets Excel-file. Guess it has to be tested...

cmanley commented 8 years ago

Please check my fork out and let me know if all issues are solved: https://github.com/cmanley/spreadsheet-reader If they are solved, then I'll create a pull request. I've added test files and a phpunit test to my commit that you can also execute standalone.

This is the diff (whitespace turned off): https://github.com/cmanley/spreadsheet-reader/commit/c6696cffb9f7eecd19cf39063a2e25014294a7c4?w=1

I completely rewrote the sheets extraction part because the original code was doing it wrong, making assumptions about file name case and ids, and it wasn't using the xl/_rels/workbook.xml.rels file to locate the sheet file entry.

This is the commit message / list of changes:

cesarmiquel commented 8 years ago

Just wanted to comment that I was having issues with an XLSX file created with Libreoffice / Openoffice and the fork that @cmanley provided worked perfectly so if people are coming across issues with XLSX files I recommend trying this out and, ideally I'd love to see this code merged into the repo. Kudos to @cmanley ! 👍

cmanley commented 8 years ago

There are still problems with XLSX files generated by Crystal Reports though that my fork hasn't solved. I haven't done more work on it because I tried the latest version of https://github.com/box/spout and that seems to work perfectly in all cases, and it's also fast.

cesarmiquel commented 8 years ago

Good to know! Unfortunately our app is a legacy web application that needs PHP 5.3 and Spout required 5.4+ :(. We are in the process of upgrading it to PHP 5.6 but in the meantime this library + your fix seems to work fine.

sangar82 commented 8 years ago

@cmanley is more fast, a lot more. I switched to spout because this project is dead

timotheemoulin commented 8 years ago

Any chance that the project will be updated now? It all seems dead to me.

woigl commented 6 years ago

Can someone approve the requested changes?

darshanBabu commented 6 years ago

@cmanley thanks for the fix .

YashSheth1 commented 5 years ago

@cmanley Thank you for your "FIX"

eldinphp commented 5 years ago

It works here too on 7.2, Linux Mint.. so thanks.

raycaballero commented 3 years ago

Thank you for this! I was pulling my hair because of this bug.