nuovo / spreadsheet-reader

A PHP spreadsheet reader (Excel XLS and XLSX, OpenOffice ODS, and variously separated text files) with a singular goal of getting the data out, efficiently
http://www.nuovo.lv/
Other
674 stars 497 forks source link

All cells are empy #149

Open woigl opened 6 years ago

woigl commented 6 years ago

I am loading an XLSX file and I get the correct worksheet name and also the correct number of rows and columns(cells).

Worksheets: 1 Columns: 59 Rows: 29075

The issue is that all the cells printed as empty by using the following code:

$Reader = new SpreadsheetReader('/var/www/tmp/ArtikeldatenStandard.xlsx');
$Sheets = $Reader -> Sheets();

foreach ($Sheets as $Index => $Name)
{
    echo 'Sheet #'.$Index.': '.$Name;

    $Reader -> ChangeSheet($Index);

    foreach ($Reader as $Row)
    {
        print_r($Row);
    }
}

I don't want to post here the XLSX file. In case someone needs it for debugging purpose, then please let me know how I should provide the file (email, FTP, ...)
woigl commented 6 years ago

I've resolved the issue by myself.

After opening the .xlsx file with Excel and saving it again, it seemed to work. Therefore I've analyzed the issue further.

I figured out that the .xlsx files have a slight difference in the sst element, which messes up the function PrepareSharedStringCache(). The function is looking for the count attribute, which was in my case not there and therefore it exited at the if condition.

The working sst element:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1640" uniqueCount="809">

The not working sst element:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="809">

To resolve the matter, I've added the check for uniqueCount in case if count is not available. The corrected functions looks like this:

private function PrepareSharedStringCache()
{
    while ($this -> SharedStrings -> read())
    {
        if ($this -> SharedStrings -> name == 'sst')
        {
            $this -> SharedStringCount = $this -> SharedStrings -> getAttribute('count');
            if (!$this -> SharedStringCount) $this -> SharedStringCount = $this -> SharedStrings -> getAttribute('uniqueCount');
            break;
        }
    }

    if (!$this -> SharedStringCount || (self::SHARED_STRING_CACHE_LIMIT < $this -> SharedStringCount && self::SHARED_STRING_CACHE_LIMIT !== null))
    {
        return false;
    }

    $CacheIndex = 0;
    $CacheValue = '';
    while ($this -> SharedStrings -> read())
    {
        switch ($this -> SharedStrings -> name)
        {
            case 'si':
                if ($this -> SharedStrings -> nodeType == XMLReader::END_ELEMENT)
                {
                    $this -> SharedStringCache[$CacheIndex] = $CacheValue;
                    $CacheIndex++;
                    $CacheValue = '';
                }
                break;
            case 't':
                if ($this -> SharedStrings -> nodeType == XMLReader::END_ELEMENT)
                {
                    continue;
                }
                $CacheValue .= $this -> SharedStrings -> readString();
                break;
        }
    }

    $this -> SharedStrings -> close();
    return true;
}

If someone has a better solution, please drop me a message.

woigl commented 6 years ago

Does this project still have active contributors?

If not, can someone add me as a contributor to add my changes, or do I need to clone the whole project?

kAlvaro commented 6 years ago

In other words:

Index: SpreadsheetReader_XLSX.php
===================================================================
--- SpreadsheetReader_XLSX.php  (revision 645)
+++ SpreadsheetReader_XLSX.php  (working copy)
@@ -435,6 +435,7 @@
                if ($this -> SharedStrings -> name == 'sst')
                {
                    $this -> SharedStringCount = $this -> SharedStrings -> getAttribute('count');
+                   if (!$this -> SharedStringCount) $this -> SharedStringCount = $this -> SharedStrings -> getAttribute('uniqueCount');
                    break;
                }
            }

It's complicated (not your fix, your fix is great and it's apparently the only way to use this library to read cells that contain text). The project is clearly abandoned and there're 23 pending pull requests. There're some forks that incorporate some basic bugfixes (I'm currently trying https://github.com/virtua-network/spreadsheet-reader) but... go figure. I presume nobody wants to marry this library and keep an up-to-date fork with all the fixes.

woigl commented 3 years ago

@kAlvaro it seems there is no current maintainer assigning anyone of us as maintainers for this project. I would love to take care of it.