jandrew / Spreadsheet-XLSX-Reader-LibXML

Read spreadsheet files with xlsx extentions
Other
4 stars 2 forks source link

Fatal error retrieving worksheets when one sheet is a standalone chart #32

Closed morungos closed 9 years ago

morungos commented 9 years ago

Great module, but I have a crash in the innards when I open a file which includes a worksheet that consists of only a chart. Stack backtrace is as follows:

Use of uninitialized value $fileName in string eq at /Users/stuartw/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/Archive/Zip/Archive.pm line 89.
 at /Users/stuartw/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/Archive/Zip/Archive.pm line 89.
    Archive::Zip::Archive::memberNamed(Archive::Zip::Archive=HASH(0x7f92c040b6d8), undef) called at /Users/stuartw/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/Spreadsheet/XLSX/Reader/LibXML.pm line 560
    Spreadsheet::XLSX::Reader::LibXML::_build_reader(Spreadsheet::XLSX::Reader::LibXML=HASH(0x7f92bf62f4e0), HASH(0x7f92c02ec330), Archive::Zip::Archive=HASH(0x7f92c040b6d8)) called at /Users/stuartw/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/Spreadsheet/XLSX/Reader/LibXML.pm line 280
    Spreadsheet::XLSX::Reader::LibXML::worksheet(Spreadsheet::XLSX::Reader::LibXML=HASH(0x7f92bf62f4e0)) called at /Users/stuartw/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/Spreadsheet/XLSX/Reader/LibXML.pm line 222
    Spreadsheet::XLSX::Reader::LibXML::worksheets(Spreadsheet::XLSX::Reader::LibXML=HASH(0x7f92bf62f4e0)) called at etc/load.pl line 142

Removing that sheet makes the error go away, and I can easily reproduce it by adding some data, making a chart, right-clicking the chart to move it to a new sheet, and saving. I'd attach the test chart but it won't let me, so I can email one if you need.

Tested with Excel for Mac 2011.

morungos commented 9 years ago

By the way, this is something of a blocker for me, as the same file killed the other import module I was using, probably for the same reason. So I'm happy to provide a pull request if it'll speed things.

jandrew commented 9 years ago

Stuart,

First thank you for submitting the report. I wrote the package because I also ran into blockers with the other packages. (They are still great they just aren't targeting at the corner cases). Specifically can you tell me what you are trying to read off the chart since the sub elements will not necessarily be organized tabularly?

I definitely accept pull requests but I prefer test cases over pull requests if you are prioritizing.

I will push this to the top of my stack. Judging from past experience I can't promise anything faster than 48 hours. My process is to write a test case for the fail first and then write fixes that support the full test suit before even pushing back to github. I will commit to only the elements in milestone 34.4 though before I push to CPAN.

jandrew commented 9 years ago

I will push the fix to github as it is available.

morungos commented 9 years ago

Hi Jed

I’m actually not trying to read anything off the charts! That’s the weirdest part of the issue. It’s just that their mere existence is causing the ->worksheets() method to crash out. My test file is 20k (attached) and I’ll happily put in a test case for you today. For now, it’s the attached file that demonstrates the issue: just having the second sheet causes the error, removing it and all is well.

Because these contain no cells, I’d be happy for now with either (a) skipping these sheets entirely, or (b) returning a chart sheet with a type marker. I actually select sheets by name, so I can promise never to access the chart sheets, but I can’t get the name until after ->worksheets() has succeeded.

And many many thanks for this support — looks like a great module and I’ll be switching my Excel cases to use it.

All the best Stuart

On Feb 17, 2015, at 12:06 PM, Jed notifications@github.com wrote:

Stuart,

First thank you for submitting the report. I wrote the package because I also ran into blockers with the other packages. (They are still great they just aren't targeting at the corner cases). Specifically can you tell me what you are trying to read off the chart since the sub elements will not necessarily be organized tabularly?

I definitely accept pull requests but I prefer test cases over pull requests if you are prioritizing.

I will push this to the top of my stack. Judging from past experience I can't promise anything faster than 48 hours. My process is to write a test case for the fail first and then write fixes that support the full test suit before even pushing back to github. I will commit to only the elements in milestone 34.4 though before I push to CPAN.

— Reply to this email directly or view it on GitHub.

jandrew commented 9 years ago

Stuart,

OK for now I will see about returning a worksheet object with no rows and columns and min max row / column as the same number for the table sheet.

Best Regards,

Jed

morungos commented 9 years ago

I forked the module and added a test file and test case which seems to show the problem: https://github.com/morungos/Spreadsheet-XLSX-Reader-LibXML.

Actually, since your API allows retrieval of sheets by name, I tried using the ->worksheet($name) method instead, hoping that would avoid the issue. It seems not, sadly.

morungos commented 9 years ago

I've temporarily switched back to use my old module, and without the charts, it's working okay, so this isn't a blocker now. (The old module was way way worse with the charts present). I still want to use this one and I'm happy to help get it into shape.

jandrew commented 9 years ago

Stuart,

Thanks for the test case. I am pretty sure that the failure occurs in the initial opening of the workbook and that the chart sheet just creates some logic that my package can't handle when it sets itself up. The fix is probably something as simple as providing some internal definitions to allow for the case you have identified. Optimistically I should have something real soon but it will need to wait for the bus ride home tonight before I can apply any real focus and I don't want to overcommit since there are sometimes unintended consequences.

Best Regards,

Jed

morungos commented 9 years ago

Thanks Jed

I just commented on GH that without the charts, for now I can drop back to the old module. It’s noisier, horrid with dates, and I do want to switch, but I’m no longer blocked, so feel free to prioritize accordingly.

I’ve done my share of poking at MS XML internals, and my guess is when the XML is unpacked, chart sheets will just be very different and obviously so.

All the best — and many thanks; I’d still be stuck if it wasn’t for your code Stuart

On Feb 17, 2015, at 12:43 PM, Jed notifications@github.com wrote:

Stuart,

Thanks for the test case. I am pretty sure that the failure occurs in the initial opening of the workbook and that the chart sheet just creates some logic that my package can't handle when it sets itself up. The fix is probably something as simple as providing some internal definitions to allow for the case you have identified. Optimistically I should have something real soon but it will need to wait for the bus ride home tonight before I can apply any real focus and I don't want to overcommit since there are sometimes unintended consequences.

Best Regards,

Jed

— Reply to this email directly or view it on GitHub.

jandrew commented 9 years ago

After digging into this it appears that the path of trying to treat chart tabs and worksheet tabs the same is a possible black hole. (an maybe bad architecture) I have opened two enhancement issues #33 and #34 to deal with a feature improvement that would make chart tabs readable but for now I settled on making the initial worksheet load smart enough to tell the difference between worksheets and charts. This means that all worksheet level functions will only work on tabular sheets. At this point I am passing all tests including the one you submitted (thanks). Once I have updated the POD to explain the issue and changes I will push to github. I would like to finish the other open issue in milestone v0.34.4 before I push to CPAN but that should be tomorrow morning latest.

jandrew commented 9 years ago

I should know better but when I write documentation it always make me question some of my architecture. (after an extended debate with myself) I want to take one more stab at including charts in the worksheet list. (delaying a github and CPAN push till tomorrow.) sorry.