Open NBEng opened 3 years ago
If you are able to provide a fairly minimal spreadsheet file to reproduce the issue (I only need the one that fails, if possible a single worksheet with just a small number of rows) that would be really helpful. (Also, permission to use this reduced spreadsheet in the test suite for this module, so after fixing it I can prevent a regression, would be appreciated.)
Test-SSheet-ODS-Converted-to-XLSX.xlsx
I've uploaded the test spreadsheet. I'm new to GitHub, so I hope that worked.
The spreadsheet is very small and simple, consisting of 1 sheet with 2 columns (11 rows entries per column). Use it as you see fit for your testing.
The version of LibreOffice that I am using is as follows:
Version: 6.4.7.2 (x64)
Build ID: 639b8ac485750d5696d7590a72ef1b496725cfb5
CPU threads: 12; OS: Windows 10.0 Build 19042; UI render: GL; VCL: win;
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
The raku program that fails consists of just a few lines (from my initial attempt of using the XLSX module):
use Spreadsheet::XLSX;
my $workbook = Spreadsheet::XLSX.load('Test-SSheet-ODS-Converted-to-XLSX.xlsx');
I called this program from the shell as follows (showing the result as well):
nick@patsan10:~/perl/raku$ raku ./test-XLSX.pl6
Type check failed in binding to parameter '$entry'; expected LibXML::Element but got LibXML::Text (LibXML::Text.new)
in method from-xml at /home/nick/rakudo/share/perl6/site/sources/91EE9DC1F76171735695739510695D6E067BA7E9 (Spreadsheet::XLSX::ContentTypes) line 33
in block at /home/nick/rakudo/share/perl6/site/sources/788A745784D5EA4EF68F82C35BF0689F82D90701 (Spreadsheet::XLSX) line 49
in submethod TWEAK at /home/nick/rakudo/share/perl6/site/sources/788A745784D5EA4EF68F82C35BF0689F82D90701 (Spreadsheet::XLSX) line 45
in method load at /home/nick/rakudo/share/perl6/site/sources/788A745784D5EA4EF68F82C35BF0689F82D90701 (Spreadsheet::XLSX) line 39
in method load at /home/nick/rakudo/share/perl6/site/sources/788A745784D5EA4EF68F82C35BF0689F82D90701 (Spreadsheet::XLSX) line 30
in method load at /home/nick/rakudo/share/perl6/site/sources/788A745784D5EA4EF68F82C35BF0689F82D90701 (Spreadsheet::XLSX) line 25
in block <unit> at ./test-XLSX.pl6 line 3
The version of raku I am running is:
nick@patsan10:~/perl/raku$ raku --version
Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
Built on MoarVM version 2020.10.
I installed this from the rakudo star bundle on my Debian 10 Linux VM a few days ago (in a personal subdirectory of my $HOME).
I installed Spreadsheet::XLSX using zef shortly after installing raku. Here's the zef info' on the module:
nick@patsan10:~/perl/raku$ zef -v info Spreadsheet::XLSX
===> Updating cpan mirror: https://raw.githubusercontent.com/ugexe/Perl6-ecosystems/master/cpan1.json
===> Updating p6c mirror: https://raw.githubusercontent.com/ugexe/Perl6-ecosystems/master/p6c1.json
===> Updated p6c mirror: https://raw.githubusercontent.com/ugexe/Perl6-ecosystems/master/p6c1.json
===> Updated cpan mirror: https://raw.githubusercontent.com/ugexe/Perl6-ecosystems/master/cpan1.json
- Info for: Spreadsheet::XLSX
- Identity: Spreadsheet::XLSX:ver<0.2.2>:auth<Edument>
- Recommended By: Zef::Repository::Ecosystems<p6c>
- Installed: Yes
Description: Work with Excel (XLSX) spreadsheets.
License: Artistic-2.0
Source-url: https://github.com/jnthn/spreadsheet-xlsx.git
Provides: 11 modules
-----------------------------------------------------------------------
Module |Path-Name
-----------------------------------------------------------------------
Spreadsheet::XLSX |lib/Spreadsheet/XLSX.pm6
Spreadsheet::XLSX::Cell |lib/Spreadsheet/XLSX/Cell.pm6
Spreadsheet::XLSX::Root |lib/Spreadsheet/XLSX/Root.pm6
Spreadsheet::XLSX::Styles |lib/Spreadsheet/XLSX/Styles.pm6
Spreadsheet::XLSX::Workbook |lib/Spreadsheet/XLSX/Workbook.pm6
Spreadsheet::XLSX::CellStyle |lib/Spreadsheet/XLSX/CellStyle.pm6
Spreadsheet::XLSX::Worksheet |lib/Spreadsheet/XLSX/Worksheet.pm6
Spreadsheet::XLSX::Exceptions |lib/Spreadsheet/XLSX/Exceptions.pm6
Spreadsheet::XLSX::ContentTypes |lib/Spreadsheet/XLSX/ContentTypes.pm6
Spreadsheet::XLSX::Relationships|lib/Spreadsheet/XLSX/Relationships.pm6
Spreadsheet::XLSX::SharedStrings|lib/Spreadsheet/XLSX/SharedStrings.pm6
-----------------------------------------------------------------------
Depends: 2 items
------------------------
ID|Identity |Installed?
------------------------
1 |Libarchive|✓
1 |LibXML |✓
------------------------
Thank you for looking into this.
Nick
I also just found this problem with 'LibreOffice Calc'-saved .xlsx files, and have been trying to figure what the problem is for a couple of days ( I'm old, and slow :-) ).
I think the culprit is white space (newlines, etc) between the xml tags, here's my reasoning ....
my $xml = qq:to/END/;
<?xml version='1.0' encoding='UTF-8' standalone="yes" ?>
<book edition="2">
<title>Training Your Pet Ferret</title>
<authors>
<author>Gerry Bucsis</author>
<author>Barbara Somerville</author>
</authors>
<isbn>9780764142239</isbn>
<dimensions width="162.56mm" height="195.58mm" depth="10.16mm" pages="96" />
</book>
END
my LibXML::Document $doc .= parse(:string($xml));
my LibXML::Element $root = $doc.documentElement();
for $root.childNodes -> LibXML::Element $entry {
say $entry.^name;
say $entry.values;
}
# output :
# Type check failed in binding to parameter '$entry'; expected LibXML::Element but got LibXML::Text (LibXML::Text.new)
for $root.childNodes -> $entry { # removed LibXML::Element type constraint for testing
say $entry.^name;
say $entry.values;
}
# output :
# LibXML::Text # every other node is an empty LibXML::Text object, they are white space in the xml (newlines, etc)
# ()
# LibXML::Element
# (Training Your Pet Ferret)
# LibXML::Text
# ()
# LibXML::Element
# (
# <author>Gerry Bucsis</author>
# <author>Barbara Somerville</author>
# )
# LibXML::Text
# ()
# LibXML::Element
# (9780764142239)
# LibXML::Text
# ()
# LibXML::Element
# ()
# LibXML::Text
# ()
# these empty LibXML::Text nodes are preserved by default in LibXML::Document
# though there is an option to drop them by setting :blanks(False) in the parse step
my LibXML::Document $doc-no-blanks .= parse(:string($xml), :!blanks);
my LibXML::Element $root-no-blanks = $doc-no-blanks.documentElement();
say $root-no-blanks;
# output :
# <book edition="2"><title>Training Your Pet Ferret</title><authors><author>Gerry Bucsis</author><author>Barbara Somerville</author></authors><isbn>9780764142239</isbn><dimensions width="162.56mm" height="195.58mm" depth="10.16mm" pages="96"/></book>
for $root-no-blanks.childNodes -> LibXML::Element $entry {
say $entry.^name;
say $entry.values;
}
# output :
# LibXML::Element
# (Training Your Pet Ferret)
# LibXML::Element
# (<author>Gerry Bucsis</author> <author>Barbara Somerville</author>)
# LibXML::Element
# (9780764142239)
# LibXML::Element
# ()
I will do some more testing, but atm I'm thinking Excel-created .xlsx files don't have tags separated by white space, whereas Calc-created .xlsx files do have some.
As a follow up to my previous comment :
I found out how to temporarily change an installed module, for testing purposes (ugexe post on SO)
linux:~/> zef look Spreadsheet::XLSX ===> Searching for: Spreadsheet::XLSX ===> Shelling into directory: /home/bazzaar/.zef/tmp/spreadsheet-xlsx.git/6d078cb053a1ef484d5fe6ad619392a2865e1187
Setting :!blanks in the parse step helped just for two of the xml files in the Calc-saved xslx archive.
# here's what I changed ....
linux:~/.zef/tmp/spreadsheet-xlsx.git/6d078cb053a1ef484d5fe6ad619392a2865e1187> cd lib/Spreadsheet/XLSX
linux:~/.zef/tmp/spreadsheet-xlsx.git/6d078cb053a1ef484d5fe6ad619392a2865e1187/lib/Spreadsheet/XLSX> egrep -i 'parse' *
ContentTypes.pm6: #| Parse the XML content of a [Content_Types].xml.
ContentTypes.pm6: my LibXML::Document $doc .= parse(:string($xml), :!blanks); # <----- edited, otherwise Type check failure (see below)
Relationships.pm6: #| Parse the XML content of a relationships file.
Relationships.pm6: my LibXML::Document $doc .= parse(:string($xml), :!blanks); # <----- edited, otherwise Type check failure (see below)
SharedStrings.pm6: my LibXML::Document $doc .= parse(:string($xml));
Workbook.pm6: #| Parse the XML content of a workbook file.
Workbook.pm6: my LibXML::Document $doc .= parse(:string($xml));
Worksheet.pm6: #| sparse.
Worksheet.pm6: my LibXML::Document $doc .= parse(:string(.decode('utf-8')));
linux:~/.zef/tmp/spreadsheet-xlsx.git/6d078cb053a1ef484d5fe6ad619392a2865e1187/lib/Spreadsheet/XLSX> cd ../../..
linux:~/.zef/tmp/spreadsheet-xlsx.git/6d078cb053a1ef484d5fe6ad619392a2865e1187> zef test .
===> Testing: Spreadsheet::XLSX:ver<0.2.3>:auth<Edument>
===> Testing [OK] for Spreadsheet::XLSX:ver<0.2.3>:auth<Edument>
linux:~/.zef/tmp/spreadsheet-xlsx.git/6d078cb053a1ef484d5fe6ad619392a2865e1187> zef install . --force-install
===> Testing: Spreadsheet::XLSX:ver<0.2.3>:auth<Edument>
===> Testing [OK] for Spreadsheet::XLSX:ver<0.2.3>:auth<Edument>
===> Installing: Spreadsheet::XLSX:ver<0.2.3>:auth<Edument>
############## The above changes avoided the following two Type check failures
linux:~/Documents> raku read_xlsx_spreadsheet.raku
Type check failed in binding to parameter '$entry'; expected LibXML::Element but got LibXML::Text (LibXML::Text.new)
in method from-xml at /home/bazzaar/.raku/sources/E8B4F94D4C76B1DD712835D14C49422CFBA5B3C7 (Spreadsheet::XLSX::ContentTypes) line 33
in block at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 49
in submethod TWEAK at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 45
in method load at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 39
in method load at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 30
in method load at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 25
in block <unit> at read_xlsx_spreadsheet.raku line 5
@linux:~/Documents> raku read_xlsx_spreadsheet.raku
Type check failed in binding to parameter '$entry'; expected LibXML::Element but got LibXML::Text (LibXML::Text.new)
in method from-xml at /home/bazzaar/.raku/sources/BE03566AACA318811AD47E7D3B9046E7A3FF3395 (Spreadsheet::XLSX::Relationships) line 43
in method find-relationships at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 112
in submethod TWEAK at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 57
in method load at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 39
in method load at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 30
in method load at /home/bazzaar/.raku/sources/6B95368FA292C8C986B317BAD740379DEB60E17C (Spreadsheet::XLSX) line 25
in block <unit> at read_xlsx_spreadsheet.raku line 5
##############
# with some python in jupyter notebook I was able to take a closer look at the various archive versions
Sure enough there is an embedded "\n" in front of the last tag in these two files ( [Content_Types].xml, and xl/_rels/workbook.xml.rels ), that isn't present in the Excel 'equivalents'
However, even though the edits allow the Calc-saved xlsx archive to begin to be read, a further more significant error then ensues :
linux:~/Documents/> raku read_xlsx_spreadsheet.raku
Workbook has 3 sheets
(Data 1 Data 2 Data 3)
Missing attribute 'spans' on 'row'
in sub get-attribute at /home/bazzaar/.raku/sources/76F75F9A22D1864EEA46984ABF358978110E71F6 (Spreadsheet::XLSX::Worksheet) line 439
in method maybe-load-from-backing at /home/bazzaar/.raku/sources/76F75F9A22D1864EEA46984ABF358978110E71F6 (Spreadsheet::XLSX::Worksheet) line 58
in method AT-POS at /home/bazzaar/.raku/sources/76F75F9A22D1864EEA46984ABF358978110E71F6 (Spreadsheet::XLSX::Worksheet) line 47
in block <unit> at read_xlsx_spreadsheet.raku line 17
^^ That PR does enough to get the attached XLS file open. I think the underlying issue is that speadsheet has more blank nodes. Changing a few for $node.childNodes -> my LibXML::Element $ e{ ...}
to for $node.elements -> my LibXML::Element $ e{ ...}
helps.
I also recommend upgrading to latest LibML 0.9.10, so that $e..childNodes -> my LibXML::Element $e {}
now throws a typecheck error, rather than hanging, if the element contains text nodes.
(underlying issue is that .raku
is haning on LibXML::Node objects, which I've worked around for now).
In my first attempt to get familiar with the Spreadsheet::XLSX module, I started off with the following minimalist code:
The test spreadsheet I used was actually a LibreOffice Calc spreadsheet that I saved-as an .xlsx file as I don't have Excel on this computer. Unfortunately, this didn't work and resulted in the following:
Is this a problem with the converted LibreOffice Calc file, or is there something I am doing wrong?
UPDATE: I created another spreadsheet with EXCEL on another computer and tried the short program out on the EXCEL-created file and the .load statement worked. EXCEL is able to read the LibreOffice-created file, but the Spreadsheet::XLSX.load statement has problems with it.
I can upload both spreadsheet files if that would help in determining why this occurs.
Nick