mortenbra / alexandria-plsql-utils

Oracle PL/SQL Utility Library
786 stars 317 forks source link

ooxml_util_pkg.get_xlsx_cell_value returns NULL if there is inter-cell formatting #6

Closed eaolson closed 9 years ago

eaolson commented 9 years ago

A colleague discovered that ooxml_util_pkg.get_xlsx_cell_value returns NULL if some the contents of the cell have formatting applied to them. This does not happen if the cell itself has formatting applied. For example, a cell containing "foobar" (with only "bar" in italics) will return NULL.

I think that's because line 404 of ooxml_util_pkg uses this XPath expression to extract the text node from sharedStrings.xml:

'/sst/si[' || (l_string_index + 1) || ']/t/text()'

So it's looking for a <t> element that is a direct descendant of an <si> element.

The relevant part of the sharedStrings.xml file for the above example lists this string as:

<si>
    <r>
        <t>foo</t>
    </r>
    <r>
        <rPr>
            <i/>
            <sz val="11"/>
            <color theme="1"/>
            <rFont val="Calibri"/>
            <family val="2"/>
            <scheme val="minor"/>
        </rPr>
        <t>bar</t>
    </r>
</si>

I'm not sure what the correct behavior would be in this case. It seems to be a user would most likely expect the concatenated text inside all descendant <t> elements inside the <si>, even if there was a intervening <r> element.

mortenbra commented 9 years ago

Agree that returning the concatenated plaintext value of all nodes seems the most correct thing to do in this case. What would be the correct XPath expression to do that? Suggestions wanted :-)

mortenbra commented 9 years ago

Fixed via commit 957c4ad