A colleague discovered that ooxml_util_pkg.get_xlsx_cell_value returns NULL if some the contents of the cell have formatting applied to them. This does not happen if the cell itself has formatting applied. For example, a cell containing "foobar" (with only "bar" in italics) will return NULL.
I think that's because line 404 of ooxml_util_pkg uses this XPath expression to extract the text node from sharedStrings.xml:
I'm not sure what the correct behavior would be in this case. It seems to be a user would most likely expect the concatenated text inside all descendant <t> elements inside the <si>, even if there was a intervening <r> element.
Agree that returning the concatenated plaintext value of all nodes seems the most correct thing to do in this case. What would be the correct XPath expression to do that? Suggestions wanted :-)
A colleague discovered that ooxml_util_pkg.get_xlsx_cell_value returns NULL if some the contents of the cell have formatting applied to them. This does not happen if the cell itself has formatting applied. For example, a cell containing "foobar" (with only "bar" in italics) will return NULL.
I think that's because line 404 of ooxml_util_pkg uses this XPath expression to extract the text node from
sharedStrings.xml
:So it's looking for a
<t>
element that is a direct descendant of an<si>
element.The relevant part of the
sharedStrings.xml
file for the above example lists this string as:I'm not sure what the correct behavior would be in this case. It seems to be a user would most likely expect the concatenated text inside all descendant
<t>
elements inside the<si>
, even if there was a intervening<r>
element.