Closed pyZerrenner closed 1 year ago
I can confirm that this bug exists on the main branch 2.1.0.
Thanks for the report! Further investigations and PRs to fix are welcome!
take
take
Maybe just skip office:annotation?
diff --git a/pandas/io/excel/_odfreader.py b/pandas/io/excel/_odfreader.py
index 277f64f636..48677468c7 100644
--- a/pandas/io/excel/_odfreader.py
+++ b/pandas/io/excel/_odfreader.py
@@ -206,7 +206,11 @@ class ODFReader(BaseExcelReader["OpenDocument"]):
             cell_value = cell.attributes.get((OFFICENS, "value"))
             return float(cell_value)
         elif cell_type == "string":
             return self._get_cell_string_value(cell)
         elif cell_type == "currency":
             cell_value = cell.attributes.get((OFFICENS, "value"))
             return float(cell_value)
@@ -228,8 +232,10 @@ class ODFReader(BaseExcelReader["OpenDocument"]):
         """
         from odf.element import Element
         from odf.namespaces import TEXTNS
+        from odf.office import Annotation
         from odf.text import S
 
+        office_annotation = Annotation().qname
         text_s = S().qname
 
         value = []
@@ -239,6 +245,8 @@ class ODFReader(BaseExcelReader["OpenDocument"]):
                 if fragment.qname == text_s:
                     spaces = int(fragment.attributes.get((TEXTNS, "c"), 1))
                     value.append(" " * spaces)
+                elif fragment.qname == office_annotation:
+                    continue
                 else:
                     # recursive impl needed in case of nested fragments
                     # with multiple spaces
Or extract only text:p/text:s (full list of possible elements here).
@dimastbk I tried these changes and it looks good, you should open a MR :)
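For anyone comparing the two suggestions (skip office:annotation, or keep only text:p/text:s), here is a rough sketch using only the standard library's xml.etree.ElementTree instead of odfpy. The cell XML is hand-written for illustration and is not taken from TableWithComment.ods; collect_text, skip_annotations and paragraphs_only are hypothetical helper names, not pandas or odfpy functions.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ANNOTATION_TAG = f"{{{OFFICE}}}annotation"
TEXT_P_TAG = f"{{{TEXT}}}p"
TEXT_S_TAG = f"{{{TEXT}}}s"
TEXT_C_ATTR = f"{{{TEXT}}}c"

# Assumed shape of a commented string cell (illustrative only).
CELL_XML = (
    f'<table:table-cell xmlns:office="{OFFICE}" xmlns:text="{TEXT}" '
    'xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'office:value-type="string">'
    "<office:annotation><dc:date>2023-07-01T12:00:00</dc:date>"
    "<text:p>my comment</text:p></office:annotation>"
    '<text:p>cell<text:s text:c="2"/>text</text:p>'
    "</table:table-cell>"
)


def collect_text(elem, skip_tags=frozenset()):
    """Concatenate text recursively, expanding text:s into spaces."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == TEXT_S_TAG:
            # text:s is a run-length encoded sequence of spaces
            parts.append(" " * int(child.get(TEXT_C_ATTR, 1)))
        elif child.tag not in skip_tags:
            parts.append(collect_text(child, skip_tags))
        # Text following a child belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


def skip_annotations(cell):
    """Option 1: read everything except office:annotation subtrees."""
    return collect_text(cell, skip_tags=frozenset({ANNOTATION_TAG}))


def paragraphs_only(cell):
    """Option 2: read only the cell's direct text:p children."""
    return "".join(collect_text(c) for c in cell if c.tag == TEXT_P_TAG)


cell = ET.fromstring(CELL_XML)
print(repr(skip_annotations(cell)))  # 'cell  text'
print(repr(paragraphs_only(cell)))   # 'cell  text'
```

Both variants agree on this cell. They would differ if a cell had other non-paragraph children: the first keeps their text, the second drops it.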
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Here is the example *.ods file for the example code: TableWithComment.ods. The file was created using LibreOffice Calc 7.4.7.2.
The print(df) command produces the following output:
If a cell in the *.ods file has a comment and the cell content is a string (B1 and B7), the comment text and timestamp are prepended to the cell content. This also applies to the header line. For cells containing numbers, the comment is ignored (A6 and B10).
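A minimal sketch of why only string cells would be affected, assuming the reader takes numeric values from the office:value attribute (as the float and currency branches of the reader do) and assembles string values from the cell's child nodes. The XML is hand-written for illustration, not extracted from TableWithComment.ods, and read_cell is a hypothetical stand-in, not the pandas implementation.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
NS_DECLS = (
    f'xmlns:office="{OFFICE}" '
    'xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/"'
)
# Assumption: the comment is stored as an office:annotation child of the cell.
ANNOTATION = (
    "<office:annotation>"
    "<dc:date>2023-07-01T12:00:00</dc:date>"
    "<text:p>my comment</text:p>"
    "</office:annotation>"
)
string_cell = ET.fromstring(
    f'<table:table-cell {NS_DECLS} office:value-type="string">'
    f"{ANNOTATION}<text:p>some text</text:p></table:table-cell>"
)
float_cell = ET.fromstring(
    f'<table:table-cell {NS_DECLS} office:value-type="float" office:value="3.5">'
    f"{ANNOTATION}<text:p>3.5</text:p></table:table-cell>"
)


def read_cell(cell):
    """Hypothetical reader that does not filter out annotations."""
    if cell.get(f"{{{OFFICE}}}value-type") == "float":
        # Numbers come from an attribute, so child nodes never matter.
        return float(cell.get(f"{{{OFFICE}}}value"))
    # Strings are built from all descendant text, annotation included.
    return "".join(cell.itertext())


print(repr(read_cell(string_cell)))  # '2023-07-01T12:00:00my commentsome text'
print(read_cell(float_cell))         # 3.5
```

Under these assumptions the annotation's date and text land in front of the string content, while the numeric cell is unaffected.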
(Note that I am referring to comments inserted using LibreOffice Calc itself. This is unrelated to the comment argument of read_excel.)
Expected Behavior
The ods comments should be ignored and only the cell content read into the dataframe. The expected output from print(df) is:
(This is the output when all comments in the *.ods file are deleted.)
Installed Versions