Closed fickleZZZ closed 1 year ago
Hi, thanks for your report. If you have already determined that the problem comes from lxml, what can we do about it? It's probably more efficient to report the bug there.
Closing for now; please ping to reopen if the bug is in pandas and not lxml.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
With the provided Reproducible Example and lxml>=4.7.1 installed, exporting with to_excel() always truncates the value at index 30 of column CN.
This is the original value (index 30 in df): '什么鬼哦,妈的这是在搞我吧?穿越不给主角就算了,居然给反派?还他娘是个公共线反派!你给我个支线的都行啊!'
This is what gets saved to the xlsx file (row 32 in the xlsx): '什么鬼哦,妈的这是在搞我吧?穿越不给主角就算了,居然给反派?还他娘是个公共线反派!你给我个支线'
This works fine if I uninstall lxml or use lxml version 4.6.5 or lower.
Note: I have reduced the data as much as I could; removing even a single additional line makes the unexpected behavior disappear. I have also tried altering values in other, supposedly irrelevant columns and rows, and those changes also make the unexpected behavior disappear. The data is not sensitive, though.
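To make the version boundary above easy to check on another machine, here is a minimal sketch using only the standard library. The threshold 4.7.1 is taken from this report, not from lxml's changelog, and the helper names (parse_version, in_reported_range, installed_lxml_version) are hypothetical, introduced only for illustration.

```python
import re
from importlib import metadata

# First lxml release this report found affected (an assumption based on the issue text).
FIRST_AFFECTED = (4, 7, 1)


def parse_version(text):
    """Turn '4.7.1' into (4, 7, 1); non-numeric suffixes such as 'a1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '5.0' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_reported_range(version_text):
    """True if the version is at or above the first release reported as affected."""
    return parse_version(version_text) >= FIRST_AFFECTED


def installed_lxml_version():
    """Return the installed lxml version string, or None if lxml is not installed."""
    try:
        return metadata.version("lxml")
    except metadata.PackageNotFoundError:
        return None


version = installed_lxml_version()
if version is None:
    print("lxml is not installed; the report says the export works in this case")
else:
    print(f"lxml {version}: in reported range = {in_reported_range(version)}")
```

Being in the reported range does not prove the truncation will occur; the report notes that it also depends on the exact data being exported.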
Expected Behavior
The value at index 30 (row 32 in Excel) of column CN is '什么鬼哦,妈的这是在搞我吧?穿越不给主角就算了,居然给反派?还他娘是个公共线反派!你给我个支线的都行啊!'
After exporting to Excel with to_excel() and reading the file back, the value becomes '什么鬼哦,妈的这是在搞我吧?穿越不给主角就算了,居然给反派?还他娘是个公共线反派!你给我个支线'
That is, '的都行啊!' at the end of the string is missing.
Note that this only happens if lxml version 4.7.1 or above is installed.
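The truncation described above can be confirmed mechanically. This is a minimal sketch, not part of the original reproducible example: missing_suffix is a hypothetical helper, and the two strings are copied from this report rather than read from a file. In a real check, the written value would come from reading the exported file back, for example with pandas.read_excel.

```python
def missing_suffix(original, written):
    """Return the tail of `original` that is absent from `written`.

    Returns None when `written` is not a prefix of `original`, i.e. when the
    difference is something other than a truncation at the end.
    """
    if not original.startswith(written):
        return None
    return original[len(written):]


# Values copied from the report: the cell before export and after reading back.
original = '什么鬼哦,妈的这是在搞我吧?穿越不给主角就算了,居然给反派?还他娘是个公共线反派!你给我个支线的都行啊!'
written = '什么鬼哦,妈的这是在搞我吧?穿越不给主角就算了,居然给反派?还他娘是个公共线反派!你给我个支线'

lost = missing_suffix(original, written)
print(repr(lost), len(original), len(written))
```

An empty string means the value survived the round trip intact; None means the two values differ in some way other than a missing tail.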
Installed Versions