harshankur / officeParser

A Node.js library to parse text out of any office file. Currently supports docx, pptx, xlsx, odt, odp, and ods.
MIT License
123 stars, 17 forks

Your file officeParserTemp/tempfiles/x.xlsx seems to be corrupted. #27

Closed: ladrians closed this issue 6 months ago

ladrians commented 7 months ago

I get this error when trying to extract the text from the attachment:

Your file officeParserTemp/tempfiles/170654016408700001.xlsx seems to be corrupted. If you are sure it is fine, please create a ticket in Issues on github with the file to reproduce error.

office_parser_error01.xlsx

I could validate that the xlsx is a valid file; I can see all the information, at least. I am not sure why I am getting this error. Thanks in advance, regards.

harshankur commented 7 months ago

I need to investigate this further. I have noticed that this file has a value in its sheet1.xml that does not seem to be correct. It could also be a problem with my understanding, but the code is working properly.

Did you create this file? I need to re-export it from MS Excel and try parsing it again to see if the error persists.

ladrians commented 7 months ago

No. I took the original file, removed the sensitive data, confirmed that the error still reproduces, and attached the result. I have no idea what might be wrong with the XML content. If it is possible to analyze it, generating a warning and continuing would be helpful.

ladrians commented 6 months ago

Just checking: have you made any progress on this strange case?

harshankur commented 6 months ago

@ladrians Yes, I did. It was difficult to debug, but I found the bug. I do not think yours is an isolated case: this bug would have occurred in many forms created in Excel. In your case, Excel for some reason chose to store empty strings as inline strings. I was not ignoring those, so I was trying to access an element in an array that did not exist.
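As a rough illustration of the kind of guard described above, here is a minimal sketch. It is not the actual officeParser fix: the `cellText` helper and the plain-object cell shape (`type`, `value`, `inline`) are hypothetical and chosen only for this example. The one thing taken from the xlsx format itself is that a cell of type `s` holds an index into the shared strings table, while a cell of type `inlineStr` carries its text directly and may be empty.

```javascript
// Hypothetical helper for illustration only; not officeParser's real code.
// A cell is modelled as a plain object:
//   { type: "s", value: "0" }            -> index into the shared strings table
//   { type: "inlineStr", inline: "abc" } -> text stored in the cell itself
function cellText(cell, sharedStrings) {
    if (cell.type === "inlineStr") {
        // Inline strings can be empty or missing; treat both as "".
        return cell.inline ?? "";
    }
    if (cell.type === "s") {
        const index = Number.parseInt(cell.value, 10);
        // Only read the table when the index really exists, instead of
        // accessing an array element that is not there.
        if (Number.isInteger(index) && index >= 0 && index < sharedStrings.length) {
            return sharedStrings[index];
        }
        return "";
    }
    // Other cell types (numbers, booleans, ...): use the raw value if present.
    return cell.value ?? "";
}
```

With a guard like this, an empty inline string simply yields an empty string and parsing continues, rather than failing on the whole file.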

Feel free to reopen this issue if you find this bug recurring or causing any side effects.

Lastly, thank you for helping to improve this project. I appreciate it.

ladrians commented 6 months ago

Great, thanks!