nleroy917 closed this 4 weeks ago.
One idea is to push a space after each `docx_rs::RunChild::Text` the code encounters:
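```rust
match child {
    docx_rs::RunChild::Text(text) => {
        document_text.push_str(&text.text);
        // Push a space after every text run so adjacent runs don't fuse together.
        document_text.push(' ');
    }
    // Skip other run children (breaks, tabs, ...) instead of panicking via `todo!()`.
    _ => {}
}
```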
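The per-run approach can be exercised in isolation with a small std-only sketch. `RunChild` here is a hypothetical, simplified stand-in for `docx_rs::RunChild` (the real enum has more variants, and its `Text` variant wraps a struct with a `text` field rather than a bare `String`), and `extract_runs` is an illustrative helper, not part of the crate. The separator is a parameter so the space and newline variants can be compared.

```rust
// Hypothetical stand-in for docx_rs::RunChild so this sketch builds with std only;
// the real enum has more variants and wraps its payloads in structs.
enum RunChild {
    Text(String),
    Break,
}

/// Appends `sep` after every text run and skips non-text children
/// instead of hitting `todo!()`.
fn extract_runs(runs: &[RunChild], sep: char) -> String {
    let mut document_text = String::new();
    for child in runs {
        match child {
            RunChild::Text(text) => {
                document_text.push_str(text);
                document_text.push(sep);
            }
            // Non-text children are ignored in this sketch.
            _ => {}
        }
    }
    // Drop the separator left after the final text run.
    if document_text.ends_with(sep) {
        document_text.pop();
    }
    document_text
}

fn main() {
    let runs = vec![
        RunChild::Text("This is some text here".to_string()),
        RunChild::Break,
        RunChild::Text("Here is some new text".to_string()),
    ];
    println!("{}", extract_runs(&runs, ' '));
}
```

One caveat: Word can split a single word across several runs (for example where formatting changes partway through it), and a per-run separator would then insert a stray space or newline inside that word.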
This seems to help.
I wonder if it makes sense to push a newline instead of a space.
I'll revert to `\n` to try to retain the formatting of the original document.
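One possible refinement, not what this thread settled on: a visible line break in a Word document usually comes from a paragraph boundary (Enter) or an explicit break inside a run (Shift+Enter), so emitting `\n` per paragraph and per break, while concatenating runs directly, keeps the layout without inserting separators inside words. The sketch below uses hypothetical, simplified std-only stand-ins for the docx_rs tree; with the real crate you would walk its `DocumentChild::Paragraph` and `ParagraphChild::Run` variants down to each `RunChild`, which this sketch assumes but does not call.

```rust
// Hypothetical, simplified stand-ins for the docx_rs document tree
// (paragraphs -> runs -> run children); the real types have more variants and fields.
enum RunChild {
    Text(String),
    Break, // explicit line break inside a paragraph (Shift+Enter in Word)
}

struct Run {
    children: Vec<RunChild>,
}

struct Paragraph {
    runs: Vec<Run>,
}

/// Joins runs within a paragraph without a separator, maps explicit breaks
/// to '\n', and ends every paragraph with '\n'.
fn extract_text(paragraphs: &[Paragraph]) -> String {
    let mut out = String::new();
    for paragraph in paragraphs {
        for run in &paragraph.runs {
            for child in &run.children {
                match child {
                    RunChild::Text(text) => out.push_str(text),
                    RunChild::Break => out.push('\n'),
                }
            }
        }
        out.push('\n');
    }
    out
}

/// Illustrative helper: a run holding a single piece of text.
fn text_run(s: &str) -> Run {
    Run { children: vec![RunChild::Text(s.to_string())] }
}

fn main() {
    let doc = vec![
        // "here" split across two runs, as can happen when formatting changes mid-word.
        Paragraph { runs: vec![text_run("This is some text he"), text_run("re")] },
        Paragraph { runs: vec![text_run("Here is some new text")] },
    ];
    print!("{}", extract_text(&doc));
}
```

Because runs are joined directly, the split word comes back whole, and the two paragraphs land on separate lines.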
I think the `.docx` parser isn't correctly extracting text when it's separated by a newline. For example, two lines of text separated by a newline in the document result in:
This is some text hereHere is some new text
Ideally, the two pieces should be separated by a space (or the newline should be preserved).