Open nleroy917 opened 1 month ago
@isaac-d-cohen let me know if you have thoughts. Would love contributions! I'm winging all of this. Big Rust noob.
Thanks for opening this issue! Yeah, I agree that it should find text anywhere on the slides. I also found an example of a slide with text directly on it that the text extraction feature doesn't find: test4.pptx
I don't know how to go about this though. I'm an even bigger noob, having just learned Rust this spring semester in one of my classes. But I would guess we need to find out where in a PPTX text can legally be located. It seems really daunting though.
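As a rough starting point for the "where can text live" question, here is a minimal sketch that flags ZIP entry names worth scanning. The prefix list is an assumption based on the part names PowerPoint conventionally writes (slides, notes, layouts, masters, SmartArt diagrams, charts, comments). Strictly speaking, the package relationships decide where parts live, so this list is neither complete nor guaranteed. `CANDIDATE_PREFIXES` and `is_candidate_text_part` are hypothetical names, not part of the crate.

```rust
/// Hypothetical starting list of directories inside a .pptx archive whose
/// XML parts can carry user-visible text. These are conventional part
/// names only; the list is an assumption and almost certainly incomplete.
const CANDIDATE_PREFIXES: [&str; 7] = [
    "ppt/slides/",       // slide content
    "ppt/notesSlides/",  // speaker notes
    "ppt/slideLayouts/", // layout placeholders
    "ppt/slideMasters/", // master placeholders
    "ppt/diagrams/",     // SmartArt data
    "ppt/charts/",       // chart titles and labels
    "ppt/comments/",     // reviewer comments
];

/// Returns true if a ZIP entry name looks like an XML part that may
/// contain text. Relationship files end in ".rels", so requiring the
/// ".xml" suffix already skips them.
fn is_candidate_text_part(entry_name: &str) -> bool {
    entry_name.ends_with(".xml")
        && CANDIDATE_PREFIXES
            .iter()
            .any(|prefix| entry_name.starts_with(prefix))
}
```

The idea would be to call `is_candidate_text_part` on each entry name while iterating over the archive, and extend the prefix list as new cases like test4.pptx turn up.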
> It seems really daunting though.
Yeah, the Open XML specification is absurd. To really figure it all out, whoever works on the PowerPoint extractor would probably have to read the actual PresentationML documentation.
Realistically, the extractor will just have to be updated incrementally as the crate gets better and better at parsing it.
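One possible incremental step, sketched under some assumptions: DrawingML stores text runs in `<a:t>` elements, so collecting every `<a:t>` in a part, however deeply it is nested (shapes, group shapes, table cells), might already catch text that a stricter walk misses. The sketch below uses plain string scanning with only the standard library and assumes the conventional `a:` namespace prefix. A real implementation should probably use a proper XML parser and resolve namespaces. `extract_text_runs` and `unescape_basic` are hypothetical helpers, not functions from the crate.

```rust
/// Collects the contents of every `<a:t>...</a:t>` element in `xml`,
/// in document order. Naive string scan: assumes the `a:` prefix and
/// that `<a:t>` carries no attributes.
fn extract_text_runs(xml: &str) -> Vec<String> {
    const OPEN: &str = "<a:t>";
    const CLOSE: &str = "</a:t>";

    let mut runs = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        // Stop if the element is never closed (malformed input).
        let Some(end) = after_open.find(CLOSE) else { break };
        runs.push(unescape_basic(&after_open[..end]));
        rest = &after_open[end + CLOSE.len()..];
    }
    runs
}

/// Decodes the five predefined XML entities. `&amp;` is handled last so
/// that an escaped ampersand is not decoded twice.
fn unescape_basic(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&")
}
```

Because the scan matches the exact tag `<a:t>`, it does not trip over siblings such as `<a:tbl>` or `<a:txBody>`, but it also ignores paragraph boundaries, so joining the runs back into readable text is left open.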
The PowerPoint file text extraction leaves a lot to be desired. It's a little oversimplified and doesn't find text that isn't directly in the ppt/slides/ directory. Should it do this?