weilandia closed this 5 years ago
Thanks for the contribution! Would you mind adding a quick spec? Something similar to https://github.com/pzaich/doc_ripper/blob/master/spec/doc_ripper/formats/sketch_ripper_spec.rb#L16 would work. Thanks!
👍
@weilandia I'm getting test failures locally.
Versions:
ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-darwin17]
pdftotext version 3.03
For example, it looks like a line break is missing.
expected: "A Simple PDF File\nThis is a small demonstration .pdf file just for use in the Virtual Mechanics tut.... And more text. And more text.\nBoring. More, a little more text. The end, and just as well.\n\n\f"
got: "A Simple PDF File\nThis is a small demonstration .pdf file just for use in the Virtual Mechanics tut...t. And more text. And more text. Boring. More, a little more text. The end, and just as well.\n\n\f"
This library is not intended to maintain overall document structure -- it's just about extracting the raw text, so if you want to normalize the output in your tests to remove extra whitespace, that works for me.
@pzaich should be good to go now -- I was running an older version of pdftotext. Decided to just strip the whitespace in the test.
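For anyone hitting the same mismatch between pdftotext versions, here is a rough sketch of what normalizing whitespace before comparing could look like. The helper name normalize_whitespace and the shortened sample strings are made up for illustration; this is not the spec that was actually committed.

```ruby
# Hypothetical helper (not part of doc_ripper): collapse every run of
# whitespace (spaces, newlines, form feeds) into a single space and trim
# the ends, so line-break differences between pdftotext versions do not
# fail the comparison.
def normalize_whitespace(text)
  text.gsub(/\s+/, ' ').strip
end

# Shortened stand-ins for the expected/got strings quoted above.
expected = "A Simple PDF File\nBoring. The end, and just as well.\n\n\f"
actual   = "A Simple PDF File Boring. The end, and just as well.\n\n\f"

puts normalize_whitespace(expected) == normalize_whitespace(actual)
```

In an RSpec example this would amount to comparing normalize_whitespace(ripped_text) against normalize_whitespace(expected_text) instead of the raw strings.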
Currently, when I run DocRipper::rip("path_to_file.pdf"), the pdftotext command writes the results to a file ("path_to_file.txt") and an empty string is returned. This change makes it so the result is returned and @text has a value.
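For context, a minimal sketch of one way to get the text back from pdftotext instead of a .txt file on disk. It relies on pdftotext treating "-" as the output file to mean stdout. The helper names pdftotext_command and rip_pdf are hypothetical and are not doc_ripper's actual implementation, which may differ.

```ruby
require 'open3'

# Hypothetical helper: build the argument list for pdftotext.
# Passing "-" as the output file tells pdftotext to write the extracted
# text to stdout rather than to a .txt file next to the PDF.
def pdftotext_command(pdf_path)
  ['pdftotext', pdf_path, '-']
end

# Hypothetical helper: run pdftotext and return the extracted text,
# or nil if the command exits with a non-zero status.
# Assumes pdftotext is installed and on the PATH.
def rip_pdf(pdf_path)
  stdout, _stderr, status = Open3.capture3(*pdftotext_command(pdf_path))
  status.success? ? stdout : nil
end
```

With something along these lines, rip_pdf("path_to_file.pdf") would return the extracted text directly, and passing the arguments as an array avoids shell-quoting problems with paths that contain spaces.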