ropensci / roweb

:no_entry: [DEPRECATED] Active at https://github.com/ropensci/roweb2
https://legacy.ropensci.org

[WIP] tesseract post #271

Closed jeroen closed 7 years ago

jeroen commented 7 years ago

Post about tesseract and OCR in general. Work in progress.

stefaniebutland commented 7 years ago

Nice clear post, especially outlining key issues. Clarifying language:

End of post:

Impact of having Tesseract in R:

stefaniebutland commented 7 years ago

Suggestion for beefing up the intro:

Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos or scanned documents. This can be useful for automated text processing of documents that are not available in digital form, such as books, articles, or public records. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables X Y Z. (here give an example of a specific research project/use case that would be enabled by having tesseract in R)

People looking to extract text and metadata from PDF files in R should try our pdftools.

Then

Getting Started The package links to the ...

stefaniebutland commented 7 years ago

Please add the phrase "in the example below" or "in the code below" here: "The high quality image in the code below has approximately double the resolution of the low quality image."

On first read, it had me looking above at the Wikipedia screenshot rather than below, where both files were right there.

stefaniebutland commented 7 years ago

@karthik please review and give the go-ahead for the post

jeroen commented 7 years ago

OK I have updated the post with comments from Stef. I think this is good to go.

By the way, next time please feel free to just commit your suggestions directly to the PR, or take the lead in the editorial process. No need to keep going back and forth with suggestions. I'd rather spend my time coding 🤓

karthik commented 7 years ago

@stefaniebutland I'm done with edits and pushed a few changes. Take a look and see what you think.

@jeroenooms Apologies. I was the one that suggested Stefanie provide comments and feedback inline since she's still getting up to speed with Git and this would be her first time adding directly to a PR. We'll work with her to streamline edits in the future. Onwards!