Closed dev-code-davis closed 6 years ago
OK, after two days of relentless searching, a DevOps engineer on our team suggested running:
setenforce 0
which seems to have worked. It appears to be a CentOS/Red Hat security feature (SELinux).
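For anyone landing here with the same symptom: `setenforce 0` switches SELinux to permissive mode for the whole system until the next reboot, so it is more of a diagnostic step than a fix. A minimal sketch for checking the current mode from a script is shown below. It assumes the standard selinuxfs mount point `/sys/fs/selinux`; the function name `selinux_mode` is made up for this example.

```python
from pathlib import Path


def selinux_mode(enforce_file="/sys/fs/selinux/enforce"):
    """Return 'enforcing', 'permissive', or 'disabled'.

    Assumes selinuxfs is mounted at /sys/fs/selinux, where the
    'enforce' file holds "1" (enforcing) or "0" (permissive).
    """
    path = Path(enforce_file)
    if not path.exists():
        # No selinuxfs entry: SELinux is disabled or not present.
        return "disabled"
    return "enforcing" if path.read_text().strip() == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

If the script works in permissive mode but fails in enforcing mode, the denials are typically recorded in `/var/log/audit/audit.log`, which is a better starting point for a targeted policy change than leaving enforcement off.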
I don't recommend placing ocrmypdf on a public-facing web server. PDF is a complex and exploitable file format; ocrmypdf deliberately uses all available CPU and a lot of temporary storage, and it is not necessarily secure against malicious PDFs.
@jbarlow83 It would be used on an intranet where just a few selected editors will be able to upload those scanned PDFs. What alternative or approach would you suggest? As for resource usage, we could add an additional server just for the OCR task. Basically, we have a Drupal site which uses Solr to index content. We have tackled the task of getting PDF metadata, but scanned documents are still an issue (they need to be OCRed and indexed for search purposes). I have tested a lot of OCR libraries, and to be honest, only OCRmyPDF seemed like a solid, capable solution.
It should be fine for an intranet, just not a space where people could deliberately try to break it.
Temporary storage usage is linear with the number of pages in the PDF, so you can usually handle hundreds of pages before that is an issue.
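As a rough illustration of the dedicated OCR server idea, the sketch below runs ocrmypdf from a worker process with a cap on CPU use and a time limit. It relies on the `--jobs` option of the ocrmypdf command line; the helper names, the default of one job, and the ten-minute timeout are assumptions for this example, not recommendations from the project.

```python
import subprocess


def build_ocr_command(input_pdf, output_pdf, jobs=1):
    """Build the ocrmypdf argument list (hypothetical helper)."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    # --jobs limits how many CPU cores ocrmypdf will use.
    return ["ocrmypdf", "--jobs", str(jobs), input_pdf, output_pdf]


def run_ocr(input_pdf, output_pdf, jobs=1, timeout_s=600):
    """Run ocrmypdf and report whether it exited successfully.

    The command is passed as a list, so no shell is involved and
    uploaded file names are not interpreted as shell syntax.
    subprocess.TimeoutExpired is raised if the time limit is hit.
    """
    result = subprocess.run(
        build_ocr_command(input_pdf, output_pdf, jobs),
        timeout=timeout_s,
    )
    return result.returncode == 0
```

A web front end such as the Drupal site could place uploads in a queue directory and let a worker like this process them one at a time, which keeps the CPU and temporary storage load off the web server itself.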
I'll close the issue now since the main concern seemed to be a platform configuration issue. If you have further related questions feel free to reopen it.
Hi, I have created a PHP script that launches OCRmyPDF.
When I call the PHP script from the server itself with php ocr.php, I get the intended result.
However, when I open it and run it from a browser, I get the following permission error:
/usr/local/bin/ocrmypdf
cat /usr/local/bin/ocrmypdf
OS: Centos 7.
I'm aware that this may not be strictly an OCRmyPDF issue. But it is quite strange that I continue to get this error even after (for testing purposes) I ran chmod/chown on the whole Python directory to give it more open permissions. My initial impression is that some of those packages require higher user privileges?