Open jayantanth opened 8 years ago
+1 for running in the cloud. Will also solve storage, bandwidth and disconnection issues.
Do we get shell access on the tools server?
Or do we need a web interface to be hosted on the tools server?
How do we ask for access?
Hi Shrini, please go through http://tools.wmflabs.org/
and follow the steps one by one.
Useful links
Tools project page on wikitech (find out more about the Tools project)
Create a Labs account (you must have a Labs account to access the Tools project)
Add a public SSH key (you’ll need this to access Labs servers using SSH)
Request access to the Tools project (Join us!)
Create New Tool
Source code repository of this web
As I mentioned to you on Facebook chat, English Wikisource uses https://tools.wmflabs.org/phetools/hocr_cgi.py . The user Phe (Philippe Elie) maintains this tool, and all his scripts can be found at https://github.com/phil-el/phetools . He is mostly active on French Wikisource; his user page is https://fr.wikisource.org/wiki/Utilisateur:Phe
Full help can be found at https://wikitech.wikimedia.org/wiki/Help:Tool_Labs and https://wikitech.wikimedia.org/wiki/Help:Access
+1 Agree with Jayanta's input.
+1 Totally agree, it will solve the bandwidth issue
We need a web version of OCR4Wikisource to run on the tools server.
We are looking for volunteers to build it.
@tshrinivasan I'm not very familiar with the operation of OCR4wikisource, but could http://tools.wmflabs.org/ws-google-ocr/ be modified to help you?
@samwilson Thanks for the link. The tool you mentioned works on a single image.
But in OCR4Wikisource, we can give the URL of a full PDF from Commons. It downloads the PDF, splits it into single pages, uploads them to Google Drive, downloads the result as text, and pastes the content into the relevant Wikisource proofread page.
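For anyone considering the web version, the flow described above could be sketched roughly as below. This is only an illustration, not the actual OCR4wikisource code: `run_pipeline`, `page_title` and the four callables are hypothetical names, and the `Page:<file>/<n>` title pattern is an assumption about how the proofread pages are named on the target wiki.

```python
from typing import Callable, Iterable, List
from urllib.parse import unquote


def page_title(filename: str, page_number: int) -> str:
    # Assumed naming pattern for proofread pages: "Page:<file name>/<page number>".
    return f"Page:{filename}/{page_number}"


def run_pipeline(
    pdf_url: str,
    download: Callable[[str], bytes],
    split_pages: Callable[[bytes], Iterable[bytes]],
    ocr_page: Callable[[bytes], str],
    save_page: Callable[[str, str], None],
) -> List[str]:
    """Download a PDF, split it into pages, OCR each page, save the text.

    Every step is passed in as a callable, so the same loop can run on a
    desktop or on a server, and can be tested without network access.
    """
    # Derive the file name from the last URL segment (percent-decoded).
    filename = unquote(pdf_url.rsplit("/", 1)[-1])
    pdf_bytes = download(pdf_url)

    saved_titles = []
    for number, page_image in enumerate(split_pages(pdf_bytes), start=1):
        text = ocr_page(page_image)
        title = page_title(filename, number)
        save_page(title, text)
        saved_titles.append(title)
    return saved_titles
```

Keeping the download, split, OCR and save steps as separate callables means a web front end would only need to supply the PDF URL and the wiki credentials; the loop itself would not change.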
Looking for a web version. https://github.com/tshrinivasan/OCR4wikisource/issues/89
@samwilson I tried to run it on a Kannada (kn) file and got this error:
Image size of https://upload.wikimedia.org/wikipedia/commons/4/46/%E0%B2%95%E0%B2%A8%E0%B3%8D%E0%B2%A8%E0%B2%A1_%E0%B2%AD%E0%B2%A4%E0%B3%83%E0%B2%B9%E0%B2%B0%E0%B2%BF_%E0%B2%B8%E0%B3%81%E0%B2%AD%E0%B2%BE%E0%B2%B7%E0%B2%BF%E0%B2%A4.djvu (11421144) exceeds permitted size (4194304)
Looks like there is some limit on the memory usage. Please check.
The Vision API is limited to 4 MB per image.
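To make the numbers in the error concrete: 4194304 bytes is exactly 4 × 1024 × 1024, and the DjVu file above is 11421144 bytes, so it is rejected before any OCR happens. A minimal sketch of such a check follows; `check_image_size` and `MAX_IMAGE_BYTES` are hypothetical names, and the limit is taken from the error message in this thread rather than from current API documentation.

```python
from typing import Optional

# Limit as reported in the error message above (4 MB = 4 * 1024 * 1024 bytes).
MAX_IMAGE_BYTES = 4 * 1024 * 1024


def check_image_size(size_bytes: int, limit: int = MAX_IMAGE_BYTES) -> Optional[str]:
    """Return an error message if the image is too large, otherwise None."""
    if size_bytes > limit:
        return f"Image size ({size_bytes}) exceeds permitted size ({limit})"
    return None


# The whole-book DjVu from the comment above is far over the limit.
print(check_image_size(11421144))
```

This suggests that a whole-book file would need to be split into single-page images before being sent for OCR, which is what OCR4Wikisource already does for PDFs.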
I'm replying with some other thoughts in https://phabricator.wikimedia.org/T120788
Niharika, Rohit and Psychoslave started working on this during the Wikimania 2016 hackathon, but there has been no update since then. https://tools.wmflabs.org/?tool=ocr4wikisource
Hi Shrini,
This is a proposal to run this script from http://tools.wmflabs.org, so it will be OS independent.