tshrinivasan / OCR4wikisource

OCR for WikiSource using Google Drive OCR
GNU General Public License v2.0

One page missed #25

Closed bodhisattwawiki closed 8 years ago

bodhisattwawiki commented 8 years ago

Jayantada just ran the script on গীতি-কবিতা.pdf (https://bn.wikisource.org/wiki/%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A7%8D%E0%A6%98%E0%A6%A3%E0%A7%8D%E0%A6%9F:%E0%A6%97%E0%A7%80%E0%A6%A4%E0%A6%BF-%E0%A6%95%E0%A6%AC%E0%A6%BF%E0%A6%A4%E0%A6%BE.pdf) on Bengali Wikisource. Up to page 23, it was fine. But from page 24 (https://bn.wikisource.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%A4%E0%A6%BE:%E0%A6%97%E0%A7%80%E0%A6%A4%E0%A6%BF-%E0%A6%95%E0%A6%AC%E0%A6%BF%E0%A6%A4%E0%A6%BE.pdf/%E0%A7%A8%E0%A7%AA) onwards, each page shows the OCRed text of the page after it (https://bn.wikisource.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%A4%E0%A6%BE:%E0%A6%97%E0%A7%80%E0%A6%A4%E0%A6%BF-%E0%A6%95%E0%A6%AC%E0%A6%BF%E0%A6%A4%E0%A6%BE.pdf/%E0%A7%A8%E0%A7%AB).

Besides, it has OCRed only 70 pages instead of the file's 71 pages. (https://bn.wikisource.org/wiki/%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A7%8D%E0%A6%98%E0%A6%A3%E0%A7%8D%E0%A6%9F:%E0%A6%97%E0%A7%80%E0%A6%A4%E0%A6%BF-%E0%A6%95%E0%A6%AC%E0%A6%BF%E0%A6%A4%E0%A6%BE.pdf)

jayantanth commented 8 years ago

OK, I shall run it again!

jayantanth commented 8 years ago

I ran it again. This time 71 pages were created and the OCRed text was uploaded. While mediawiki_uploader.py was running, I watched on Bengali Wikisource that the OCRed text file starts at page number 24. This is nice!

mediawiki_uploader_2016-01-06-09-27-17_log.txt

So please close this issue.