tshrinivasan / OCR4wikisource

OCR for WikiSource using Google Drive OCR
GNU General Public License v2.0

Purge the index file after OCR is completed #74

Open bodhisattwawiki opened 8 years ago

bodhisattwawiki commented 8 years ago

It would be great if the script could purge the index file after OCR is completed. Users often forget to purge it because they are not doing the OCR manually. Purging is needed to update the list of index pages.

tshrinivasan commented 8 years ago

Give more details with examples.

What do you mean by purging the index page?

Why do we have to do that?

Regards, T.Shrinivasan

bodhisattwawiki commented 8 years ago

Purging is needed to update the status of the index file. Every Wikisource has a list of Index pages, where we can see the updated status of each Index page. (For example, in Bengali Wikisource, https://bn.wikisource.org/w/index.php?title=%E0%A6%AC%E0%A6%BF%E0%A6%B6%E0%A7%87%E0%A6%B7:IndexPages&limit=500&offset=0&key=&order= ) If we don't purge the Index page after OCR, it remains white instead of turning red in that list, so there is a chance that the same OCR will be done twice by two users. #56

tshrinivasan commented 8 years ago

Can anyone give an example of this using a Tamil or English Wikisource index page?

ravidreams commented 8 years ago

Example Index:

https://bn.wikisource.org/wiki/%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A7%8D%E0%A6%98%E0%A6%A3%E0%A7%8D%E0%A6%9F:%E0%A6%AA%E0%A6%B2%E0%A7%8D%E0%A6%B2%E0%A7%80-%E0%A6%B8%E0%A6%AE%E0%A6%BE%E0%A6%9C.djvu

Example purge URL:

https://commons.wikimedia.org/wiki/File:%E0%A6%AA%E0%A6%B2%E0%A7%8D%E0%A6%B2%E0%A7%80-%E0%A6%B8%E0%A6%AE%E0%A6%BE%E0%A6%9C.djvu?action=purge

If you visit the index page, there are three icons in the top right corner. The second icon is for purging. We just need to add ?action=purge to the Index URL and ping it.

But please note that in many other languages, including Tamil, we are still creating index files for the first time. As we had already decided to limit this tool to OCR-related functions only, I didn't want to keep adding features such as creating index files. Still, I hope this purge ping will work without the need to create index files first.
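The purge ping described above could be sketched in Python roughly as follows. This is only an illustration, not code from the OCR4wikisource script: the function names, the User-Agent string, and the `/w/api.php` endpoint path are assumptions. On current MediaWiki installations a plain GET to `?action=purge` may only show a confirmation form, so the sketch also includes a POST to the API's `action=purge` module.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_purge_url(site, index_title):
    """Return the index page URL with ?action=purge appended.

    `site` is a host such as "bn.wikisource.org"; `index_title` is the
    full page title including its namespace prefix.
    """
    # MediaWiki URLs use underscores in place of spaces in titles.
    path = quote(index_title.replace(" ", "_"), safe=":/")
    return "https://{}/wiki/{}?action=purge".format(site, path)


def purge_via_api(site, index_title, timeout=30):
    """POST an action=purge request to the wiki's API (assumed at /w/api.php).

    Returns the raw JSON response text. Purging only refreshes a page that
    already exists; it does not create an index page.
    """
    data = urlencode({
        "action": "purge",
        "titles": index_title,
        "forcelinkupdate": "1",
        "format": "json",
    }).encode("utf-8")
    request = Request(
        "https://{}/w/api.php".format(site),
        data=data,  # supplying data makes urllib send a POST
        headers={"User-Agent": "ocr-purge-sketch/0.1 (hypothetical example)"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The script could call `purge_via_api(site, index_title)` once after the last page of a book has been uploaded, and log the response instead of failing the whole run if the request does not succeed.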

bodhisattwawiki commented 8 years ago

That's why I said that it is better to purge after OCR is completed. By then, you will already have created the index pages.

ravidreams commented 8 years ago

//That's why I said that it is better to purge after OCR is completed. By then, you will already have created the index pages.//

We sometimes create index pages in batches, after many files have been OCRed and their pages uploaded, and not necessarily during the page upload process.

bodhisattwawiki commented 8 years ago

@ravidreams, that's unconventional. I don't know of any other community doing it like this. ;-) Other Wikisource communities, including Bengali, create the index page first and then go for OCR.

ravidreams commented 8 years ago

@BodhisattwaMandal Well, that is because we haven't had a coordinated effort for taws so far. People have been uploading classic texts available on the web that were already proofread. Not a single book has been proofread on the wiki so far :) You may have noticed that very few PDF books in Tamil had been uploaded before this tool came along.

bodhisattwawiki commented 8 years ago

OK, purging won't create new indexes. It only purges index pages that have already been created.
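Since purging only affects index pages that already exist, a script could check for the page first and skip the purge otherwise. The following is a minimal sketch under assumptions: the function names and the User-Agent string are hypothetical, and it relies on the MediaWiki API's `action=query` response marking a nonexistent page with a `missing` key in the default JSON format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def page_exists(query_result):
    """Return True if every page in an action=query result exists.

    In the default JSON format, a nonexistent page carries a "missing"
    key and an invalid title carries an "invalid" key.
    """
    pages = query_result.get("query", {}).get("pages", {})
    if not pages:
        return False
    return all(
        "missing" not in page and "invalid" not in page
        for page in pages.values()
    )


def fetch_page_info(site, title, timeout=30):
    """Ask the wiki's API (assumed at /w/api.php) about a single title."""
    params = urlencode({"action": "query", "titles": title, "format": "json"})
    request = Request(
        "https://{}/w/api.php?{}".format(site, params),
        headers={"User-Agent": "ocr-purge-sketch/0.1 (hypothetical example)"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would purge only when `page_exists(fetch_page_info(site, index_title))` is true, which covers communities that create their index pages in batches after the OCR run.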

tshrinivasan commented 8 years ago

Do we still need this purge option? @ravidreams

Are all other Wikisource communities purging after OCR is done?

bodhisattwawiki commented 8 years ago

All other big Wikisource communities have dedicated bots to purge the index pages. Besides, their OCR method is different from ours. Our method is unique, and it requires purging after OCR. By the way, we do have JavaScript for soft and hard purging. It might make this easier for you. https://bn.wikisource.org/s/805

tshrinivasan commented 8 years ago

Hmm. I still do not understand what purging is and how to do it programmatically.

I will explore this and comment here later.

jayantanth commented 8 years ago

This is my personal opinion on this issue, which is not directly related to this script. There are many bots running on the Tool Server, where we can schedule this purge action every 1 or 2 hours. User:Wikitanvir already runs this from the Tool Server. So I would say that this issue can be closed.