tha-uzhavan closed this issue 8 years ago.
Other Wikisource projects, including Bengali, create Index pages right after the file is uploaded, and only then go for OCR. This is the standard procedure; what Tamil Wikisource is doing is not. As @Ravidreams said earlier, Tamil is not creating Index pages first: they do the OCR and then create the Index pages. I think what @tha-uzhavan is asking for is a Tamil-specific need, not a general one. So I do not think the script needs this feature at all.
Can anyone explain the workflow after uploading a PDF to Commons?
How is a file on Commons displayed in Wikisource with an Index page?
I have heard about the ProofreadPage extension, but I would like to know how Index pages are created in other languages and why this is not done on Tamil Wikisource.
The workflow is like this:
1) Upload the file to Commons
2) Create the Index page in Wikisource
3) Check whether the file has missing, duplicate, or disoriented pages, and set the page numbers on the Index page
4) Start OCR
5) Proofread
6) Validate
Sometimes we do step 3 offline on our own PC before uploading to Commons.
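Part of step 3 can be scripted. Below is a minimal sketch, assuming the printed page number of each scan has already been read (by hand or from OCR) into a list, with `None` for unnumbered scans such as covers or plates. The function name `check_page_sequence` is hypothetical and not part of any Wikisource tool. It only reports printed numbers that are missing or that appear on more than one scan; rotated pages still have to be spotted separately.

```python
from collections import Counter
from typing import Optional


def check_page_sequence(printed: list[Optional[int]]) -> tuple[list[int], list[int]]:
    """Return (missing, duplicated) printed page numbers.

    printed[i] is the number printed on scan i + 1, or None when the scan
    carries no number (cover, blank page, plate). Hypothetical helper.
    """
    numbers = [n for n in printed if n is not None]
    if not numbers:
        return [], []
    counts = Counter(numbers)
    # Duplicated: a printed number that occurs on more than one scan.
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    # Missing: numbers inside the observed range that never occur.
    missing = [n for n in range(min(numbers), max(numbers) + 1) if n not in counts]
    return missing, duplicated


if __name__ == "__main__":
    scans = [None, None, 1, 2, 3, 3, 5, 6]
    missing, duplicated = check_page_sequence(scans)
    print("missing:", missing)        # missing: [4]
    print("duplicated:", duplicated)  # duplicated: [3]
```

Pages lost at the very start or end of the book fall outside the observed range, so this check cannot report them.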
You can find more details at https://en.wikisource.org/wiki/Help:Beginner's_guide_to_proofreading
I agree with Bodhi. My point is that Index page creation should be automated. For example, the fields on this Index page are enough to create an initial Index page: https://ta.wikisource.org/w/index.php?title=Index:%E0%AE%AA%E0%AF%86%E0%AE%B0%E0%AE%BF%E0%AE%AF_%E0%AE%AA%E0%AF%81%E0%AE%B0%E0%AE%BE%E0%AE%A3%E0%AE%AE%E0%AF%8D_%E0%AE%93%E0%AE%B0%E0%AF%8D_%E0%AE%86%E0%AE%AF%E0%AF%8D%E0%AE%B5%E0%AF%81-2.pdf&action=edit Other details can then be filled in from the book's description page on Commons, where available. We plan to do this through an outreach programme, so Index creation and maintenance should be automated, but with simple steps. I think most Indian Wikisource projects do this manually. That works, but for long-term maintenance, automation is better.
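As a rough illustration of generating an initial Index page from a few fields, here is a minimal sketch. It assumes an Index template in the style of `MediaWiki:Proofreadpage_index_template` with parameters named `Title`, `Author`, `Year`, `Remarks` and `Pages`; the real field names on Tamil Wikisource may differ and should be copied from the edit view of an existing Index page. The `<pagelist />` tag belongs to the ProofreadPage extension; the `N="label"` attributes used here (start numbering at scan N with the given label) should be checked against its documentation. `build_pagelist` and `build_index_wikitext` are hypothetical helper names.

```python
def build_pagelist(labels: dict[int, str]) -> str:
    """Build a <pagelist /> tag.

    labels maps a scan number to the label that numbering should take
    from that scan onward, e.g. {1: "Cover", 5: "1"}.
    """
    attrs = " ".join(f'{scan}="{label}"' for scan, label in sorted(labels.items()))
    return f"<pagelist {attrs} />" if attrs else "<pagelist />"


def build_index_wikitext(meta: dict[str, str], labels: dict[int, str]) -> str:
    """Return wikitext for a new Index page.

    ASSUMPTION: field names modelled on MediaWiki:Proofreadpage_index_template;
    fields missing from meta are left empty for later editing.
    """
    fields = ["Title", "Author", "Year", "Remarks"]
    lines = ["{{:MediaWiki:Proofreadpage_index_template"]
    for name in fields:
        lines.append(f"|{name}={meta.get(name, '')}")
    lines.append(f"|Pages={build_pagelist(labels)}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values; real ones would come from the Commons description page.
    meta = {"Title": "Example title", "Remarks": "Scans from 35 onward need rotation."}
    print(build_index_wikitext(meta, {1: "Cover", 5: "1"}))
```

Fields left empty here are the ones that could later be filled from the Commons description page, as suggested above.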
It is better to keep this tool focused only on OCR, so that it stays suited to standard Wikisource practice.
However, Index creation does need to be automated when we do bulk file uploads, so this feature can be added to the PDF upload tool instead. Reported here: https://github.com/tshrinivasan/tools-for-wiki/issues/12
I have created an index maker that makes Index pages for all files in a given category: https://github.com/tshrinivasan/tools-for-wiki/tree/master/index-maker
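For readers who want the general idea of an index maker, here is an independent sketch; it is not the code in that repository. It assumes the standard MediaWiki Action API on Commons (`action=query`, `list=categorymembers`, `cmtype=file`, with `cmcontinue` for paging) and maps each `File:` title to an `Index:` title. The HTTP request itself is left out so the logic can be shown on its own; `category_query_url`, `index_titles` and `next_continue` are hypothetical names.

```python
import json
from typing import Optional
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def category_query_url(category: str, cont: Optional[str] = None) -> str:
    """URL that lists the files in a Commons category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "file",
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont  # resume after the previous batch
    return f"{COMMONS_API}?{urlencode(params)}"


def index_titles(response: dict) -> list[str]:
    """Turn a categorymembers response into Index: page titles (PDF/DjVu only)."""
    titles = []
    for member in response.get("query", {}).get("categorymembers", []):
        name = member["title"].split(":", 1)[1]  # drop the "File:" prefix
        if name.lower().endswith((".pdf", ".djvu")):
            titles.append("Index:" + name.replace(" ", "_"))
    return titles


def next_continue(response: dict) -> Optional[str]:
    """Continuation token for the next batch, or None when done."""
    return response.get("continue", {}).get("cmcontinue")


if __name__ == "__main__":
    print(category_query_url("Example category"))
    sample = json.loads('{"query": {"categorymembers": ['
                        '{"title": "File:A b.pdf"}, {"title": "File:c.jpg"}]}}')
    print(index_titles(sample))  # ['Index:A_b.pdf']
```

A full tool would fetch each URL, repeat while `next_continue` returns a token, and then create each Index page with an authenticated edit.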
Before uploading the text to Wikisource, the Index page needs to be created so that the uploader can record notes on it. For example, see the Index at https://ta.wikisource.org/s/spz : from page 35 onward, the pages need to be rotated. The Index page is also needed to patrol the upload. Title patterns:
Page:உமர்_கயாம்_வாழ்வும்_இலக்கியமும்.pdf/35 (for a page)
Index:{{PAGENAME}} (for a book)
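These title patterns can be generated mechanically. A minimal sketch, assuming the file name is given as on Commons (spaces or underscores) and that article URLs on ta.wikisource.org take the usual `/wiki/<title>` form; `index_title`, `page_title` and `wiki_url` are hypothetical helper names.

```python
from urllib.parse import quote

BASE = "https://ta.wikisource.org/wiki/"


def index_title(filename: str) -> str:
    # An Index page is named after the file: Index:<file name>
    return "Index:" + filename.replace(" ", "_")


def page_title(filename: str, scan: int) -> str:
    # A single scan is Page:<file name>/<scan number>, counted from 1
    if scan < 1:
        raise ValueError("scan numbers start at 1")
    return f"Page:{filename.replace(' ', '_')}/{scan}"


def wiki_url(title: str) -> str:
    # Percent-encode Tamil characters but keep ':' and '/' readable
    return BASE + quote(title, safe=":/")


if __name__ == "__main__":
    book = "உமர்_கயாம்_வாழ்வும்_இலக்கியமும்.pdf"
    print(index_title(book))
    print(wiki_url(page_title(book, 35)))
```

Note that the number in a `Page:` title is the scan number in the file, not the printed page number; the mapping between the two is what the page numbering on the Index page records.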