gasyoun closed 7 years ago
"make the word a better place to live" - A pun or a typo?
I am currently working on aspects of (2) "Clean errors in file" for two dictionaries: Vacaspatya and Schmidt.
In the case of Schmidt, I will try to document the process via video tutorials and uploads to the GitHub repository https://github.com/sanskrit-lexicon/SCH. This documentation may allow others to see where the process could be altered to facilitate the crowd-sourcing of error detection.
In the case of Vacaspatya, there are about 4000 places in the digitization prepared by Thomas Malten's staff that were marked as 'unreadable'. The printing quality of the paper version of the text is poor, and in these places the text was indecipherable to the digitizers, so they marked each one as {??}. Peter Scharf has a Sanskrit PhD graduate, Sampada Savardekar, working for him in India for a few months, and has said she can spend some time filling in these indecipherable places in vcp (presumably, many of the missing pieces will be obvious to someone fluent in Sanskrit).
My part in the process is to develop a good work flow, or interface, with which Sampada can make the corrections and communicate the corrections back to me so I may incorporate the corrections into vcp.txt.
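As a starting point for such a work flow, here is a minimal sketch of how the {??} places could be listed for a corrector. It is not the actual interface: the names `find_gaps` and `Gap`, the context width, and the idea of reporting by line number are all assumptions made for illustration, and nothing here depends on the real layout of vcp.txt beyond the {??} marker itself.

```python
from typing import Iterable, Iterator, NamedTuple

# The marker the digitizers used for indecipherable text (from the thread).
MARKER = "{??}"


class Gap(NamedTuple):
    """One unreadable place. All field names are hypothetical."""
    line_no: int   # 1-based line number in the file
    column: int    # 0-based offset of the marker within the line
    context: str   # a little text on either side, to help the corrector


def find_gaps(lines: Iterable[str], width: int = 20) -> Iterator[Gap]:
    """Yield every occurrence of MARKER with `width` characters of context."""
    for line_no, line in enumerate(lines, start=1):
        start = line.find(MARKER)
        while start != -1:
            lo = max(0, start - width)
            hi = start + len(MARKER) + width
            yield Gap(line_no, start, line[lo:hi].rstrip("\n"))
            # Continue searching after this marker on the same line.
            start = line.find(MARKER, start + len(MARKER))


# Usage sketch (assumes vcp.txt is UTF-8, which has not been verified):
#     with open("vcp.txt", encoding="utf-8") as f:
#         for gap in find_gaps(f):
#             print(gap.line_no, gap.column, gap.context, sep="\t")
```

A tab-separated listing like this could be handed over as a spreadsheet, with an extra column for the proposed reading; keeping the line number and column makes it possible to apply each correction back to the file unambiguously.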
I suggest that we create a VCP repository under sanskrit-lexicon. The initial goal of this repository will be to provide a 'crowd-sourceable' technique for correcting the VCP digitization errors, starting with the indecipherable parts. To provide a starting basis for discussion, I can post there the initial 'interface' I come up with for Sampada in the next day or two.
By focusing on these two dictionaries (Schmidt and Vacaspatya), we may both improve the current digitizations of those two dictionaries, and also develop a process by which scholars can contribute to the improvement of the other Cologne Sanskrit-lexicon digitizations. This task should be doable in the next several months.
All dictionaries digitized and innumerable corrections made. Time to close.
We are now, in 2014, at the first step. We are in the middle of the second step for MW, which has been around as a file for 18 years and is full of junk text or false markup. It seems to me that we are still 20-70 years away from having a few clean Sanskrit dictionary files, and about 100 years away from a new printed Sanskrit dictionary. The Poona dictionary will be finished in 700 years, so in any case this will hopefully happen before it gets printed. The stages are as follows:
1) Draft of digital file of printed book
2) Clean errors in file, error origin - digital
3) Clean errors in file, error origin - book
4) Implement "Nachträge und Verbesserungen", now as separate articles
5) Implement "Additions" from additional sources (like Schmidt), now as separate articles.
Right now the text is as faithful to the book as possible. But each book has hundreds of errors. Making a list of them, the way it is done now, is a task for 5-10 years. Jim can't do it on his own, nor can Peter or Thomas. It's time for crowd-sourcing. But for that we need to make the platform collaboration-friendly and announce it. What I propose is that the real work of the many people involved will start in 2015. We have 11 months to make the platform easy to understand for Indian scholars, who right now are not even aware that they can contribute. In 2015 I'm ready to tell the scholarly world what they can do to make the word a better place to live.