gasyoun opened this issue 3 months ago
I don't think that's worth spending our time on at CDSL; getting the links to the scan pages is itself a big task, and Jim has taken it up with some support from my side.
The OCRing is best left to interested people (if any!!).
I really doubt that anyone would venture on the task and complete even a single book; people are just "making use" of the texts provided by open sources like GRETIL, Sanskritdocuments, etc. (with whatever quality and drawbacks they have). No further improvement, nor any independent work!!
And I recall that not a single step has been taken (at your end, @gasyoun) towards "getting" the text out of the front pages of the CDSL works [a very practical & achievable task], which was talked about a few years back!!
> I don't think that's worth spending our time on at CDSL
Disagree. Would want to discuss it with @martingluckman at a later stage.
Encouraging to see that this feature of links to references has been noticed by Gluckman.
@gasyoun I agree with AB that OCRing (getting the text out of) the Documentation Frontmatter scans would be an upgrade to that section at Cologne.
> Encouraging to see that this feature of links to references has been noticed by Gluckman.
@funderburkjim
What Marcis said is that Harry Spier had noticed the linking feature, not Gluckman (whom Marcis wants to approach for help in OCRing the full works!)
> if this is an ongoing project to make all the references in the B-R Grosse Worterbuch live
Yes, at least for the 'major' PWG references.
I think I've put all of the 'link targets' here: https://github.com/orgs/sanskrit-lexicon-scans/repositories
These repos also contain copies of the scanned images for the dictionaries.
So someone interested in OCRing any of the link targets could clone one of these repos to get images of the individual pages.
In fact, these GitHub repos are also used by the CDSL displays (e.g. of PWG) to 'serve' the images.
OCRing alone can be done practically in no time these days (courtesy of Google); but the next phase, i.e. proofing the OCRed text to match the print, is the REAL task.
> if this is an ongoing project to make all the references in the B-R Grosse Worterbuch live

> Yes, at least for the 'major' PWG references.
Wouldn't it be worthwhile, in this spree, to do this for all the works with more than 10k references?
And @funderburkjim should update the lsextract_pwg file again (it seems to have been last updated on 13th Jan. 2023); the update will bring further members (extending the list that I mentioned in the KSS issue) into the 10k+ club!
--------------------------------------
PS. If the Skt. lexicons are also to be covered, I can prepare 'the index files' for those as well (taking Jim's indexing for AK. as "done").
I can also link the Indische Sprüche (1st ed.) scans, though the 2nd ed. has already been linked as a digital text.
> I don't think that's worth spending our time on at CDSL

> Disagree. Would want to discuss it with @martingluckman at a later stage.
I am sure Jim cannot spend any time on this, and I WILL NOT (though I could also do the proofing, iff I took up the work); so you are welcome to get it done by any interested party, @gasyoun!!
@Andhrabharati I'm speaking of a dirty, unproofed OCR.
> proofing the OCRed text to match the print is the REAL task.
A simple script will do it, @gasyoun!
[And quite a few of them are floating around the net.]
Looks like Suśruta, 1835-6 is the only other candidate coming into the 10k+ club!
Once this "bound book" is split into its two constituent volumes [Vol. 1 (1835): 378 pp. and Vol. 2 (1836): 562 pp., leaving aside the 4 front "title" pages in each volume], no indexing is needed for this work, as the references are simply in (volume, page, line) form.
Very easy for Jim, just like in the case of the Verz. d. Oxf. H.!!
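A minimal Python sketch of why no index file is needed once the volumes are split. It assumes that each split volume's scan has one image per printed page, with the 4 title images first; that assumption, and the helper names `susruta_target` and `bound_image`, are hypothetical. It also does not assume anything about the actual file naming in the sanskrit-lexicon-scans repos.

```python
from typing import NamedTuple

# Page counts stated in the thread: Vol. 1 (1835) 378 pp., Vol. 2 (1836) 562 pp.
PAGES = {1: 378, 2: 562}
# Assumption: each volume's scan begins with 4 unnumbered "title" images.
TITLE_PAGES = 4


class ScanTarget(NamedTuple):
    volume: int
    image: int  # 1-based image number within the split volume's scan


def susruta_target(volume: int, page: int, line: int) -> ScanTarget:
    """Hypothetical helper: map a (volume, page, line) reference to a page image.

    The line number only locates the passage on the page, so it does not
    change which image is linked; it is only sanity-checked here.
    """
    if volume not in PAGES:
        raise ValueError(f"unknown volume: {volume}")
    if not 1 <= page <= PAGES[volume]:
        raise ValueError(f"page {page} out of range for volume {volume}")
    if line < 1:
        raise ValueError(f"invalid line number: {line}")
    return ScanTarget(volume, TITLE_PAGES + page)


def bound_image(volume: int, page: int) -> int:
    """Hypothetical helper: the same lookup against the unsplit bound book."""
    target = susruta_target(volume, page, 1)  # reuse the range checks
    # Skip the title images and pages of every earlier volume.
    offset = sum(TITLE_PAGES + PAGES[v] for v in range(1, volume))
    return offset + target.image
```

Under these assumptions, a reference such as (2, 10, 5) links straight to image 14 of the Vol. 2 scan. The per-work index file only matters when the printed page numbers do not follow the scan order in a simple way.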
> [And quite a few of them are floating around the net.]
Never seen one @Andhrabharati
> Looks like Suśruta, 1835-6 is the only other candidate coming into the 10k+ club!
Where are the others?
> [And quite a few of them are floating around the net.]

> Never seen one @Andhrabharati
Well, not everyone needs to know everything! You may just use places like Wikisource, ocr.sanskritdictionary.com, ambuda.org, etc.
> Looks like Suśruta, 1835-6 is the only other candidate coming into the 10k+ club!

> Where are the others?
You mean the list of names? Look at my post above! If it is about the scans, they will come when Jim starts working on them!
@funderburkjim @Andhrabharati the work is starting to be noticed! And so I ask: can we batch-OCR the scans on our end with https://ocr.sanskritdictionary.com, with a little help from @martingluckman?

> "Does anyone know if this is an ongoing project to make all the references in the B-R Grosse Worterbuch live (i.e. point to the actual page of the work referenced). and if this project also extends to other of the Koln on-line dictionaries."

What is the plan, and at what URL can it be seen as of now? What is already covered? Even I have missed part of the changelog.