sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899

Investigate dcs as source for link targets. #139

Closed funderburkjim closed 6 months ago

funderburkjim commented 2 years ago

In a comment at #135, Oliver Hellwig's Digital Corpus of Sanskrit was suggested as a possible source for link targets for the Cologne dictionaries. This issue is devoted to discussion of that possibility.

funderburkjim commented 2 years ago

Skanda purana example.

@Andhrabharati discovered this url in the dcs website for SkandaPurāṇa Revākhaṇḍa: http://www.sanskrit-linguistics.org/dcs/index.php?contents=texte&PhraseID=417840

Select a chapter: '24'. This brings up what is presumably chapter 24 of Revākhaṇḍa.

different edition

This dcs version seems to be different from that which is referenced in MW. For instance,

<L>77301<pc>412,3<k1>jambukeSvaratIrTa
 ... <ls>RevāKh. xxiv.</ls>

But 'jam' is not found in the dcs page for chapter 24.

Also, the number of chapters appears different. The dcs chapters go from 1 to 232. But there are higher chapter numbers in MW references to Revākhaṇḍa. For instance,


<L>81300<pc>430,1<k1>wOweSa
... <ls>RevāKh. cccv.</ls>   (cccv = 305).
...
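
The chapter-number comparison above can be sketched in Python. This is only an illustrative sketch, not part of any existing Cologne tooling: the names `roman_to_int`, `revakh_chapters` and `chapters_out_of_range` are made up here, the regular expression assumes references look exactly like the two `<ls>` samples above, and 232 is the dcs chapter count noted above.

```python
import re

# Values of the lowercase Roman numeral letters used in MW references.
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

# Assumed shape of a reference, based only on the two samples above,
# e.g. <ls>RevāKh. xxiv.</ls>
LS_REVAKH = re.compile(r"<ls>RevāKh\. ([ivxlcdm]+)\.</ls>")

def roman_to_int(numeral):
    """Convert a lowercase Roman numeral such as 'xxiv' to an integer."""
    total = 0
    prev = 0
    # Read right to left: a letter smaller than the largest one seen
    # so far is subtracted (the 'i' in 'iv'), otherwise it is added.
    for ch in reversed(numeral):
        value = ROMAN[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total

def revakh_chapters(text):
    """Return the chapter numbers of all RevāKh. references in text."""
    return [roman_to_int(m) for m in LS_REVAKH.findall(text)]

def chapters_out_of_range(text, max_chapter=232):
    """Return chapter numbers higher than the last chapter in dcs."""
    return [n for n in revakh_chapters(text) if n > max_chapter]

sample = """<L>77301<pc>412,3<k1>jambukeSvaratIrTa
 ... <ls>RevāKh. xxiv.</ls>
<L>81300<pc>430,1<k1>wOweSa
... <ls>RevāKh. cccv.</ls>"""

print(revakh_chapters(sample))        # chapters cited in the sample
print(chapters_out_of_range(sample))  # those beyond chapter 232
```

Run over the whole dictionary file, something like this would give a rough count of how many Revākhaṇḍa references fall outside the dcs chapter range.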
funderburkjim commented 2 years ago

sql data file

There is a data source for dcs: http://www.sanskrit-linguistics.org/dcs/index.php mentions a GitHub repository. See https://github.com/OliverHellwig/sanskrit

The readme file for 'dcs/data' points to a zipped 'Google Drive' file, dcs.zip. This contains dcs.sql, a huge (350MB) file which looks like a dump of a mysql database. For instance, the chapter 24 text in the dcs website starts with

śrīmārkaṇḍeya uvāca / (1.1)
saṅgamaḥ karanarmadayoḥ pure māndhātṛsaṃjñite / (1.2)

and this text appears on lines 414228-9 of the dcs.sql file:

(403038,3905,'śrīmārkaṇḍeya uvāca /',1,1),
(403039,3905,'saṅgamaḥ karanarmadayoḥ pure māndhātṛsaṃjñite /',1,2),

It would be possible to extract from dcs.sql the text for all the chapters of Revākhaṇḍa, and from this extraction to construct a link target for Revākhaṇḍa (of course, this would still leave the version difference mentioned above).
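
A rough Python sketch of such an extraction follows. It rests on assumptions drawn only from the two rows quoted above: that each row sits on its own line, that the second field (3905) identifies the chapter, and that the last two fields are the verse and half-verse numbers shown on the website as (1.1), (1.2). The name `extract_chapter` is made up for illustration.

```python
import re

# One value tuple per line, as in the two rows quoted above:
# (row id, chapter id?, 'text', verse?, part?) followed by ',' or ';'.
# The meaning of the numeric fields is an assumption.
ROW = re.compile(r"^\((\d+),(\d+),'((?:[^'\\]|\\.)*)',(\d+),(\d+)\)[,;]?$")

def extract_chapter(lines, chapter_id):
    """Collect (verse, part, text) for rows whose second field is chapter_id.

    Lines that do not look like a five-field row are skipped. The text
    is returned as stored, with any SQL backslash escapes left in place.
    """
    rows = []
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:
            continue
        _row_id, chap, text, verse, part = m.groups()
        if int(chap) == chapter_id:
            rows.append((int(verse), int(part), text))
    return rows

sample = [
    "(403038,3905,'śrīmārkaṇḍeya uvāca /',1,1),",
    "(403039,3905,'saṅgamaḥ karanarmadayoḥ pure māndhātṛsaṃjñite /',1,2),",
]

# Print in the same layout as the dcs website.
for verse, part, text in extract_chapter(sample, 3905):
    print(f"{text} ({verse}.{part})")
```

For the real file, the same function could be given an open file object in place of the sample list, assuming dcs.sql is utf-8 encoded; rows from other tables that happen to have the same five-field shape would need to be excluded separately.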

Andhrabharati commented 2 years ago

In a comment at #135, Oliver Hellwig's Digital Corpus of Sanskrit was suggested as a possible source for link targets for the Cologne dictionaries.

My intention was just to point to the SkandaP. Revakh. at DCS as a 'reliable' source (for changing the MW instances), not to use the DCS for linking purposes. Its textual data is mostly from GRETIL, which unfortunately still has plenty of errors.

The BEST way is already being followed now, linking the citations to the PDF pages (of course, excepting RV and AV).

Also, the number of chapters appears different. The dcs chapters go from 1 to 232. But there are higher chapter numbers in MW references to Revākhaṇḍa.

We can probably look at these smaller count references in future [I can sure be of some help in the process], after 'handling' the larger count ones first.

gasyoun commented 2 years ago

not to use the DCS for linking purposes. Its textual data is mostly from GRETIL, which unfortunately still has plenty of errors.

Around 10% dirty.

The BEST way is already being followed now, linking the citations to the PDF pages

No, not best. Just because we have not yet made TXT versions of those PDFs.

We can probably look at these smaller count references in future [I can sure be of some help in the process], after 'handling' the larger count ones first.

Good to hear.

In the readme file for 'dcs/data', there is 'Google Drive' file zipped dcs.zip

@funderburkjim the SQL file itself contains plenty of mistakes. We've made our own edition fixing most of them.

Andhrabharati commented 2 years ago

The BEST way is already being followed now, linking the citations to the PDF pages

No, not best. Just because we have not yet made TXT versions of those PDFs.

@gasyoun Since you yourself were saying that (most of) these are "outdated", there is absolutely no possibility that anyone would EVER try to make text files of these PDFs.

People might make text versions (if at all) only of the PDFs that are mostly in use at present or are the only available sources.

Andhrabharati commented 6 months ago

@funderburkjim

I guess this issue can be closed.