sanskrit-lexicon / csl-websanlexicon


Links for AV and RV in MW #21

Open funderburkjim opened 3 years ago

funderburkjim commented 3 years ago

While adding links to the Rg Veda and Atharva Veda in the displays of the Grassmann dictionary, I tried doing the same for the MW(99) dictionary.

This turned out to be fairly simple, so such links are now available for MW.

What do you think? Good idea? Implementation details ok?

Will explain further in comments below.

funderburkjim commented 3 years ago

Example

For word cIti in MW: image

Click on the AV link and you get (in another tab):

image

Similarly, specific references to Rg Veda also have live links to pages in the rvlinks repository.

funderburkjim commented 3 years ago

Technical note

The references in MW for RV are in form RV. mandala,hymn,verse where mandala is a lower-case Roman numeral from 1 to 10.

Similarly, the references in MW for AV are in form AV. book,hymn,verse where book (kāṇḍa) is a lower-case Roman numeral from 1 to 20.

Thus, once we convert the Roman numeral to Arabic numeral, it is easy to construct the link to rvlinks or avlinks repository hymns.

All the work is done in basicadjust.php and in two places:

The two attributes of this gralink tag are in turn used by basicdisplay.php to generate the final HTML of the display: <a href="https://sanskrit-lexicon.github.io/rvlinks/rvhymns/rv10.051.html#rv10.051.03" title="Rg Veda 10.051.03" target="_rvlink">RV. x, 51, 3</a>
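The conversion described in this note can be sketched in Python. This is only an illustration, not the code in basicadjust.php (which is PHP and not shown in this thread). The function names `roman_to_int` and `rv_link` are hypothetical, and the zero-padding widths (2 digits for mandala and verse, 3 for hymn) are inferred from the single example URL above, so they are an assumption.

```python
import re

# Values of the lower-case Roman digits that can occur in book numbers.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}


def roman_to_int(roman: str) -> int:
    """Convert a Roman numeral such as 'xix' to an integer (19)."""
    total = 0
    previous = 0
    # Scan right to left; a smaller digit before a larger one is subtracted.
    for char in reversed(roman.lower()):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total


# Matches references of the form "RV. x, 51, 3".
RV_REF = re.compile(r"RV\.\s*([ivxlc]+)\s*,\s*(\d+)\s*,\s*(\d+)")


def rv_link(reference: str) -> str:
    """Build an anchor tag for one RV reference (URL layout assumed from the example)."""
    match = RV_REF.fullmatch(reference.strip())
    if match is None:
        raise ValueError(f"not a simple RV reference: {reference!r}")
    mandala = roman_to_int(match.group(1))
    hymn = int(match.group(2))
    verse = int(match.group(3))
    page = f"rv{mandala:02d}.{hymn:03d}"      # assumed padding
    anchor = f"{page}.{verse:02d}"            # assumed padding
    href = f"https://sanskrit-lexicon.github.io/rvlinks/rvhymns/{page}.html#{anchor}"
    title = f"Rg Veda {mandala:02d}.{hymn:03d}.{verse:02d}"
    return f'<a href="{href}" title="{title}" target="_rvlink">{reference}</a>'


print(rv_link("RV. x, 51, 3"))
```

For the reference RV. x, 51, 3 this reproduces the anchor tag shown above. An AV version would need the avlinks URL layout, which is not given in this thread.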

gasyoun commented 3 years ago

This turned out to be fairly simple, so such links are now available for MW.

It is mesmerizing to see what we spoke about just in December. @drdhaval2785 For dāruṇa (Uṇ. iii, 53), should not the Uṇ. references be as simple to link to as well?

What do you think?

As they say it in Britain: brilliant.

Good idea?

No, even better.

Implementation details ok?

Nothing to add as far as I can see now.

funderburkjim commented 3 years ago

Currently, the only 'targets' we have are RV, AV, and Panini sutras.

When we get linkable targets for other references, we can generate links to them.

gasyoun commented 3 years ago

linkable targets for other references, we can generate links to them

Is there a Python script that can count all the stats for abbreviations at once? Once I have the common ones, I could hunt for such URLs afterwards.

gasyoun commented 3 years ago

For all Uṇ. references, like (Uṇ. iii, 53), we can link to https://ashtadhyayi.com/unaadi/ (1 URL, no anchors, just one single page). @funderburkjim, agree?

funderburkjim commented 3 years ago

count all the stats for abbreviations at once?

This gist has some stats.

The statls.txt file is a 'tsv' (tab-separated-values) file. Sorting by first column shows the following with 1000+ 'linkable' references:

 1087    5329       Hariv.  Harivaṃśa
 1528    7090         BhP.  Bhāgavata-purāṇa
 1547    5601         ŚBr.  Śatapatha-brāhmaṇa
 1697     230      Dhātup.  Dhātupāṭha
 1699    9266           R.  Rāmāyaṇa
 2131    5100          AV.  Atharva-veda
 3512    3586          Mn.  Manu-smṛti
 5104   23423         MBh.  Mahābhārata
 6330    9971          RV.  Ṛg-veda
 8095     678         Pāṇ.  Pāṇini

We have link targets for AV., RV., and Pāṇ.

Also, currently Dhātup. is treated in a separate way that gives links to Westergaard for verbs. See next comment for example.
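A rough count of abbreviations like the one in the table above could be gathered with a few lines of Python. This is a sketch under assumptions, not the script behind the gist. It treats the text up to the first period inside each <ls>...</ls> element as the abbreviation, it makes no attempt to reproduce the two count columns of statls.txt, and the function name `count_abbreviations` is hypothetical.

```python
import re
from collections import Counter

# Body of every plain <ls>...</ls> element on a line.
LS_TAG = re.compile(r"<ls>(.*?)</ls>")
# Leading abbreviation: text without digits, periods or commas, then a period.
ABBREV = re.compile(r"^([^\d.,]+\.)")


def count_abbreviations(lines):
    """Count how often each leading abbreviation occurs inside <ls> tags."""
    counts = Counter()
    for line in lines:
        for body in LS_TAG.findall(line):
            match = ABBREV.match(body)
            if match:  # bodies such as "ii, 41, 16" have no abbreviation
                counts[match.group(1)] += 1
    return counts


sample = [
    "<ls>RV. iv, 22, 3</ls> &c.; f. devi-tamA, <ls>ii, 41, 16</ls>",
    "<ls>Dhātup. xxiv, 14.</ls>",
    "<ls>RV. x, 51, 3</ls>",
]
for abbreviation, count in count_abbreviations(sample).most_common():
    print(count, abbreviation)
```

On a real dictionary file the same function could be fed an open file handle, for example `open("mw.txt", encoding="utf-8")`, assuming the file is UTF-8 and each record's references sit on one line.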

funderburkjim commented 3 years ago

Westergaard links

The Dhātup. references in MW are associated with Westergaard Dhatupatha. Example:

<L>46864<pc>266,1<k1>kas<k2>kas<h>2<e>1
<hom>2.</hom> <s>kas</s> ¦ <s>kaste</s> <ab>v.l.</ab> for <s>kaMs</s>, <s>kaMste</s>, <ls>Dhātup. xxiv, 14.</ls><info westergaard="kasi,24.14,02.0017"/>
<LEND>

image

Clicking the 20.30 link gives

image

funderburkjim commented 3 years ago

Based on the sorted list above, the most important is MBh. However, there may be unresolved linking issues due to differences between the version of Mahabharata referenced by MW and current versions of MBh (such as Smith's version).

funderburkjim commented 3 years ago

https://ashtadhyayi.com/unaadi/ 1 URL, no anchors, just one single page.

According to the stats above, the row for Uṇ. (Uṇādi-sūtra) reads 770 205, so 770 + 205 = 975 non-specific links to that page.

Technically possible. @drdhaval2785 Is this a good solution for Uṇādi-sūtra?

gasyoun commented 3 years ago

unresolved linking issues due to differences between the version of Mahabharata referenced by MW and current versions of MBh (such as Smith's version).

Smith is of no use for us; it is totally different. I still do not have a good solution for MBh. For http://samskrtam.ru/parallel-corpus/mahabharata.html I have studied all the files of http://mahabharata.manipal.edu/#/ (a cleaner version of Smith), but it is still totally different.

Thanks for https://gist.github.com/funderburkjim/6932b0c089fa45ba31de08ff12432644; let me play with it.

gasyoun commented 3 years ago

deva-tama , RV. iv, 22, 3 &c.; f. devi-tamā, ii, 41, 16 )

@funderburkjim, is there a regex way to know that an ii, 41, 16 that comes after RV. iv, 22, 3 is an RV. ii, 41, 16?

funderburkjim commented 3 years ago

No.

This question is similar to the Hariv. 2227 and 12360 question of #23.

Unfortunately, MW uses many forms of abbreviation; this forces us to use many markup tricks if we want to preserve the text of MW while increasing linkability.

In your RV example, the current markup is: <ls>RV. iv, 22, 3</ls> &c.; ... <ls>ii, 41, 16</ls>.

We would need to extend our markup conventions. Perhaps to <ls>RV. iv, 22, 3</ls> &c.; ... <ls n="RV. ">ii, 41, 16</ls>.

This markup preserves the text, by putting a useful piece of information in an attribute value.

basicadjust.php could likely be modified to also generate an href to rvlinks for this new form (<ls n="RV. ">ii, 41, 16</ls>).

The addition of the markup itself would need to be done 'manually'; a regex filter might be able to help in identifying most places where similar markup changes might be made.
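How the proposed n attribute could be combined with the printed text is sketched below in Python. This is an illustration of the idea only, not the modified basicadjust.php. The function name `full_references` is hypothetical, and the sketch assumes n is the only attribute that appears on an <ls> tag.

```python
import re

# An <ls> element, optionally carrying the proposed n="..." attribute.
LS = re.compile(r'<ls(?: n="([^"]*)")?>(.*?)</ls>')


def full_references(text):
    """Return each <ls> reference with its n prefix (if any) prepended."""
    references = []
    for prefix, body in LS.findall(text):
        # prefix is the empty string when the tag has no n attribute
        references.append(prefix + body)
    return references


markup = '<ls>RV. iv, 22, 3</ls> &c.; ... <ls n="RV. ">ii, 41, 16</ls>'
print(full_references(markup))
```

The displayed text stays ii, 41, 16, while the link generator sees the complete reference RV. ii, 41, 16 and can treat it like any other RV reference.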

funderburkjim commented 3 years ago

An experiment along these lines with a local installation shows that this is feasible. The image shows that we have the desired tooltip, and the link works.

image

funderburkjim commented 3 years ago

There may be 100 or so such examples for RV. in MW.

One filter that catches some of them is the regex "<ls>RV[.].*<ls>[ivx]+,", which gives 149 matches in mw.txt: temp_rvlinks_partial.txt
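The same filter can be run outside the editor. The Python sketch below applies the regex quoted above line by line and reports the matching lines as candidates for manual markup. The function name `candidate_lines` is hypothetical, and reading mw.txt as UTF-8 is an assumption.

```python
import re

# The filter from the comment above: an RV. reference followed later on the
# same line by an <ls> element that starts directly with a Roman numeral.
CANDIDATE = re.compile(r"<ls>RV[.].*<ls>[ivx]+,")


def candidate_lines(lines):
    """Return (line_number, line) pairs that match the filter."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if CANDIDATE.search(line)
    ]


sample = [
    "<ls>RV. iv, 22, 3</ls> &c.; f. devi-tamA, <ls>ii, 41, 16</ls>",
    "<ls>RV. x, 51, 3</ls>",
    "<ls>RV. i, 1, 1</ls> <ls>AV. ii, 3, 4</ls>",
]
for number, line in candidate_lines(sample):
    print(number, line)
```

For the full dictionary, `candidate_lines(open("mw.txt", encoding="utf-8"))` would list the places to review. A match only says that a bare reference follows an RV. reference on the same line, so each hit still needs a manual check before the n attribute is added.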

funderburkjim commented 3 years ago

I haven't installed the basicadjust.php code that generated the above example. In case we decide it is worth using, I am saving a copy here:

basicadjust.php.txt

funderburkjim commented 3 years ago

A similar issue is found in RV links in PWG:

In pwg, there are about 19000 links to RV., found with the regex <ls>ṚV. (with a space after the period). Of these, about 13000 are single references, such as <ls>ṚV. 3, 45, 4.</ls>:

12853 matches in 12852 lines for "<ls>ṚV. [0-9]+, [0-9]+, [0-9]+[.]</ls>" in buffer: pwg.txt

Non-single references can be complex to interpret. <ls>ṚV. 1, 64, 4. 5, 54, 11.</ls> maybe means RV 1.64.4 and RV 5.54.11. But how to interpret this one? <ls>ṚV. 1, 46, 10. 91, 17. 125, 3. 7, 98, 1. 8, 61, 2. 9, 62, 4. 67, 28. 68, 4. 74, 5.</ls>

gasyoun commented 3 years ago

100 or so such examples for RV. in MW.

Oh, so not as many as I proposed, but still.

In case we decide it is worth using

It sure is.

In pwg, there are about 19000 links to RV.

Now that's a lot and low-hanging fruit.

ṚV. 1, 64, 4. 5, 54, 11. maybe means RV 1.64.4 and RV 5.54.11

Yes it does.

ṚV. 1, 46, 10. 91, 17. 125, 3. 7, 98, 1. 8, 61, 2. 9, 62, 4. 67, 28. 68, 4. 74, 5.

ṚV. 1, 46, 10. 91, 17. 125, 3. 7, 98, 1. 8, 61, 2. 9, 62, 4. 67, 28. 68, 4. 74, 5.

Ok, it can be a bit tricky.
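One possible reading of such compound references can be tried out in Python. The sketch assumes that a group of three numbers is mandala, hymn, verse, and that a group of two numbers is hymn, verse with the mandala carried over from the previous group. This carry-over rule is an assumption: it agrees with the two-reference example confirmed above, but the thread does not confirm it for the long example. The function name `expand_rv_references` is hypothetical.

```python
import re


def expand_rv_references(ls_text):
    """Expand a PWG-style compound reference into (mandala, hymn, verse) tuples.

    Assumed rule: three numbers start a new mandala; two numbers reuse
    the mandala of the preceding group.
    """
    body = re.sub(r"^\D*", "", ls_text)  # drop the leading abbreviation
    references = []
    mandala = None
    for group in body.split("."):
        numbers = [int(n) for n in re.findall(r"\d+", group)]
        if not numbers:
            continue  # empty piece after the final period
        if len(numbers) == 3:
            mandala, hymn, verse = numbers
        elif len(numbers) == 2 and mandala is not None:
            hymn, verse = numbers
        else:
            raise ValueError(f"cannot interpret group {group!r}")
        references.append((mandala, hymn, verse))
    return references


text = "ṚV. 1, 46, 10. 91, 17. 125, 3. 7, 98, 1. 8, 61, 2. 9, 62, 4. 67, 28. 68, 4. 74, 5."
for mandala, hymn, verse in expand_rv_references(text):
    print(f"RV {mandala}.{hymn}.{verse}")
```

Under this rule the long example expands to nine references, from RV 1.46.10 to RV 9.74.5. Whether PWG always follows this convention would have to be checked against the text of the hymns.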