Links wrongly rendered due to space between the digits #130

Closed Andhrabharati closed 2 years ago

Andhrabharati commented 2 years ago


See the entry "akratu" as an example:


The link goes to RV.x.8,3, which is in no way connected to the word akratu. It should go to RV.x,83,5 instead; this can be fixed by removing the space between 8 and 3 in the mw.txt

I noticed ~100 such cases that need correction in the digitisation.

gasyoun commented 2 years ago

removing the space between 8 and 3 in the mw.txt

Interesting, never noticed before.

funderburkjim commented 2 years ago

@Andhrabharati If you have a list of these, please provide, or else provide the search regex(es) you use. In a first look, I'm only finding 10 or so similar to your example above.

10 matches for "RV. [xvi]+, [0-9]+ [0-9]+, [0-9]" in buffer: mw.txt

Andhrabharati commented 2 years ago

I looked for "space between digits" [0-9] [0-9], not just for RV link cases.
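A minimal Python sketch of this broader search, for anyone who wants to reproduce the list outside an editor. The function name and the sample lines are illustrative assumptions; the first sample line is a reconstruction of the reported defect, not a line copied from mw.txt.

```python
import re

# Two digits separated by exactly one space, e.g. the "8 3" in "x, 8 3, 5".
DIGIT_SPACE_DIGIT = re.compile(r"[0-9] [0-9]")

def find_digit_gaps(lines):
    """Return (line_number, line) pairs for lines containing digit-space-digit."""
    hits = []
    for lnum, line in enumerate(lines, start=1):
        if DIGIT_SPACE_DIGIT.search(line):
            hits.append((lnum, line.rstrip("\n")))
    return hits

# Hypothetical sample lines (assumptions for illustration only).
sample = [
    "<ls>RV. x, 8 3, 5</ls>",          # reconstructed defect: space inside "83"
    "<ls>RV. x, 83, 5</ls>",           # corrected form, should not match
    "<L>100473<pc>512,3<k1>Darmakft",  # metaline without a digit gap
]

for lnum, line in find_digit_gaps(sample):
    print(lnum, line)
```

To scan the real file, the same function can be fed an open file handle, since it only iterates over lines.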

Andhrabharati commented 2 years ago

Here is the extracted search result: space between digits.txt

Incidentally there are some <pc> lines as well in this!

Andhrabharati commented 2 years ago

Also there are quite many cases where a number is outside the <ls>...</ls> tag, which needs to be within the tag.

I used the regex </ls>[;\.,:] [0-9] to get them.
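The same check can be sketched in Python. The pattern is the one quoted above, with the dot left unescaped inside the character class (equivalent in Python); the helper name and the sample strings are assumptions made for illustration.

```python
import re

# A closing </ls>, one punctuation mark, a space, then a digit that
# probably belongs inside the preceding tag.
NUMBER_AFTER_LS = re.compile(r"</ls>[;.,:] [0-9]")

def number_outside_ls(line):
    """True if a number directly follows a closed <ls> tag on this line."""
    return NUMBER_AFTER_LS.search(line) is not None

# Hypothetical sample lines (not taken from mw.txt).
print(number_outside_ls("<ls>RV.</ls>, 5"))                     # number left outside the tag
print(number_outside_ls("<ls>RV. x, 83, 5</ls>; <ls>AV.</ls>"))  # nothing numeric after </ls>
```

Matches are only candidates: a digit after a closing tag can also be ordinary text, so each hit still needs a manual look.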

Andhrabharati commented 2 years ago


Just checked that PWG also has about 90 cases of "space between digits".

funderburkjim commented 2 years ago

@Andhrabharati Thanks for alerting me of these problems. Will attend to them.

gasyoun commented 2 years ago


Empty source; this is the first time I see such an effect.

gasyoun commented 2 years ago

@funderburkjim can't figure out what's wrong here



Andhrabharati commented 2 years ago

He seems to have looked ONLY for the number pattern "Roman, IA, IA".

If the punctuation mark or space is different in that "block", he is not taking it as the 'Rigveda link'.

funderburkjim commented 2 years ago

The current markup and link logic works for verses only; e.g. <ls>RV. x, 10, 5</ls> for verse 5 of hymn x,10.

The yama example, by contrast, should be interpreted as a reference to two hymns (RV. x, 10 AND RV. x, 14), with no verse specified.

the supposed author of <ls>RV. x, 10; 14</ls>, of a hymn to <s1 slp1="vizRu">Viṣṇu</s1> and of a law-book;

Perhaps the display program (basicadjust.php) can be extended to generate links for examples like <ls>RV. x, 10</ls>.

The other aspect of this yama example is that two references are implied, and that the semicolon (between 10 and 14) is used in MW to separate the two references.

A search for semicolons within RV references in mw.txt yields:

623 matches for "<ls>RV\. [xiv]+,[^<]*;" in buffer: mw.txt

All of these need to be examined and recoded where possible so that multiple links will be available. For instance, changes such as the following are desirable:

<ls>RV. i, 139, 1; iv, 44, 5.</ls>
<ls>RV. i, 139, 1</ls>; <ls n="RV">iv, 44, 5.</ls>

Work will be carried out with an aim to improve the markup and display in these dimensions.
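As a rough illustration of the recoding shown above, here is a Python sketch that splits a semicolon-separated RV reference only when every part is a complete mandala, hymn, verse triple. It is not the script used for the actual corrections; the function name, the `n_value` parameter, and the decision to leave all other cases untouched are assumptions.

```python
import re

# A complete three-part reference: Roman mandala, hymn, verse (optional final period).
FULL_REF = re.compile(r"^[ivx]+, [0-9]+, [0-9]+\.?$")
# An <ls> element that starts with "RV. " and has no nested tags.
LS_RV = re.compile(r"<ls>RV\. ([^<]*)</ls>")

def split_rv_refs(markup, n_value="RV"):
    """Split '<ls>RV. a; b</ls>' into one <ls> element per reference."""
    def repl(match):
        parts = [p.strip() for p in match.group(1).split(";")]
        # Rewrite only when every part is a full reference;
        # anything else is left unchanged for manual review.
        if len(parts) < 2 or not all(FULL_REF.match(p) for p in parts):
            return match.group(0)
        first = "<ls>RV. %s</ls>" % parts[0]
        rest = ['<ls n="%s">%s</ls>' % (n_value, p) for p in parts[1:]]
        return "; ".join([first] + rest)
    return LS_RV.sub(repl, markup)

print(split_rv_refs("<ls>RV. i, 139, 1; iv, 44, 5.</ls>"))
print(split_rv_refs("<ls>RV. x, 10; 14</ls>"))  # partial references: left as is
```

Abbreviated continuations such as the `14` in the yama example need a prefix carried over from the preceding reference, which is why this sketch deliberately skips them.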

gasyoun commented 2 years ago

Work will be carried out with an aim to improve the markup and display in these dimensions.

I give you my thanks.


It is a lot and, at the same time, not a lot. I see them a lot!

funderburkjim commented 2 years ago

The RV ls markup improvements mentioned above have been completed in MW. The work is done in mwauthorities/ls/20220628-rv.

Most of the changes are as predicted in the above comment. But several were typos with errors in spacing or punctuation. And a small number of errors involved homonym markup. For instance:

; <L>100473<pc>512,3<k1>Darmakft
; <ls>RV. viii.87, 1.2.</ls>
; <ls>RV. viii.87, 1.2.</ls>    <<< That 2.  should be the homonym number of 'next' entry
338230 old <hom>3.</hom> <s>Da/rma</s> ¦ in <ab>comp.</ab> for <s>°man</s> <ab>q.v.</ab> 2.   <<< DROP the 2.
338230 new <hom>3.</hom> <s>Da/rma</s> ¦ in <ab>comp.</ab> for <s>°man</s> <ab>q.v.</ab>
;  CHANGE the <h> value in next entry
338232 old <L>100473<pc>512,3<k1>Darmakft<k2>Da/rma—kft<h>b<e>3
338232 new <L>100473<pc>512,3<k1>Darmakft<k2>Da/rma—kft<h>2<e>3
; and similarly remark as hom 2.
338233 old <s>Da/rma—kft</s> <hom>b</hom> ¦ <lex>m.</lex> maintainer of order (<s1 slp1="indra">Indra</s1>), <ls>RV. viii.87, 1.2.</ls><info lex="m"/>  <<<< ALSO Drop this 2.
338233 new <hom>2.</hom> <s>Da/rma—kft</s> ¦ <lex>m.</lex> maintainer of order (<s1 slp1="indra">Indra</s1>), <ls>RV. viii, 87, 1.</ls><info lex="m"/>
; and do similar change for Darmavat: b change to 2
338235 old <L>100474<pc>512,3<k1>Darmavat<k2>Da/rma—vat<h>b<e>3
338235 new <L>100474<pc>512,3<k1>Darmavat<k2>Da/rma—vat<h>2<e>3
; change hom markup
338236 old <s>Da/rma—vat</s> <hom>b</hom> ¦ (<s>Da/rma</s>) <lex>mfn.</lex> accompanied by <s1 slp1="Darman">Dharman</s1> or the law (<s1 slp1="aSvin">Aśvin</s1>s), <ls>viii, 35, 13.</ls><info lex="m:f:n"/>
338236 new <hom>2.</hom> <s>Da/rma—vat</s> (<s>Da/rma</s>) <lex>mfn.</lex> accompanied by <s1 slp1="Darman">Dharman</s1> or the law (<s1 slp1="aSvin">Aśvin</s1>s), <ls n="RV.">viii, 35, 13.</ls><info lex="m:f:n"/>
;; other homonyms of Darmakft and Darmavat
336540 old <L>99961<pc>510,3<k1>Darmakft<k2>Da/rma—kft<h>a<e>3
336540 new <L>99961<pc>510,3<k1>Darmakft<k2>Da/rma—kft<h>1<e>3
336541 old <s>Da/rma—kft</s> <hom>a</hom> ¦ <lex>mfn.</lex> (2. See under 3. <s>Darma</s>) doing one's duty, virtuous, <ls>MBh.</ls><info lex="m:f:n"/>
336541 new <hom>1.</hom> <s>Da/rma—kft</s> ¦ <lex>mfn.</lex> (<hom>2.</hom> See under <hom>3.</hom> <s>Darma</s>) doing one's duty, virtuous, <ls>MBh.</ls><info lex="m:f:n"/>
337425 old <L>100234<pc>511,3<k1>Darmavat<k2>Da/rma—vat<h>a<e>3
337425 new <L>100234<pc>511,3<k1>Darmavat<k2>Da/rma—vat<h>1<e>3
337426 old <s>Da/rma—vat</s> <hom>a</hom> ¦ <lex>mfn.</lex> (2. See under 3. <s>Darma</s>) virtuous, pious, just, <ls>L.</ls><info lex="m:f:n"/>
337426 new <hom>1.</hom> <s>Da/rma—vat</s> ¦ <lex>mfn.</lex> (<hom>2.</hom> See under <hom>3.</hom> <s>Darma</s>) virtuous, pious, just, <ls>L.</ls><info lex="m:f:n"/>

funderburkjim commented 2 years ago

links for 2-parameter references

There are many mentions of hymns, with no verse specified, such as <ls>RV. viii, 13</ls> in

356994 new <s>nA/rada</s> ¦ <lex>m.</lex> or <s>nArada/</s> <ab>N.</ab> of a <s1 slp1="fzi">Ṛṣi</s1> (a <s1 slp1="kARva">Kāṇva</s1> or <s1 slp1="kASyapa">Kāśyapa</s1>, author of <ls>RV. viii, 13</ls>; <ls n="RV.">ix, 104</ls>; <ls n="RV. ix,">105</ls>;

The basicadjust.php component of the displays is now adjusted so that this 2-parameter reference generates a link to the first verse of the hymn.
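The idea can be sketched in Python as follows. This is not the logic in basicadjust.php, which is PHP and is not shown in this thread; it only illustrates parsing a reference and defaulting the verse to 1 when none is given. The function names and the returned tuple are assumptions, and no link URL is constructed because the target format is not stated here.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10}

def roman_to_int(numeral):
    """Convert a lowercase Roman numeral made of i, v, x to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, v in enumerate(values):
        # Subtractive notation: a smaller value before a larger one (iv, ix).
        if i + 1 < len(values) and v < values[i + 1]:
            total -= v
        else:
            total += v
    return total

# "RV. <mandala>, <hymn>" with an optional ", <verse>" and optional final period.
RV_REF = re.compile(r"^RV\. ([ivx]+), ([0-9]+)(?:, ([0-9]+))?\.?$")

def parse_rv_ref(text):
    """Return (mandala, hymn, verse); verse defaults to 1 for 2-parameter references."""
    m = RV_REF.match(text.strip())
    if m is None:
        return None
    verse = int(m.group(3)) if m.group(3) else 1
    return (roman_to_int(m.group(1)), int(m.group(2)), verse)

print(parse_rv_ref("RV. viii, 13"))   # 2-parameter reference
print(parse_rv_ref("RV. x, 10, 5"))   # 3-parameter reference
```

Returning `None` for anything that does not fit the pattern keeps malformed references (for example, ones with stray spaces or periods) visible instead of silently linking them somewhere wrong.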

funderburkjim commented 2 years ago

subtle errors still unresolved

Here is an example of a likely markup error, noticed by accident, under pragATa.

The markup is

<ls>RV. viii, 1, 2</ls>; 
<ls n="RV. viii, 1,">10</ls>; 
<ls n="RV. viii, 1,">48</ls>; 
<ls n="RV. viii, 1,">51</ls>-
<ls n="RV. viii, 1,">54</ls>

The markup looks consistent with the printed text, but it can't be right, since there is no verse 48 (or 51 or 54) in hymn 'viii, 1'. Maybe the markup should be hymns 1, 2, 10, 48, 51, and 54 of mandala viii?

<ls>RV. viii, 1</ls>, <ls n="RV. viii,">2</ls>; 
<ls n="RV. viii,">10</ls>; 
<ls n="RV. viii,">48</ls>; 
<ls n="RV. viii,">51</ls>-
<ls n="RV. viii,">54</ls>

No doubt there are other similar problematic markups to identify and alter.
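One possible way to hunt for such cases is sketched below in Python. It assumes that the `n` attribute holds the omitted prefix of an abbreviated reference, so that prefix plus element text gives the full reference; that reading is inferred from the examples in this thread. The `verse_counts` table is a hypothetical input that the caller would have to supply from a reliable RV source; it is not part of mw.txt.

```python
import re

# An <ls> element with an optional n="..." prefix attribute and no nested tags.
LS_TAG = re.compile(r'<ls(?: n="([^"]*)")?>([^<]*)</ls>')
FULL = re.compile(r"^RV\. ([ivx]+), ([0-9]+), ([0-9]+)\.?$")

def expand_refs(markup):
    """Join each <ls> element's n prefix (if any) with its text."""
    refs = []
    for prefix, text in LS_TAG.findall(markup):
        refs.append((prefix + " " + text).strip() if prefix else text.strip())
    return refs

def suspect_verses(markup, verse_counts):
    """Return full references whose verse number exceeds the hymn's verse count.

    verse_counts maps (mandala_roman, hymn_number) to a verse count and is
    supplied by the caller; hymns missing from the table are skipped.
    """
    suspects = []
    for ref in expand_refs(markup):
        m = FULL.match(ref)
        if m is None:
            continue
        limit = verse_counts.get((m.group(1), int(m.group(2))))
        if limit is not None and int(m.group(3)) > limit:
            suspects.append(ref)
    return suspects

# The pragATa markup quoted above, joined into one string.
markup = (
    '<ls>RV. viii, 1, 2</ls>; <ls n="RV. viii, 1,">10</ls>; '
    '<ls n="RV. viii, 1,">48</ls>; <ls n="RV. viii, 1,">51</ls>-'
    '<ls n="RV. viii, 1,">54</ls>'
)
print(expand_refs(markup))
```

With a populated `verse_counts` table, `suspect_verses(markup, verse_counts)` would list the references that point past the end of a hymn, which are candidates for the hymn-level reinterpretation proposed above.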

funderburkjim commented 2 years ago

Similar review of other links in MW

Other ls abbreviations in MW with link targets, such as AV. and P., should be reviewed in a manner similar to the above review of the RV links.

funderburkjim commented 2 years ago

spacing issues

I think the spacing issues should have been handled. The specific cases

gasyoun commented 2 years ago

The basicadjust.php component of the displays is now adjusted so that this 2-parameter reference generates a link to the first verse of the hymn.

Hurray! This was badly needed across all the dictionaries and targets available.

Andhrabharati commented 2 years ago

Quite a few of these RV links to Marcis's version (which is presently being used) would not be helpful, as those links do not give any clue about the meaning/intent in the MW.

All such should be linked to some other source, as I had proposed elsewhere recently.

gasyoun commented 2 years ago

All such should be linked to some other source, as I had proposed elsewhere recently.

Did not get why.

would not be helpful, as those links do not give any clue about the meaning/intent in the MW.

What do you mean?