sanskrit-lexicon / GRA

Grassman Wörterbuch zum Rig Veda
Unlinked Numbered References #5

Closed gasyoun closed 1 year ago

gasyoun commented 4 years ago
ātman [p= 0175]ātmán, m. [Cu. 588]. Die griechischen Formen (greek) u. s. w. zeigen, dass ātmán aus *avatmán zusammengezogen ist und auf *av = vā, wehen, zurückgeht. Die Grundbedeutung 1) Hauch tritt mit der ausdrücklichen Parallele vā́ta klar hervor (34,7; 603,2; 994,4; 918,13); mit ihr in naher Berührung steht 2) Athem, Odem, Lebenshauch; weiter 3) Lebensgeist, Lebensprincip, auch 4) vom Geiste der Krankheit (yákṣmasya) wird es einmal gebraucht (923,11); 5) der lebendige Leib, als Einheit aufgefasst.

34,7 and 923,11 should be linked, but are not. @funderburkjim any clue?

funderburkjim commented 4 years ago

This problem is now corrected. It affects the display in about 500 lines.

The links in the display are added at run time (basicadjust.php). The program looks for a regex.

The regex before the correction: ([0-9]+)[ ,]+([0-9]+)
Note: the match starts with a space ' '.

The regex after the correction: ([ (])([0-9]+)[ ,]+([0-9]+)
Now the match starts with a space OR '('.

In your examples, the missing links started with '(' and no space; that's why they were missed. Now '(' is accepted as a valid preceding character, so the links are present in the display.
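A minimal Python sketch of the before/after behaviour described above. The display code itself is PHP (basicadjust.php); this is only an illustration. The exact form of the old pattern is an assumption: it is written here with a literal leading space, as the comment describes. The helper names `old_refs` and `new_refs` are hypothetical.

```python
import re

# ASSUMPTION: the pre-fix pattern began with a literal space.
OLD = re.compile(r" ([0-9]+)[ ,]+([0-9]+)")
# Post-fix pattern as quoted in the comment: prefix is a space or '('.
NEW = re.compile(r"([ (])([0-9]+)[ ,]+([0-9]+)")

# Excerpt modelled on the ātman entry quoted in the issue.
text = "klar hervor (34,7; 603,2; 994,4; 918,13); einmal gebraucht (923,11);"

def old_refs(s):
    """References found by the assumed pre-fix pattern, as 'hymn,verse'."""
    return [f"{m.group(1)},{m.group(2)}" for m in OLD.finditer(s)]

def new_refs(s):
    """References found by the post-fix pattern, as 'hymn,verse'."""
    return [f"{m.group(2)},{m.group(3)}" for m in NEW.finditer(s)]

print(old_refs(text))  # misses the references that directly follow '('
print(new_refs(text))  # also picks up 34,7 and 923,11
```

With the old pattern, 34,7 and 923,11 are skipped because they directly follow '('; the new pattern finds all five references.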

gasyoun commented 4 years ago

> It affects the display in about 500 lines.

Thanks.

gasyoun commented 4 years ago

A few more.

soma [p= 1579]

soma, m., Soma, Saft der Somapflanze

    1. 10; ; 675. 6;

funderburkjim commented 4 years ago

The '4. 5. 6.' cannot readily be linked. These are different verses of the prior 464,1 link, so the absence of links for 4. etc. is not serious.

Notice that 483.2. 3 is similar: the 3 is not linked.

gasyoun commented 4 years ago

> The '4. 5. 6.' cannot readily be linked. These are different verses of the prior 464,1 link, so the absence of links for 4. etc. is not serious.

That I understand. It is a harder case. I wonder whether VedaWeb handled them, but I can't find it there. Their search works worse than ours. Can we count how many unlinked numbers there are?

funderburkjim commented 4 years ago

The '675. 6' link can be corrected. The link doesn't show because of the period '.', which should be a comma. In this case, the period is a typo.

Here is the regex that the display module uses to detect link patterns:

|([ (])([0-9]+)[ ,]+([0-9]+)|

In summary:

So ' 123.4' does NOT match, but ' 123, 4' does match.

There are several similar cases (regex=[0-9][0-9][0-9][.] [0-9]).

gra: 10160,soma; 616,ap; 1582,Ayus; 2097,UDar; 4544,draviRa; 4715,DIra; 5079,nu; 5085,nf; 5382,paSu; 5549,pur; 5905,praTama; 6268,brahman; 6401,Buvana; 8890,SUra; 10036,suzwuti; 10132,sfj

These have been corrected.

Some of these corrections are classified as print changes, since the scanned image shows a period. See gra_printchange.txt.
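A short Python sketch of the two checks discussed in this comment: whether a reference gets linked by the display regex, and which lines contain the period pattern used to find typo candidates. Both patterns are the ones quoted in the thread; the function names and the sample lines are hypothetical, and the real display code is PHP, not Python.

```python
import re

# Link pattern quoted in the thread (without the '|' delimiters).
LINK = re.compile(r"([ (])([0-9]+)[ ,]+([0-9]+)")
# Candidate pattern quoted in the thread: three digits, a period, a space, a digit.
PERIOD_CANDIDATE = re.compile(r"[0-9][0-9][0-9][.] [0-9]")

def linked(s):
    """Return the references the link pattern would pick up, as 'hymn,verse'."""
    return [f"{m.group(2)},{m.group(3)}" for m in LINK.finditer(s)]

def period_candidates(lines):
    """Return (1-based line number, line) pairs containing the candidate pattern."""
    return [(i, ln) for i, ln in enumerate(lines, 1) if PERIOD_CANDIDATE.search(ln)]

# Hypothetical sample lines, not taken from gra.txt.
sample = [
    "soma ... 675. 6;",  # period: not linked, flagged as a candidate
    "soma ... 675, 6;",  # comma: linked, not flagged
]

for number, line in period_candidates(sample):
    print(number, line, linked(line))
```

Each hit from such a search would still need checking against the scanned page, since some periods are in the print and are recorded as print changes rather than typos.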

funderburkjim commented 4 years ago

> Can we count how many unlinked numbers are there?

I filtered gra.txt (86000+ lines) and got matches in 2300 lines. The first few lines of the result (in Emacs): [image]

The highlighted areas match the regex. In 548,12. 5 we'll get a link at 548,12, but not a link at the ending 5.
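The 548,12. 5 case can be sketched in Python as follows. The filter actually used on gra.txt is not shown in the thread, so the `TRAILING` pattern below is an assumption (a hymn,verse pair followed by one or more '. N' verse numbers), and the sample lines are hypothetical. The count it produces is not meant to reproduce the 2300 figure.

```python
import re

# Link pattern quoted earlier in the thread.
LINK = re.compile(r"([ (])([0-9]+)[ ,]+([0-9]+)")
# ASSUMPTION: hymn,verse followed by one or more '. N' continuation verses.
TRAILING = re.compile(r"[0-9]+,[0-9]+(?:\. [0-9]+)+")

def link_spans(s):
    """Return the text each link would cover (prefix character dropped)."""
    return [m.group(0)[1:] for m in LINK.finditer(s)]

def unlinked_verses(s):
    """Return the continuation verse numbers that follow a linked hymn,verse pair."""
    found = []
    for m in TRAILING.finditer(s):
        found.extend(re.findall(r"\. ([0-9]+)", m.group(0)))
    return found

def count_lines_with_unlinked(lines):
    """Count lines that contain at least one continuation verse number."""
    return sum(1 for ln in lines if TRAILING.search(ln))

# Hypothetical sample lines, not taken from gra.txt.
sample = [" 548,12. 5;", " 34,7;", " 464,1. 4. 5. 6;"]

print(link_spans(sample[0]))       # only the 548,12 part is linked
print(unlinked_verses(sample[2]))  # the 4, 5 and 6 remain unlinked
print(count_lines_with_unlinked(sample))
```

Counting lines rather than individual numbers matches how the result is reported in the comment; a line such as the third sample contains several unlinked numbers.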

gasyoun commented 4 years ago

> Some of these corrections are classified as print changes, since the scanned image shows a period.

Thanks, a good catch.

> Got matches in 2300 lines.

So at least 2300 unlinked cases, interesting.

Andhrabharati commented 1 year ago

I've tagged all such numbers still present in the CDSL version with { }; a simple script by Jim is enough to add new ls entities or to extend ('pad') the existing ls entries suitably.

As such, this issue could be closed once he takes up my file.