Closed funderburkjim closed 1 year ago
[litsrc2/readme.txt]
Todo: <ab n="Vorige">V.</ab> ?_{992,5}
new? <ab n="Vers">V.</ab> ?_{992,5}
I thought Vorige [= 'previous'] would fit here, since the first portion (x) of the link (z) is to be taken from the previous link (x,y) to make it (x,z).
However, out of the total 59 places of <ab n="Vorige">V.</ab>, 16 have it as in <ab n="Vorige">V.</ab>, which undoubtedly indicate "Vers".
BTW, there are three places where a number is left unmarked (all preceded by "Vers"!)--
line 9238:Liede _{124,9} den Vers 9:
line 30931: (<ab>vgl.</ab> Vers 9 und {739,5})
line 31062: [divákṣasas dhenávas vṛ́ṣṇas (agnés) áśvās in Vers 2]
These should also be marked with the "implied notation".
Some addl. corrections, mostly a missing space except at a single place (l. 78829)--
line 47519: —{@pári@} -> — {@pári@}
line 78829 : bhú{@vanā‿ánu@} -> bhúvanā‿ánu
line 81056: —{@sám@} -> — {@sám@}
-------------------
line 8522: —5〉 -> — 5〉 & —1〉 -> — 1〉
line 29648: —4〉 -> — 4〉
line 36915: —4〉 -> — 4〉
line 37803: —3〉 -> — 3〉
line 40201: —6〉 -> — 6〉
line 41229: —2〉 -> — 2〉
line 52145: —4〉 -> — 4〉
line 53146: —e〉 -> — e〉
line 60124: —2〉 -> — 2〉
line 60414: —4〉 -> — 4〉
line 61561: —17〉 -> — 17〉
line 61567: —13〉 -> — 13〉
line 71488: —2〉 -> — 2〉
line 81237: —4〉 -> — 4〉
line 81639: —2〉 -> — 2〉
line 81943: —11〉 -> — 11〉
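The missing-space corrections listed above all follow one pattern, so they could be applied mechanically. Below is a minimal sketch, assuming the pattern is always "em dash immediately followed by a digit or single-letter label and 〉"; the function name `add_space_after_dash` is hypothetical and not part of the repository, and the `{@pári@}` and `bhú{@vanā‿ánu@}` cases would still need separate handling.

```python
import re

# Assumed pattern: an em dash directly followed by a label (digits or
# lowercase letters) that is closed by U+3009 RIGHT ANGLE BRACKET.
_DASH_LABEL = re.compile(r"—(?=[0-9a-z]+〉)")


def add_space_after_dash(line: str) -> str:
    """Insert a space between an em dash and a following label such as 5〉."""
    return _DASH_LABEL.sub("— ", line)


if __name__ == "__main__":
    print(add_space_after_dash("x —5〉 y —1〉 z"))
```

Lines that already have the space are left unchanged, so the function can be run over the whole file more than once.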
Changes made for above two comments.
Work done in mischg directory.
41 lines changed.
Work done in althws directory. 916 extra records in gra.xml. (no extra records in temp_graab_9.txt) Example in temp_graab_9.txt
<L>1078<pc>0116<k1>arva<k2>2. arva, arvan, arvaRa
Generates two extra entries in gra.xml
<L>1078.1<pc>0116<k1>arvan<k2>2. arva, arvan, arvaRa
<L>1078.2<pc>0116<k1>arvaRa<k2>2. arva, arvan, arvaRa
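For readers without access to the althws code, here is a rough sketch of how the extra records in the example above could be derived from a metaline. It is only an illustration that reproduces this one example; the helper name `alt_headword_lines` and the rule "every k2 spelling other than k1 becomes an extra record" are assumptions, not the actual logic used in the althws directory.

```python
import re

# Metaline layout as shown in the example: <L>..<pc>..<k1>..<k2>..
_HEAD = re.compile(r"<L>(?P<L>[^<]+)<pc>(?P<pc>[^<]+)<k1>(?P<k1>[^<]+)<k2>(?P<k2>.*)")


def alt_headword_lines(metaline: str) -> list[str]:
    """Return extra metalines L.1, L.2, ... for the alternate spellings in k2."""
    m = _HEAD.match(metaline)
    if m is None:
        return []
    # Assumption: a leading homonym number such as "2. " is not a headword.
    words = re.sub(r"^\d+\.\s*", "", m["k2"]).split(", ")
    alts = [w for w in words if w != m["k1"]]
    return [
        f"<L>{m['L']}.{i}<pc>{m['pc']}<k1>{alt}<k2>{m['k2']}"
        for i, alt in enumerate(alts, 1)
    ]


if __name__ == "__main__":
    for line in alt_headword_lines("<L>1078<pc>0116<k1>arva<k2>2. arva, arvan, arvaRa"):
        print(line)
```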
This is tested in local display:
A copy of the current dev display, based on temp_graab_9, is available at url https://sanskrit-lexicon.uni-koeln.de/work/gra-dev/graab_9/web/
In limited testing, it seems to work on the Cologne server. @Andhrabharati This may help us evaluate the display of certain aspects of the current markup.
Suggestions for improvements welcome from all.
Something wrong in coding?!
Search result for the first entry 'a' is thus--
There is nothing that makes 'a' to be the althw at these 2 places indicated. [And we've no access to your (local) .xml file to see into it!!]
Having the 〉character in display seems alright, as it shows the intended markup very clearly; but, I feel the other characters '_' and '〔 〕' could be hidden from the display as they look somewhat 'odd' and extraneous.
Begin update of gra-meta2.txt. See graab/meta2 directory.
graab/change_9c.txt removes two instances of an 'invisible' u200e LEFT TO RIGHT MARK.
meta2/check_ea1.txt lists the extended ascii characters present in the digitization.
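These 162 lines may be classified:
* 52 = LATIN ... (Latin alphabet with diacritics)
* 77 = GREEK ...
* 6 = HEBREW ...
* 27 The rest
Some (all?) of these 27 may be considered 'markup' and need to be explained in gra-meta2.txt.
@Andhrabharati request you
* Review check_ea1.txt.
* Are there any changes to temp_graab9 that this review illuminates?
* Prepare _explanations_ (to be included in gra-meta2) for 'the rest'.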
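A listing in the style of check_ea1.txt can be approximated with the standard library. The sketch below is not the script used in meta2; the function name `non_ascii_report` is made up, and the output format is only modelled on the entries quoted in this thread (character, code point, count, Unicode name).

```python
import unicodedata
from collections import Counter


def non_ascii_report(text: str) -> list[str]:
    """List each non-ASCII character with its code point, count and name."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    report = []
    for ch in sorted(counts):
        # Characters without a Unicode name fall back to "UNKNOWN".
        name = unicodedata.name(ch, "UNKNOWN")
        report.append(f"{ch} (\\u{ord(ch):04x}) {counts[ch]} := {name}")
    return report


if __name__ == "__main__":
    for entry in non_ascii_report("a〉 b〉 √c"):
        print(entry)
```

Note that this counts every code point separately, so a combining mark is reported apart from its base letter.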
Earlier footnote display:
Here it is not rendered as a regular footnote, but a kind of 'in-line' note, with random line-breaks!! [The sense does not immediately get into the reader's mind.]
Dev. footnote display with revised text arrangement:
Here the main text and the footnote are rendered properly (in the standard manner). But why is nur truncated and not properly wrapped to next line in here?
While looking at the numeric {…} strings for the citation nos., I came across {1066,6}, which does not fall into the RV numbering range. On probing, it turns out to be {1066,6}.a in the file data, which is GRA's internal page no. (and not an RV citation); thus, it has to be untagged and kept open as 1066,6.a.
Also found three places under L=6527 having similar marking, denoting the quartets of the hymn 850,4-- {850,4}.d, {850,4}.b, and {850,4}.d
All these three should include the letters b & d inside the tagging as {850,4.d}, {850,4.b}, and {850,4.d}.
There are some more numbers thus tagged as RV citations that are to be corrected--
{10,315} -> ???
{164,839} -> {164}, {839}
{164,955} -> {164}, {955}
{177,207} -> {177}, {207}
{21,101} -> {21}, {101}
dhātupāṭha ({28,103}) -> <ls>dhātupāṭha</ls> (〔28,103〕)
{390,394} -> {390}, {394}
{60,099} -> 60,099 [plain number, "ṣaṣṭím sahásrā navatí náva"]
{90,000} -> 90,000 [plain number, "navatí sahásrā"]
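Cases like the ones listed above could be flagged automatically with a simple range check. This is a sketch only: the limits `max_hymn=1028` and `max_verse=58` are my assumptions about the continuous RV hymn numbering and the longest hymn, and should be adjusted if they do not match the numbering used in GRA; the function name is hypothetical.

```python
import re

# Plain {hymn,verse} citations; forms such as {850,4.d} are not matched.
_CITE = re.compile(r"\{(\d+),(\d+)\}")


def suspicious_citations(text: str, max_hymn: int = 1028, max_verse: int = 58) -> list[str]:
    """Return {h,v} strings whose numbers cannot be an RV hymn/verse pair."""
    flagged = []
    for m in _CITE.finditer(text):
        hymn, verse = m.group(1), m.group(2)
        out_of_range = int(hymn) > max_hymn or int(verse) > max_verse
        # A verse part starting with 0 (e.g. 099, 000) points to a plain number.
        if out_of_range or verse.startswith("0"):
            flagged.append(m.group(0))
    return flagged


if __name__ == "__main__":
    sample = "{923,12} {1066,6} {164,839} {60,099} {90,000}"
    print(suspicious_citations(sample))
```

The check only produces candidates; each hit still needs a look at the printed text to decide between two hymn numbers, a plain number, or a non-RV citation.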
* 27 The rest
Some (all?) of these 27 may be considered 'markup' and need to be explained in gra-meta2.txt.
@Andhrabharati request you
* Review check_ea1.txt.
* Are there any changes to temp_graab9 that this review illuminates?
* Prepare _explanations_ (to be included in gra-meta2) for 'the rest'.
〉 (\u3009) 43376 := RIGHT ANGLE BRACKET
I had marked the GRA's meaning numbers and their subsequent ref. under various word-terminations (TS) for the citations that are followed by closing parenthesis ')' as 〉 while working to match the regular (opening-closing) parenthesis pairs. Probably these could now be reverted to ')'.
⁾ (\u207e) 2 := SUPERSCRIPT RIGHT PARENTHESIS
Same reason as above, but now in case of the Footnote marking. Probably these could also be reverted to ')'.
⸗ (\u2e17) 4 := DOUBLE OBLIQUE HYPHEN
Significance not clear, but looks like they serve the same purpose as 〰 (wavy dash), to supply the word(s) under discussion--
-am ⸗ aṅgam (aṅgam-aṅgam); páruṣ ⸗ parus (paruṣ-parus) in {923,12}
-āt ⸗ aṅgāt (aṅgād-aṅgād); lómnas ⸗ lomnas (lomno-lomno) in {989,6}
Or, this might possibly be indicating the 'duplication' of the word!!
〔 (\u3014) 894 := LEFT TORTOISE SHELL BRACKET
〕 (\u3015) 894 := RIGHT TORTOISE SHELL BRACKET
I had used these to distinguish the non RV/AV citation numbers, so that they are not linked to RV by mistake, as was the case in the earlier GRA [of course, it could be due to non-tagging of other works as ls-entities].
Probably all these could now be merged with the preceding ls-entities (or padded with the nearest one, as the case may be)-- just like the AV citations-- thus eliminating any need for a spl. markup.
⁓ (\u2053) 2 := SWUNG DASH
This has been recently added as an abbr., but probably could be left unmarked, with its description in the meta2 file (like the wavy dash).
√ (\u221a) 1010 := SQUARE ROOT
I have added this to indicate the roots (√) which are printed in a different font style for easy identification (MW99 also used similar approach) and nominal verb-forms (!√).
〰 (\u3030) 5958 := WAVY DASH
Grassmann himself has indicated its purpose in the intro (Vorwort), to supply the appropriately terminated form of the word(s) under discussion. [Other works at CDSL, mostly, had the mark ° to denote the same.]
🞄 (\u1f784) 106 := BLACK SLIGHTLY SMALL CIRCLE
This is purely my 'artefact' to mark the line-break in the original file while making my file without line-breaks. This would be useful to (re)generate the CDSL style lines, if need be. [And, I rarely use this approach!] These can safely be deleted at all the 106 places now, as my file is being used 'as is'.
@Andhrabharati request you
* Review check_ea1.txt.
a) replace n̆̇ with n̐; and update counts accordingly--
;; delete ̆ (\u0306) 0 := COMBINING BREVE
;; delete ̇ (\u0307) 0 := COMBINING DOT ABOVE
;; update ̐ (\u0310) 7 := COMBINING CANDRABINDU
Rather, I would suggest using the composite letter form for this (and others that come next), just like the other Roman & Greek letters with added diacritics (that are NOT having pre-composed forms).
;; add n̐ (\u006e\u0310) 7 := LATIN SMALL LETTER N + COMBINING CANDRABINDU
;; delete ̐ (\u0310) 7 := COMBINING CANDRABINDU
b) make the composite Old Slavonic letter c̣ and delete the individual diacritic--
;; add c̣ (\u0063\u0323) 2 := LATIN SMALL LETTER C + COMBINING DOT BELOW
;; delete ̣ (\u0323) 2 := COMBINING DOT BELOW
c) make the composite Hebrew letters and delete the resp. individual components--
;; add דְּ (\u05d3\u05bc\u05b0) 1 := HEBREW LETTER DALET + HEBREW POINT DAGESH OR MAPIQ + HEBREW POINT SHEVA
;; add נֵ (\u05e0\u05b5) 1 := HEBREW LETTER NUN + HEBREW POINT TSERE
;; add רְ (\u05e8\u05b0) 1 := HEBREW LETTER RESH + HEBREW POINT SHEVA
;; delete ְ (\u05b0) 2 := HEBREW POINT SHEVA
;; delete ֵ (\u05b5) 1 := HEBREW POINT TSERE
;; delete ּ (\u05bc) 1 := HEBREW POINT DAGESH OR MAPIQ
;; delete ד (\u05d3) 1 := HEBREW LETTER DALET
;; delete נ (\u05e0) 1 := HEBREW LETTER NUN
;; delete ר (\u05e8) 1 := HEBREW LETTER RESH
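The "composite letter" suggestion in (a) to (c) amounts to counting a base letter together with the combining marks that follow it, rather than counting each code point on its own. A possible sketch is below; it is not the code behind check_ea1.txt, `cluster_counts` is a hypothetical name, and treating every character of Unicode general category M* as "attached to the previous character" is a simplifying assumption.

```python
import unicodedata
from collections import Counter


def cluster_counts(text: str) -> Counter:
    """Count non-ASCII units, keeping combining marks attached to their base."""
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch      # attach the mark to the preceding base letter
        else:
            clusters.append(ch)     # start a new unit
    # Keep only units that contain at least one non-ASCII code point.
    return Counter(c for c in clusters if any(ord(ch) > 127 for ch in c))


if __name__ == "__main__":
    for unit, n in cluster_counts("n\u0310 c\u0323 n\u0310").items():
        codes = "".join(f"\\u{ord(ch):04x}" for ch in unit)
        print(unit, codes, n)
```

With this grouping, the bare diacritics no longer appear as separate entries, which matches the add/delete pairs proposed above.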
These 162 lines may be classified:
* 52 = LATIN ... (Latin alphabet with diacritics)
* 77 = GREEK ...
* 6 = HEBREW ...
* 27 The rest
Here are my extract counts:
Roman letters: 54
Greek Letters: 77
Hebrew Letters: 3
Others: 23
And the file-- GRA non-ascii.txt
change_9a
Changes made for above two comments. Work done in mischg directory. 41 lines changed.
I tried to make the temp_graab_9 file from your temp_graab_8 file.
Found that the <ab n="Vorige">V.</ab> instances are not changed; only the 'in <ab n="Vorige">V.</ab>' instances are changed at change_9a (part 1). This still needs addl. correction.
Here is my converted file as of now-- temp_graab_9.zip
@funderburkjim
Another small correction--
change <gk>στῆ δ ̓ ὀρϑός</gk> to <gk>στῆ δ᾽ ὀρϑός</gk>
and then update the ea1 list--
;; delete ̓ (\u0313) 1 := COMBINING COMMA ABOVE
;; add ᾽ (\u1fbd) 1 := GREEK KORONIS
@Andhrabharati Re footnote 'nur' . Under headword 'tar'.
I think the 'nur' problem is an html/css problem, and occurs elsewhere (in other dictionaries).
The intent is to 'indent' the footnote. The means I use to indent is <span style="position:relative; left:1.0em;">FOOTNOTE</span>.
This does indent the FOOTNOTE, but has the (undesired) side effect of sometimes improperly wrapping text.
Maybe there is a better way to indent a block of text?
more numbers thus tagged as RV citations
These are corrected (see change_9c). With a couple of notes:
; <L>9689<pc>1521<k1>siv
; old: <ab>Verg. Aen.</ab> {10,315}
; new: <ls>Verg. Aen.</ls> 〔10,315〕 Vergil's Aeneid.
Latin text 'aerea suta' (bronze-stitched).
; <L>2995<pc>0402<k1>gur
; old: dhātupāṭha ({28,103})
; new: <ls>dhātupāṭha</ls> (〔28,103〕)
This is Westergaard's dhātupāṭha (cf. gur MW).
Note: I'll post a new version of graab_9 when all the above comments have been 'handled'.
Yes, we already have the initial cap. Dhātupāṭha in the ls list; now to update the list with these two entries.
And, pl. note that there are 3 more occurrences of dhātupāṭha (l. 33658, l. 35447 and l. 39729) to be tagged as ls-entity. [I was looking for Cap. lettered items only earlier, thus missing these!!]
Aeneid to be rendered (in the same style as others) as-- Verg. Aen. :Aeneid— Vergil :1
@Andhrabharati Question about (\u02bc) MODIFIER LETTER APOSTROPHE.
My file has 97 of these, and yours (at link above) has 267.
Example at line 3546: I have Aditya's (normal apostrophe) and you have Adityaʼs (u02bc).
Example at line 7925: We both have u02bc within X in <ls ab="X">...</ls>.
How do you wish to resolve this difference?
Initially, I had used the modifier Apostrophe in the local ls expansions only, in place of regular apostrophe (as per your 'remark'), instead of the otherwise normally used "closing single quote mark".
Finally, I thought of using this new character everywhere (esp. at the possessive cases), as it is resembling the "print character".
And, you may recall that @gasyoun has been asking for such a change since ages at multiple instances.
So, this could be the new character at all such places everywhere across all the CDSL works henceforth, except at the slp1 encoding of Devanagari avagraha. And, I guess, you would have no objection to this.
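To see exactly where two versions of the file disagree on the apostrophe character (the 97 vs. 267 difference mentioned above), a line-by-line comparison like the following could help. It is a sketch under the assumption that both files have the same number of lines in the same order; `apostrophe_mismatches` is a hypothetical helper, not an existing script.

```python
APOSTROPHE = "'"        # U+0027 APOSTROPHE
MODIFIER = "\u02bc"     # U+02BC MODIFIER LETTER APOSTROPHE


def apostrophe_mismatches(lines_a: list[str], lines_b: list[str]) -> list[int]:
    """Return 1-based numbers of lines that differ only in apostrophe type."""
    hits = []
    for n, (a, b) in enumerate(zip(lines_a, lines_b), 1):
        same_after_normalizing = (
            a.replace(APOSTROPHE, MODIFIER) == b.replace(APOSTROPHE, MODIFIER)
        )
        if a != b and same_after_normalizing:
            hits.append(n)
    return hits


if __name__ == "__main__":
    mine = ["Aditya's", "unchanged line"]
    yours = ["Aditya\u02bcs", "unchanged line"]
    print(apostrophe_mismatches(mine, yours))
```

Lines that differ for any other reason are ignored, so the result isolates the apostrophe question from unrelated edits.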
After changing (in my file) ALL apostrophes to \u02bc, I discovered that your file has 6 (normal) apostrophes. 4 of your apostrophes are in Sanskrit text (' is IAST avagraha )
two of your apostrophes are in German text:
Do you confirm these 6 normal apostrophes?
Yes, and thanks for accepting the change.
BTW, would you post the new GRA file in the ver. 3 group?
I think some more CDSL works [BHS, BEN, INM, pwk, PWG, AP90, ...] could be updated thus, with our collaborative working in the coming months. [I just need your acceptance to 'changing the line numbers count' as the first point in my working.]
63 matches in 60 lines for "<ls ab=".*?">" in buffer: temp_graab_9.txt
This markup had utility earlier during the phase of ls identification.
However, the ab attribute of ls now has no functionality.
i.e., <ls ab="X">Y</ls> is functionally equivalent to <ls>Y</ls>: the display programs get the tooltip from Y.
Thus, I think we should change <ls ab="X">Y</ls> to <ls>Y</ls>.
@Andhrabharati agree?
No issues, @funderburkjim !
[You are the "master" in handling all such points.]
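The agreed change from `<ls ab="X">Y</ls>` to `<ls>Y</ls>` is a one-line substitution. The sketch below reuses the search pattern quoted above in a slightly stricter form; the function name `drop_ls_ab` is hypothetical, and other attributes such as `n="..."` are deliberately left alone.

```python
import re

# Matches only an opening <ls> tag whose sole attribute is ab="...".
_LS_AB = re.compile(r'<ls ab="[^"]*">')


def drop_ls_ab(line: str) -> str:
    """Replace <ls ab="X"> with a plain <ls>, leaving the content untouched."""
    return _LS_AB.sub("<ls>", line)


if __name__ == "__main__":
    print(drop_ls_ab('<ls ab="X">Y</ls> and <ls n="Ku.">Z</ls>'))
```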
more numbers thus tagged as RV citations
These are corrected (see change_9c). With a couple of notes:
@funderburkjim
I wonder where the corrections were done!
{164,839} -> {164}, {839}
{164,955} -> {164}, {955}
{177,207} -> {177}, {207}
{21,101} -> {21}, {101}
{390,394} -> {390}, {394}
{60,099} -> 60,099 [plain number, "ṣaṣṭím sahásrā navatí náva"]
{90,000} -> 90,000 [plain number, "navatí sahásrā"]
I see all these mentioned places here to be 'not proper' in the web display at https://sanskrit-lexicon.uni-koeln.de/work/gra-dev/graab_9/web/webtc2/index.php
{164,839} etc. changes are reflected in my local version, but not yet posted.
Please see commit 6760970.
This should have all the changes discussed (hope I haven't missed anything).
Most work is done in graab/meta2 directory.
temp_graab_9g.zip has my latest version (temp_graab_9g.txt)
The corresponding dev display link is https://sanskrit-lexicon.uni-koeln.de/work/gra-dev/graab_9g/web/
AFAIK, the only further revision needed is removal of unneeded blank lines.
@funderburkjim
Though the <ls ab=""> strings have been replaced with <ls> and now enclose the following citation number(s), their ls tooltips are not updated at some places; I checked mainly the Aufrecht cases, some of which are retained as Aufrecht's edition of Rig-Veda.
See the very first such entry--
[Probably more such are existing still.]
I've tried splitting the Ku., Ku. Zeitschr., Nir., Pāṇ., Prāt., Vop. and VS. citations individually by padding the <ls n= string. [But, the Cu. citations are skipped from such splitting!]
"Adjusted" few ls-places clubbing together the parts enclosed in ( ) and [ ], like
<ls>Bollensen (O. u. O. 〔2,462〕)</ls>
<ls>J. Grimm [Ku. 〔1,82〕]</ls>
<ls>Lottner [Ku. Zeitschr. 〔7,186〕]</ls>
<ls>Max Müller (Oxford Essays 〔S. 61〕)</ls>
And removed the  
at various places, which are no more required as the cases are now uniquely identifiable with the "integrated" citation numbers.
Finally, removed the extra blank lines "remaining" in the final portion of the VN data. temp_graab_9g (AB).zip
Pl. see if they are alright and could be incorporated in your file.
Here is the full list of Kuhnʼs Zeitschr. articles (108) mentioned in GRA; most (70+) of which are not expanded earlier (having been cited with just the issue & page numbers, without the authorʼs name!). Kuhnʼs Zeitschr. articles.txt
Would you like all these to be expanded now, @funderburkjim ?
like all these to be expanded?
Not sure what this means.
It might be that the current best place for this 'expansion' is not within temp_graab_x, but as a separate reference within the 'Documentation' (https://sanskrit-lexicon.uni-koeln.de/scans/csldev/csldoc/build/dictionaries/index.html).
Please provide one or two samples of how and where this expansion would be coded
This is identical (!) to temp_graab_9g (AB).zip.
Corresponding dev displays at https://sanskrit-lexicon.uni-koeln.de/work/gra-dev/graab_10/web/
Comparison work done in 'graab/final' directory.
A few changes to the ls-tooltip file.
@Andhrabharati Are you ready to call this the FINAL dev version?
@funderburkjim
So far as the .txt file is concerned, we can call this the FINAL version. [This exercise took about a month now (started on 9th June), through multiple revisions for good reasons (mostly).]
But the ls-tooltip for Aufrecht (for the Ku. Zeitschr. articles) is still bad at two entries (aMsa and aSman)!!
Let me list full titles for all the Ku. Zeitschr. articles, either to be added as ls-tooltips or separately put at the csl-docs.
@Andhrabharati Re footnote 'nur' . Under headword 'tar'. I think the 'nur' problem is an html/css problem, and occurs elsewhere (in other dictionaries). The intent is to 'indent' the footnote. The means I use to indent is <span style="position:relative; left:1.0em;">FOOTNOTE</span>.
This does indent the FOOTNOTE, but has the (undesired) side effect of sometimes improperly wrapping text. Maybe there is a better way to indent a block of text?
Pl. see this (using different built-in properties) for indenting-- https://www.geeksforgeeks.org/how-to-indent-text-in-html-by-using-css/
Here is the list of articles cited from Kuhn's Journal (arranged in the "issue sequence")-- Kuhnʼs Zeitschr. articles.txt [Pl. be informed that few of these were having some mistakes and typos in my previous work.]
There are 50 unique articles in total (few being cited multiple times).
Aufrecht (for the Ku. Zeitschr. articles) is still bad at two entries ...
This has now been corrected (the graab_10 dev display link has been revised). The correction was to change normal apostrophe to (our current favorite) abnormal apostrophe in the tooltip file. There were also a couple of other similar changes. No change in the graab_10 text. The dev display has been revised.
Looks like still some adjustments are required in the display coding--
aMsa 〔I,283〕-- Aufrecht in Kuhn's Zeitschr. :Panzerbeiter, quaestiones umbricae.— Aufrechtʼs Article in Kuhnʼs Zeitschrift
aSman 〔5,135〕-- Aufrecht in Kuhn's Zeitschr. :Auhns.— Aufrechtʼs Article in Kuhnʼs Zeitschrift
Probably the display code requires the usage of  
for unambiguous resolving.
No change to temp_graab_10. Dev displays revised: https://sanskrit-lexicon.uni-koeln.de/work/gra-dev/graab_10/web/
Work done in graab/final directory.
Display tooltips for Kuhn's articles are present in file tooltip_1_ku_edit.txt.
Method used was to add the article info to the "abbreviation" where needed; no  
Also, added some line breaks to the long tooltips.
Used the 'padding-left' technique from the suggestion above. Seems to work well. What do you think?
Currently only applies to gra dictionary displays.
Seems to work well. What do you think?
As on screenshot? Good enough.
@gasyoun Here is screenshot of list display for 'tar' that uses the 'padding-left' technique:
Compare with a prior screenshot, where there is hidden text 'nur'.
Try the List display for 'tar' at https://sanskrit-lexicon.uni-koeln.de/work/gra-dev/graab_10/web/webtc1
If you play with the screen width, you should see no 'hidden' text at the right edge.
Checked and found good with any screen width (in the "Advanced display" that I use!).
And, why not apply this globally across all CDSL works (if already "known" to be present elsewhere)?
@Andhrabharati I am beginning the process of moving temp_graab_10 from 'dev' stage to 'production' stage.
Please take a look at the proposed revision to gra-meta2.txt : gra-meta2_1.txt. Any alterations?
Looks good enough, except for a small correction--
See "graab/readme.txt" at ready to use install the new version
.
The Cologne displays are now based on this new version.
Please note additional 'editionStmt' clause in graheader.xml.
A copy of the previous Cologne display is retained at url: https://sanskrit-lexicon.uni-koeln.de/scans/GRAScan/20230608/web/
Closing this issue and the related issues #31 #30 #29 #28 #27 #26 #25.
@funderburkjim
Homepage is linked to the previous version still!!
Sorry, got the new version on a hard refresh.
I guess some more issues (#5, #10, #12, #15, #20, #21) could also be closed with this revision.
Even #24 may also be closed just like #30. [@gasyoun probably doesn't have a control on these, unless things get updated at the original source (from where he picks up the data and keeps in this repo).]
Continuation of #31.