sanskrit-lexicon / csl-orig

Data for all dictionaries of Cologne. Now all corrections are made in this git-based workflow.

KRM suggestion #613

Open drdhaval2785 opened 3 years ago

drdhaval2785 commented 3 years ago

@Andhrabharati mentioned the following

%% Too bad; the FNs are not properly marked for the pages; they seem to have been "offset" somewhere, and this continued throughout.
%% It is very difficult to locate the FNs with the present file status.
%% And the file can definitely be 'treated' as a punishment for human reading, with the splits everywhere beginning with '<'.
%% This is the most time-consuming work in my present exercise.
%% Needs a global correction in the text.
drdhaval2785 commented 3 years ago

I would request an example, with the printed page and the current digitization, to show the problem more vividly along with the expected outcome, @Andhrabharati.

Andhrabharati commented 3 years ago

just one example enough to showcase the issue?

will give in a while.

Andhrabharati commented 3 years ago

pl. have a look at the result for जीव entry (<L>606<pc>0573<k1>जीव<k2>जीव)

image

what would one expect the place for Footnotes 4, A, 5 to be and linked to?

looking at the matter, I took it as p. 575. (guess anyone would do the same!)

but the book has it on p. 573 (belonging to some other, earlier entry, not to this जीव), and not all FNs are offset by this same page difference! Whatever the cause, all those text blocks need to be properly positioned in the file; the display would then use the data to show them correctly.

so are the FNs 4 & 5.

And the file is split at every <div and <sup, so it is not easily readable. However, the web display shows them together in a single line. This is what I meant by calling it a punishment for human (free-running) reading.
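
A minimal sketch of how such split lines might be rejoined for reading. It assumes that any physical line starting with `<div` or `<sup` continues the previous line and can be appended without a separator; the helper name `join_split_lines` and the sample lines are hypothetical, not taken from krm.txt.

```python
import re

def join_split_lines(lines):
    """Rejoin lines that were broken before <div or <sup (assumed continuation lines)."""
    joined = []
    for line in lines:
        # Assumption: a line starting with <div or <sup belongs to the previous line.
        if joined and re.match(r"<(div|sup)\b", line):
            joined[-1] += line
        else:
            joined.append(line)
    return joined

# Invented sample, only to show the behaviour.
sample = ["akakaH", "<sup>1</sup> kikA", "<div n='P'/>next", "<L>2<pc>0005"]
for row in join_split_lines(sample):
    print(row)
```

Whether a space is needed at the join depends on how the file was split, so that would have to be checked against the actual data.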

Andhrabharati commented 3 years ago

Incidentally there is another point I was to take up next.

The superscript numbers are marked as <sup>...</sup> in some texts and as ^... in some other texts.

I feel a uniform style should be adopted across all the works, for all kinds of markings.
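
As an illustration only, here is a sketch of converting the `^...` style to the `<sup>...</sup>` style. Which of the two should be the uniform style is not decided in this thread; `<sup>` is picked here just for the example. It assumes a caret mark is `^` followed by digits with an optional capital letter, or by a single capital letter; `caret_to_sup` is a hypothetical name.

```python
import re

def caret_to_sup(text):
    """Rewrite ^1, ^2A, ^A style marks as <sup>1</sup>, <sup>2A</sup>, <sup>A</sup>."""
    # Assumption: the mark is digits (+ optional capital letter) or one capital letter.
    return re.sub(r"\^(\d+[A-Z]?|[A-Z])", r"<sup>\1</sup>", text)

print(caret_to_sup("akitam^A taH ^1 x^2A"))
```

A caret directly followed by a word that begins with a capital letter would be converted wrongly by this pattern, so each hit would need a manual check.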

Andhrabharati commented 3 years ago

And most probably, having such different styles in different works is one of the causes that is eating away @funderburkjim's time and energy.

Having a common style for the files will make the display process common, and any other process for that matter.

This would let everyone involved handle far more issues than at present in the same amount of time.

funderburkjim commented 2 years ago

I feel a uniform style should be adopted [for superscripts]

Good point.

This is one of the fronts in the long-term struggle to impose non-invasive consistency in the details of the dictionary coding.

In regard to footnotes, this KRM text is probably the most difficult to fit into a form so that the corresponding display is useful.

And the current coding/displays need a lot of tlc.

Andhrabharati commented 2 years ago

In regard to footnotes, this KRM text is probably the most difficult to fit into a form so that the corresponding display is useful.

@drdhaval2785 shall we (I mean, I) relocate all those FNs to the resp. entries, as a set together?

drdhaval2785 commented 2 years ago

I do not think I have sufficient wherewithal to make necessary changes in the display code / xml generation code if some drastic change such as this is made. Let us drop it for now. Will come to this later on.

Andhrabharati commented 2 years ago

wherewithal? why is money (finance) [the literal meaning of the word used] needed in this?

if you meant know-how, having those pieces together does not need any addl. code changes, in my opinion.

I don't see any cross-navigation between main text and FNs anywhere at CDSL. (or did I miss it, if lying somewhere?)

Anyway, I am deferring this work for now, as you said.

funderburkjim commented 2 years ago

shall we (I mean, I) relocate all those FNs to the resp. entries

I suggest for you to submit FN relocation for ONE entry in KRM. If your submission is found to be useable, then dealing with ALL the FN relocations could be done similarly by you when you find it convenient.

I expect the FN relocations will not involve any code change, but rather changes only to krm.txt.

Andhrabharati commented 2 years ago

@funderburkjim

The KRM text is designed to be mostly in tabular style, not as running text; and unfortunately this point got skipped in the digitisation.

See the style, as marked at the very first entry, and intended to be applied throughout. [Only the first header column is "implied", and other columns are rendered in this style everywhere else.]

KRM-1

KRM-2

I always try to get the heart of the content & style used in the resp. work, and didn't go any further fearing your non-willingness to accept so much of a change in the digitised text.

Relocating the FNs is a fairly simple task for me, but I don't like doing a half-baked job!

Andhrabharati commented 2 years ago

@funderburkjim

Just recalled that I had this KRM work pending!

Opened the same and looked at it; noticed one important info being missed while "tagging" the text.

It appears that the . (dot) is kept outside the <s>...</s> tagging, taking it to be a punctuation mark.

In fact, it is the Devanagari abbr. mark and needs to be inside the s-tag. See the files below, which justify this. KRM_ab.txt KRM_ls.txt
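
A rough sketch of the mechanical part of this change, assuming the dot to be moved always sits immediately after the closing `</s>` with nothing in between. Not every such dot is necessarily an abbreviation mark, so the result would need review. The function name and the sample string are hypothetical.

```python
def pull_dot_inside(text):
    """Move a '.' that directly follows </s> to just before the closing tag."""
    # Assumption: "</s>." always marks an abbreviation dot that belongs inside the tag.
    return text.replace("</s>.", ".</s>")

# Invented sample, only to show the behaviour.
print(pull_dot_inside("<s>saka</s>. <s>se</s>. para"))
```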

I removed most of the tags, just for ease of reading, (and also did some corrections, as a trial)- krm_main.txt

Though I would be going further to bring-in the tabular (columnar) structure (as I mentioned earlier) into the data, thought I could re-position the FNs properly (as per the book pages), as a first step in the "raw (un-modified) file" from the Github repo.

Still interested in this piece of work from me? If not, I can directly go ahead with all the "corrections" in my way, as appropriate.

I shall wait for 2-3 days to hear back from you.

Andhrabharati commented 2 years ago

I suggest for you to submit FN relocation for ONE entry in KRM. If your submission is found to be useable, then dealing with ALL the FN relocations could be done similarly by you when you find it convenient.

I expect the FN relocations will not involve any code change, but rather changes only to krm.txt.

As asked by you, here is the first sample entry- L-1 as in raw file.txt L-1 with relocated matter.txt [I have used the Devanagari file, as converted by @drdhaval2785, from the csl-devanagari repo.]

One can see that this has FNs 1 & 2 repeating, as the entry spans multiple pages. It is appropriate to split the entry & FNs pagewise; otherwise there will be doubt in 'matching' to the correct FN among the repeats.

gasyoun commented 2 years ago

unfortunately this point got skipped in the digitisation.

Has @thomasincambodia ever had an idea why?

I would be going further to bring-in the tabular (columnar) structure (as I mentioned earlier) into the data, thought I could re-position the FNs properly (as per the book pages), as a first step in the "raw (un-modified) file" from the Github repo.

Now that would take long.

Still interested in this piece of work from me?

If my voice matters, yes. Because otherwise it remains still rather unusable.

Andhrabharati commented 2 years ago

@gasyoun

Thomas's original file seems to be without these FN & abbr. errors.

See the below L-5 entry, as an example, from the utf8 file by @funderburkjim-

<H>(5) {@“agi gatau”@} (bhvAdiH-{#I#}-146-saka. se. para.)
<P>aGgan-ntI, aGgiSyan-ntI-tI, aGgayan-ntI, aGgayiSyan-ntI-tI;
<>an~jigiSan-ntI ityAdirUpANi vinA, avaziSTAni aki (2.) dhAtuvat boddhavyAni |
<P>asya dhAtorauNAdike nipratyaye nalope (aGgati = jvAlArUpeNodrdhvaM
<>gacchati ityarthe) agniH || [Page0007+ 27]
<P>saMjn~AyAM ghaH = aGgam | prazastAni aGgAni yasyAH sA = aGganA |
<>‘aGgAt kalyANe’ (ga. sU, -5-2-100) iti pAmAditvAt naH pratyayaH | ‘vila-
<>GgadeNaM zabarAGganAjanapravaGgitaM maGgaladhenutaGgitam’ dhA-kA. 1-20.

Interestingly, the FNs were placed in-line with the main text in the utf8 file-

<H>(1) {@“aka kuTilAyAM gatau”@} (I-bhvAdiH-792 sakarmakaH-seT-parasmaipadI) ghaTAdiH mit |
<>‘iditastvaGkate tatra kuTilAyAM gatAvaket |’ (zlo 41) iti devaH |
<>Nic- san-
<NI>Nvul AkakaH--kikA,
<F>1. ‘mitAM hrasvaH’ (6-4-92)
<>iti Nau upadhAyA hrasvaH |</F> akakaH--kikA, acikiSa
<F>1A ‘ajAderdvitIyasya’ (6-1-2) iti dvitIya-
<>syaikAcaH dvitvam | ‘kuhozcuH’
<>(7-4-62) ityabhyAsasya cutvam |</F> kaH--SikA;
<>tRc (tRn) akitA-trI, akayitA-trI, acikiSitA-trI;
<>zatA akan-ntI, akayan-ntI, acikiSan-ntI;
<>akiSyan-ntI-tI, akayiSyan-tI-ntI, acikiSiSyan-ntI-tI;
<>zAnac akayamAnaH, akayiSyamANaH;
<>kvip ak-akau akaH;
<>niSThA akitam-
<F>A. ‘asAgyamAsraM sthagaye kathaM vA kagAmi
<>kiM vA haraye'kitAya’ dhAtu-
<>kAvye 2-8. zlokaH |</F> taH, akitaH, acikiSitaH-tam-tavAn;
<>anye pratyayAH akaH, akaH,
<F>2. ‘sanAzaMsabhikSa uH’ (3-2-168)
<>iti uH pratyayaH |</F> acikiSuH, acikayiSuH;
<>tavyaH akitavyam, akayitavyam, acikiSitavyam;
<>anIyar akanIyam, akanIyam, acikiSaNIyam;
<>Nyat {#or#} yat Akyam, akyam, acikiSyam;
<>khal ISadakaH-durakaH-svakaH;
<>yak akyamAnaH, akyamAnaH, acikiSyamANaH;
<>ghan~ AkaH, akaH, acikiSaH;
<>tumun akitum, akayitum, acikiSitum;
<>ktin
<F>2A ‘titutra--’ (7-2-9) itINNiSedhaH |</F> aktiH, akanA, acikiSA
<F>3. ‘a pratyayAt’ (3-3-102)
<>iti striyAmakAraH pratyayaH | TAp |</F>, acikayiSA; [Page0004+ 35]
<>lyuT akanam, akanam, acikiSaNam;
<>ktvA akitvA, akayitvA, acikiSitvA;
<>lyap samakya, samakayya
<F>1. ‘lyapi laghupUrvAt’ (6-4-56)
<>iti NerayAdezaH |</F>, samacikiSya;
<>¤ktvANamulau akitvA 2, AkamAkam,
<F>2. ‘ciNNamulordIrgho'nyatarasyAm’
<>(6-4-93) ityupadhAyA dIrghavikalpaH |</F> Akam 2, akam 2, akayitvA 2, acikiSam 2; acikiSitvA 2;

Probably, regenerating the files afresh from the utf8 would be a better choice; the FNs could be shown as tool-tips (as rendered in the utf8) with a little modification. [Having seen this utf8 version, I see no point spending time to re-position the FNs etc. in the current Cologne file.]
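
A small sketch of how the in-line `<F>...</F>` footnotes of the utf8 file could be pulled apart from the main text, as a first step toward a tool-tip display. It assumes footnotes never nest and that `<>` at the start of a line is only a continuation marker; the `[fnN]` placeholder and the function name are hypothetical.

```python
import re

FOOTNOTE = re.compile(r"<F>(.*?)</F>", re.DOTALL)

def split_footnotes(entry):
    """Return (main text with [fnN] placeholders, list of footnote bodies)."""
    notes = []

    def repl(match):
        # Assumption: "\n<>" inside a footnote is just a line continuation.
        body = match.group(1).replace("\n<>", " ").strip()
        notes.append(body)
        return "[fn%d]" % len(notes)

    return FOOTNOTE.sub(repl, entry), notes

# Lines taken from the (1) entry quoted above.
sample = ("<NI>Nvul AkakaH--kikA,\n"
          "<F>1. ‘mitAM hrasvaH’ (6-4-92)\n"
          "<>iti Nau upadhAyA hrasvaH |</F> akakaH--kikA, acikiSa")
main, notes = split_footnotes(sample)
print(main)
print(notes)
```

The placeholder could later be replaced by whatever markup the display needs; that choice is left open here.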

Will wait for @funderburkjim to have a final call on this.

Andhrabharati commented 2 years ago

Still, the final version as per book style needs to be done, as shown below-

L-1 final target.txt and KRM L-1 entry.pdf

Andhrabharati commented 2 years ago

Just tried converting the utf8 file to Devanagari.

Found that it has n~ (Velthuis/ITRANS ?) in place of J (HK).
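
A one-step sketch of the substitution needed before an HK-based conversion, assuming `n~` occurs in the utf8 file only as the palatal nasal and never as a plain `n` followed by a literal tilde. The function name is hypothetical.

```python
def nya_to_hk(text):
    """Replace the n~ spelling of the palatal nasal with HK 'J'."""
    # Assumption: every "n~" in the file stands for the palatal nasal.
    return text.replace("n~", "J")

# Words taken from the L-5 entry quoted earlier in this thread.
print(nya_to_hk("saMjn~AyAM ghaH"))
print(nya_to_hk("an~jigiSan-ntI"))
```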

Andhrabharati commented 2 years ago

There are two Tamil words at (72) इल {@“इल प्रेरणे”@} & (2001) हिक्क {@“हिक्क अव्यक्ते शब्दे”@}.

These are shown in English letters alone; probably they could be shown as ‘ஏலக்காய் (ElakkAy)’ [HW- इल] and ‘விக்கல் (vikkal)’ [HW- हिक्क].

Andhrabharati commented 2 years ago

There are 8 ^ marks left in the text, indicating FNs-

HW | mark | FN
(7) | ^1 | <P>1. (at the end of the entry, not in-line)
(37) | ^A | <P>^A. (at the end of the entry, not in-line)
(43) | ^A | <>^A (at the end of the entry, not in-line)
(621) | ^A | No supporting citation verse (in the file & the print)
(1822) | ^A | No supporting citation verse (in the file & the print)
(1823) | ^A. | No supporting citation verse (in the file & the print)

It may be noted that the English letters are tagged as {#...#} everywhere else (if not inside the FN tags), except at these ^A places (7 no.s).
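
A sketch of how such leftover marks could be listed mechanically. It assumes headword lines look like `<H>(N) ...`, as in the utf8 samples quoted earlier in this thread, and that a mark is `^` followed by letters or digits with an optional dot; the function name and the sample lines are hypothetical.

```python
import re

def find_caret_marks(lines):
    """Return (headword number, mark) pairs for every ^ mark found."""
    hits = []
    current = None
    for line in lines:
        # Assumption: a headword line starts with <H>(N).
        head = re.match(r"<H>\((\d+)\)", line)
        if head:
            current = int(head.group(1))
        for mark in re.findall(r"\^\w+\.?", line):
            hits.append((current, mark))
    return hits

# Invented sample lines, only to show the behaviour.
sample = ["<H>(7) {@x@}", "abc ^1 def", "<H>(37) {@y@}", "foo ^A bar", "<P>^A. note"]
for hw, mark in find_caret_marks(sample):
    print(hw, mark)
```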

Andhrabharati commented 2 years ago

In the above list,

So, it is just one item at (621) that has a 'missing link'.

funderburkjim commented 2 years ago

I agree that in its current Cologne form, this work is not very useful. Some years ago, when Peter Scharf was involved in this project, he expressed a similar opinion on KRM.

It might be appropriate to think in terms of a completely separate KRM1 edition. This would make use of the krm.txt digitization, but would not be bound by the various structural constraints that are appropriate for the dictionaries, and the display logic would be quite specific to the tabular/footnote nature of KRM.

Probably this development can be along the lines that have occurred to you thus far. I would need to consider your work in detail before making any specific comments. At the moment (and probably for the next few weeks) I am involved with the PWK/PWG/BOESP projects.

So, why don't you pause your work on KRM for now. Then we will plan to take up this work in a few weeks.

Alternately, if @drdhaval2785 is interested and you prefer, you and he can work on a separate KRM rendition.

Andhrabharati commented 2 years ago

Glad to see your response @funderburkjim.

In fact, I stopped working on KRM after seeing the utf8 text (though I pointed out a few corrections in it).

And I have noticed several printos and formatting changes required in the book, which can be put into the text. [Presently I am looking at INM, and marking necessary changes in the digitised file.]

BTW, see what V. Raghavan, a stalwart in all aspects of Skt. (who has played a major role over 50 years in India for the cause of Sanskrit), says about this work-

A compilation called Dhāturūpaprakāśikā was brought out long ago by Sri Srikantha Sastri in Telugu script from Mysore, where along with conjugational forms, ten Kṛts alone were worked out. In 1885, W.D. Whitney, as a supplement to his Sanskrit Grammar, brought out his The Roots, Verb-forms and Primary Derivatives of the Sanskrit Language. The former book, whose scope was limited, is no longer available and the latter, in Roman script, although now reprinted, is neither full nor solely concerned with the derivatives; also its script prevents its use among the mass of Sanskrit students in the country. It may therefore be said that not only is our present effort more complete than any attempted previously but also the material has been arranged and presented in such a manner that maximum usefulness is assured for the work.

I did not look closely at Westergaard's work, but surely this KRM work could be placed on par with it, if properly presented.

Andhrabharati commented 2 years ago

And I would like to mention here that I have come across a few reviews of Whitney's Roots, which say that it has some (major) flaws and errors.

gasyoun commented 2 years ago

BOESP projects

Stands for what, may I ask?

I did not look closely at Westergaard's work

Would love to know your opinion. Because it remains the basis for most Europe printed dhatupathas.

I had come across few reviews on Whitney's Roots, which say that it has some (major) flaws and errors.

It was a long fight. Whitney won. Have you read https://mitpress.mit.edu/books/reader-sanskrit-grammarians ?

Andhrabharati commented 2 years ago

BOESP projects

Stands for what, may I ask?

Boe(thlingk's Indische) Sp(ruche)! [Working with Thomas]

gasyoun commented 2 years ago

Boe(thlingk's Indische) Sp(ruche)

Oh, it's IS or ISp usually ))

Andhrabharati commented 2 years ago

Boe(thlingk's Indische) Sp(ruche)

Oh, it's IS or ISp usually ))

It could have been IndSp, as B. himself had addressed it in his works.