sanskrit-lexicon / PWK

Sanskrit-Wörterbuch in kürzerer Fassung, 7 Bände Petersburg 1879-1889

Alternate headwords for pw #106

Closed funderburkjim closed 3 months ago

funderburkjim commented 5 months ago

We tackle the task of generating alternate headwords for the pw dictionary.

Preliminary outline of the approach:

Note: no attempt to generate alternate headwords from upasargas of verb entries.

funderburkjim commented 3 months ago

Regarding '_'

You are definitely right that a round-trip transcoding of X (slp1 -> hk -> slp1) does not return X when X has certain properties (such as an 'ai' or 'au' hiatus, also 'bh', 'gh', and maybe a few other cases).

A similar comment regarding IAST instead of hk.

My view has been that iast and hk should be viewed as faulty and/or incomplete transcoding schemes for Devanagari. cdsl could take upon itself the task of extending hk and iast to 'remedy' such problems. But I have not thought the benefit to users great enough to justify the effort, since such anomalies are rare.

While thinking about this, I noticed that the 'simple-search (input=simple)' display needs to be revised so that 'prauga' (MW) yields not only 'prOga' (slp1) but also 'prauga' (slp1).
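To make the round-trip problem concrete, here is a minimal sketch. It assumes a toy four-entry subset of the slp1/hk correspondence (E=ai, O=au, B=bh, G=gh) and hypothetical function names; it is not the Cologne transcoder, only an illustration of why a hiatus such as 'prauga' does not survive slp1 -> hk -> slp1.

```python
# Toy subset of the slp1 <-> hk correspondence (an assumption for illustration,
# not the transcoder tables used at Cologne).
SLP1_TO_HK = {"E": "ai", "O": "au", "B": "bh", "G": "gh"}
HK_TO_SLP1 = {hk: slp1 for slp1, hk in SLP1_TO_HK.items()}


def slp1_to_hk(text):
    """Replace each slp1 letter in the toy table by its hk digraph."""
    return "".join(SLP1_TO_HK.get(ch, ch) for ch in text)


def hk_to_slp1(text):
    """Greedily read hk digraphs from left to right; other letters pass through."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in HK_TO_SLP1:
            out.append(HK_TO_SLP1[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def round_trip(text):
    """slp1 -> hk -> slp1."""
    return hk_to_slp1(slp1_to_hk(text))


for word in ("prOga", "prauga"):
    print(word, "->", slp1_to_hk(word), "->", round_trip(word))
```

Both slp1 spellings map to the same hk string 'prauga', so the return trip cannot tell them apart; in this sketch it always picks the diphthong reading 'prOga'.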

Andhrabharati commented 3 months ago

My view has been that iast and hk should be viewed as faulty and/or incomplete transcoding schemes for Devanagari.

I've seen that slp1 itself also has the drawback of failing in the round-trip conversion, deva - slp1 - deva (or slp1 - deva - slp1) at such places!!

funderburkjim commented 3 months ago

temp_pw_9b.txt

temp_pw_9b.zip

This incorporates almost all of AB's latest batch of changes. See also change_8b_9.txt, change_9_9a.txt and change_9a_9b.txt for how I analyzed the many different kinds of changes proposed by AB. See diff_9b_ab_2.txt for the differences between temp_pw_9b.txt and AB's final file pw.integrated.AB.v1.for.CDSL.txt.

The changes are also integrated into the displays (locally):

[image: local display with the changes]

@Andhrabharati When you sign off on temp_pw_9b.txt, I'll install it at Cologne.

funderburkjim commented 3 months ago

I've seen that slp1 itself also has the drawback of failing in the round-trip conversion, deva - slp1 - deva (or slp1 - deva - slp1) at such places!!

I'll believe it when I see it!

I doubt that the Ralph Bunker/Peter Scharf implementation of slp1-deva transcoding has an invertibility problem, but it may be that my implementation is imperfect.

When (if) you encounter such an instance, open a new issue and provide full details, so I can reproduce the problem, and hopefully correct any such imperfections.

Andhrabharati commented 3 months ago

@Andhrabharati When you sign off on temp_pw_9b.txt, I'll install it at Cologne.

Great to see that practically no differences exist between the two versions.

Here are the final changes--

  1. At two entries (L-17562 and L-73947) the hiatus was removed in the header portion but remains in the metaline.
  2. The final form settled on at L-124385 prompted me to look for other places having "(besser", and I found 3 entries: diff_9b-1.txt
  3. The SUrpa°RaKI at L-113882 prompted me to look for other places having "[a-z]°[a-z]", and I found 8 lines, of which 5 are typos or print errors. The remaining 3 lines are the only 'rare' cases having the ° mark within the string (in the digital text; there may be a few more, which would come out if and when a full proofing is done to match the file data with the print, i.e. typo errors). [Should we make these changes? If so, what's the best way to do so?]
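A search like the one in item 3 can be sketched as follows. This is only an illustration: the function name and the sample lines are made up, and the pattern is taken literally as quoted, so it matches ° only between two lowercase ASCII letters.

```python
import re

# The pattern exactly as quoted in item 3: ° between two lowercase ASCII letters.
INLINE_RING = re.compile(r"[a-z]°[a-z]")


def find_inline_ring(lines):
    """Return (line_number, line) pairs for lines containing the pattern."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if INLINE_RING.search(line)
    ]


# Made-up sample lines, not taken from temp_pw_9b.txt.
sample = ["abc°def", "°abc", "abc° def"]
for number, line in find_inline_ring(sample):
    print(number, line)
```

Only the first sample line is reported; the other two have the ° at the start of a word or followed by a space. Each hit still needs a manual check against the print to separate typos from genuine cases.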

Andhrabharati commented 3 months ago

This is one of the longest sessions that has taken place, though many a time going beyond the "subject matter" (due to my 'uncontrolled' way of corrections!), but it has brought the text into good form now.

I would like Jim to think of opening two more issues

I shall take responsibility for these two tasks. The first does not need much time (and only I can do it, as of now), but the second might take a week or so [Jim could also try it out first, as with GRA, where I then jumped in to give the finishing touches jointly].

I look forward to knowing what Jim decides on this.

Andhrabharati commented 3 months ago

Finally, here is my concluding post on this issue:

If I give a brief account of the special markup introduced for the filled-in portions at the HW level, Jim might appreciate my idea and take the necessary action further (as I intended).

While in the vast majority of cases the "padding" is done at the front of the compound word (as ⁅X⁆°Y), in just 91 cases it is done at the end (as X°⁅Y⁆).

I had presumed that we should somehow mark the difference, and thus used the special markers '⁅ ⁆'. The regular '[ ]' could have been used, but as it is used for other purposes in the print, I thought a separate mark would avoid ambiguity.
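As a rough illustration of how the two padding positions could be told apart mechanically, here is a small sketch. It assumes a headword is a bare string in one of the two shapes described above; the function name and patterns are hypothetical and are not part of any CDSL code.

```python
import re

# Hypothetical patterns for the two shapes described above.
FRONT_PADDED = re.compile(r"^⁅[^⁅⁆]+⁆°.+$")  # ⁅X⁆°Y
END_PADDED = re.compile(r"^.+°⁅[^⁅⁆]+⁆$")    # X°⁅Y⁆


def padding_position(headword):
    """Return 'front', 'end', or None if the headword has no ⁅ ⁆ padding."""
    if FRONT_PADDED.match(headword):
        return "front"
    if END_PADDED.match(headword):
        return "end"
    return None


for hw in ("⁅X⁆°Y", "X°⁅Y⁆", "XY"):
    print(hw, padding_position(hw))
```

Because '⁅ ⁆' is not used for anything else, a check like this cannot be confused by the '[ ]' that the print uses for other purposes.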

Jim is requested to recall his opinion on the topic [as note 2 in L-12291.AB.revised_JF.txt, while I was last working on MD], wrt the status in MW.

Now, what use did I have in mind for this marker in practice?

funderburkjim commented 3 months ago

temp_pw_9c.zip has the few changes mentioned by AB above.

change_9b_9c.txt has the changes.

This version is now installed at Cologne.

Additional revisions of repositories csl-corrections, csl-apidev, hwnorm1 (see commit links above).

The final version changes about 42000 lines out of 764942, or about 5% of lines. There are now about 12000 'alternate' headwords for pw. This work has taken about 6 weeks.

Now closing this issue. Will make a 'placeholder' issue for some additional TODOs.