WIL corrections to text Devanagari, 3-gram candidates

funderburkjim commented 8 years ago

This issue continues #321. It is devoted to the correction of Sanskrit text in Wilson Dictionary. The candidates for correction are generated based on the presence of 3-grams (sequences of 3 characters) which do not occur in headwords of any dictionary.

603 cases have been so identified.
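
The candidate selection described above could be sketched roughly as follows. This is only an illustration in Python, not the program actually used for this issue; the function names and the tiny SLP1 word lists are made up for the example.

```python
def trigrams(word):
    """Return the set of 3-character sequences occurring in word."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


def find_candidates(text_words, headwords):
    """Return (word, unseen 3-grams) for every text word that contains
    a 3-gram not found in any headword."""
    known = set()
    for hw in headwords:
        known |= trigrams(hw)
    candidates = []
    for word in text_words:
        unseen = sorted(trigrams(word) - known)
        if unseen:
            candidates.append((word, unseen))
    return candidates


# Hypothetical SLP1 data, for illustration only.
headwords = ["sarva", "pUrva", "barbara"]
text_words = ["sarva", "sarvba"]
print(find_candidates(text_words, headwords))
```

An unseen 3-gram is only a hint of an error, so each flagged word still has to be checked against the scan.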

These cases have been broken into smaller batches, identified by batch numbers 301 to 320. Each batch has about 30 cases.

There is a user interface (UI) for marking corrections. The URL for the UI depends on the batch number.

funderburkjim commented 8 years ago

Here are direct links to the batches:

batch 301, batch 302, batch 303, batch 304, batch 305,
batch 306, batch 307, batch 308, batch 309, batch 310,
batch 311, batch 312, batch 313, batch 314, batch 315,
batch 316, batch 317, batch 318, batch 319, batch 320

funderburkjim commented 8 years ago

Progress Table for Batches

Batch	Cases	User	Date Begin	Date End	Installed
301	1-30	@funderburkjim	11/29/2016	11/29/2016	11/30/2016
302	31-60	@funderburkjim	12/01/2016	12/01/2016	12/01/2016
303	61-90	@funderburkjim	12/01/2016	12/02/2016	12/02/2016
304	91-120	@funderburkjim	12/02/2016	12/02/2016	12/02/2016
305	121-150	@funderburkjim	12/02/2016	12/02/2016	12/02/2016
306	151-180	@funderburkjim	12/02/2016	12/10/2016	12/10/2016
307	181-210	@funderburkjim	12/11/2016	12/11/2016	12/11/2016
308	211-240	@funderburkjim	12/11/2016	12/11/2016	12/11/2016
309	241-270	@funderburkjim	12/12/2016	12/12/2016	12/12/2016
310	271-300	@funderburkjim	12/12/2016	12/12/2016	12/12/2016
311	301-330	@funderburkjim	12/12/2016	12/12/2016	12/12/2016
312	331-360	@funderburkjim	12/12/2016	12/12/2016	12/12/2016
313	361-390	@SergeA	12/15/2016	12/15/2016	12/16/2016
314	391-420	@funderburkjim	12/16/2016	12/16/2016	12/16/2016
315	421-450	@funderburkjim	12/20/2016	12/20/2016	12/21/2016
316	451-480	@funderburkjim	12/20/2016	12/20/2016	12/21/2016
317	481-510	@funderburkjim	12/21/2016	12/21/2016	12/21/2016
318	511-540	@funderburkjim	12/21/2016	12/21/2016	12/21/2016
319	541-570	@funderburkjim	12/21/2016	12/21/2016	12/21/2016
320	571-610	@funderburkjim	12/21/2016	12/21/2016	12/21/2016

funderburkjim commented 8 years ago

Suggested work flow

Here is a suggested work flow for working on a batch.

Batch check-out
- edit the Progress Table in the comment above.
- enter your GitHub user name and the Date Begin field for the batch.
- Click 'Update comment' to save the edit.
- Now other users know you are working on this batch.
Make corrections using Batch UI
- click on the batch xxx link (see table two comments up). This opens the UI to work on the batch.
- In the UI for the batch, work on the cases until all are done. This can be done in multiple sessions.
Batch check-in
- edit the Progress Table
- enter the Date End field.
- Click 'Update comment' button to save the edit.
Notify @funderburkjim (me) that the batch is ready to install
- enter a comment in this issue such as 'Batch 302 ready for installation'
I'll work through the corrections for the batch, including examination of comments.
- I'll install the corrections at Cologne,
- I will edit the Progress Table above, inserting the installation date into the Installed field.
This will complete the work for the given batch.

funderburkjim commented 8 years ago

The UI for working on these corrections currently requires proficiency with the SLP1 transliteration.

I'll begin working on the batches.

Others are invited to join in the fun.

gasyoun commented 8 years ago

The UI for working on these corrections currently requires proficiency with the SLP1 transliteration.

I don't think it has to be so. What we store and what we show to the corrector need not be the same. If it's a single case, so be it. If it's the rule, I would fix it first, as @SergeA has proposed in the past. The feature is worth the price.

drdhaval2785 commented 8 years ago

I bank on Jim's word 'currently'. Surely he plans to make it SLP1-independent for greater participation.

gasyoun commented 8 years ago

I bank on Jim's word 'currently'.

Hope so.

funderburkjim commented 8 years ago

currently.

Right, the intent of that little adverb was to indicate that I agree that SLP1 dependency for correction UI is an obstacle to participation (notably for Sergey, but potentially for others).

Right now, having the correction UI based on a system different from the system of the digitization (e.g. HK or Devanagari or IAST for the UI, but SLP1 for the digitization) would tie me up like a pretzel; in other words, I don't see a good natural way to do it without being totally confused.

However, I anticipate that solutions to the technical obstacles will become clearer over time; and thus that the SLP1 dependency is not eternal but transient.

funderburkjim commented 7 years ago

vb and bv in Wilson.

Many of the cases occur because the digitization shows: 'vb' or 'bv' after 'r'. Although often the scanned image is too blurred to make a clear distinction, sometimes the scan clearly shows such forms. For instance:

In these cases, I am considering this rvb to be a print error, since by context the headword is 'rvv'; so the alternate written form must be the other one, where 'b' is used in place of 'v', i.e. rbb.

Since so many of the cases are of this nature, I thought I should mention the way I am resolving them.

gasyoun commented 7 years ago

considering this rvb to be a print error, since by context the headword is 'rvv'

Makes sense.

Since so many of the cases are of this nature, I thought I should mention the way I am resolving them.

Yeah, that's what's great about GitHub. We have a background for the changelog.

funderburkjim commented 7 years ago

Revised version allows work in Devanagari or SLP1

The update.php program now takes a parameter ?input=deva . If this parameter is present, then all Sanskrit words are displayed in Devanagari, and the New correction field should be spelled in Devanagari (unicode). A (disabled) button to the right of the main display region indicates whether the update program is in Devanagari mode or SLP1 mode.

If the parameter is either (a) absent, or (b) ?input=slp1, then the prior slp1 mode is in force. This is also true if the value of the input parameter is anything other than deva.
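
The rule for choosing the mode can be restated as a small sketch. This is not the code of update.php, just the same decision written in Python for illustration; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def input_mode(url):
    """Return 'deva' only when the query string has input=deva.
    A missing parameter, input=slp1, or any other value gives 'slp1'."""
    params = parse_qs(urlparse(url).query)
    values = params.get("input", [])
    return "deva" if values and values[0] == "deva" else "slp1"


# Hypothetical URLs, for illustration only.
for url in [
    "https://example.org/update.php?input=deva",
    "https://example.org/update.php?input=slp1",
    "https://example.org/update.php?input=hk",
    "https://example.org/update.php",
]:
    print(url, "->", input_mode(url))
```

Defaulting to SLP1 for every unrecognized value means existing links keep working unchanged.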

For convenience, here are links to the remaining batches using Devanagari mode.

batch 309 deva, batch 310 deva,
batch 311 deva, batch 312 deva, batch 313 deva, batch 314 deva, batch 315 deva,
batch 316 deva, batch 317 deva, batch 318 deva, batch 319 deva, batch 320 deva

drdhaval2785 commented 7 years ago

I hope the solution is generic enough, so we will have this luxury in other dictionaries too.

gasyoun commented 7 years ago

I hope the solution is generic enough, so we will have this luxury in other dictionaries too.

Agree.

funderburkjim commented 7 years ago

The technique should be generic for dictionaries which have Devanagari. Devanagari accents could be a problem for dictionaries with accents, as there is probably no standard way to input accents.

funderburkjim commented 7 years ago

Many of the correction candidates for Wilson involve grammatical names for affixes.

One detail of these where Wilson seems 'loose' is in putting a virAma at the end of an affix ending in a consonant. For instance Wilson shows ङीप instead of ङीप् . This is in contrast to VCP. For affix spellings I am sometimes referring to DictionaryOfSanskritGrammar_abhyankar.pdf, downloaded from archive.org.

For the few that are being brought to attention in this current list of 3-gram cases, I am making the change of adding the virAma, and considering the case to be a print error, although of course it seems a rather 'small' print error.
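
A check for this kind of candidate might look like the sketch below. It assumes the affix is given in Unicode Devanagari and it only flags spellings for review; it cannot decide whether a final 'a' is actually intended, since some affixes do end in a vowel. The function name is hypothetical.

```python
def ends_in_bare_consonant(affix):
    """True if the last character is a Devanagari consonant letter
    (U+0915..U+0939), i.e. no virama or vowel sign follows it."""
    return bool(affix) and "\u0915" <= affix[-1] <= "\u0939"


# Spellings quoted in this thread: without and with the final virama.
for affix in ["ङीप", "ङीप्", "ठञ्"]:
    status = "review" if ends_in_bare_consonant(affix) else "ok"
    print(affix, status)
```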

At some point, we should make a comprehensive study of the spellings of affix terms in Wilson, with an aim to bring the spelling of all such affixes into compliance with the generally accepted spellings.

funderburkjim commented 7 years ago

Question:

वैदिक (scan page 810) .वैदिक¦ mfn. (-कः-की or का-कं) Scriptural, derived from or conformable to the Ve4das. m. (-कः) A Brahman well versed in the Ve4das. (line # 158781) .E. वेद, and ठञ् or जिठ aff.

I can't find affix जिठ , and this is the only instance of its use in WIL.

In VCP, we see वैदिक [p= 4972] : वैदिक¦ पु० वेदं वेत्त्यधीते वा ठञ् । १ वेदज्ञे ब्राह्मणे वेदेषु विहितः ठक् । २ वेदोक्ते कर्मणि त्रि० । स्त्रियां ङीप् “वेदिकी तान्त्रिकी सन्ध्या यथानुक्रमयोगतः” तन्त्रम् । [L=42495]

So my guess is that the affix in WIL should be ङीप् , in explanation of the की f. ending.

So जिठ would be a print error in WIL

@drdhaval2785 What do you think?

Incidentally, वेदिकी in VCP looks like a typo, which should be वैदिकी.

funderburkjim commented 7 years ago

Here is a reference that lists uRAdi affixes, which occur often in Wilson's etymologies:

https://archive.org/stream/UnadiSutrasInSanskritGT

The list starts at page 23.

SergeA commented 7 years ago

all Sanskrit words are displayed in Devanagari

Great!

309 
Case 5: hw=ज्योत्स्ना DONE (ङिव → ङिप्)

should be ङीप्

In VCP, we see वैदिक [p= 4972] : वैदिक¦ पु० वेदं वेत्त्यधीते वा ठञ् । १ वेदज्ञे ब्राह्मणे वेदेषु विहितः ठक् । २ वेदोक्ते कर्मणि त्रि० । स्त्रियां ङीप् “वेदिकी तान्त्रिकी सन्ध्या यथानुक्रमयोगतः” तन्त्रम् । [L=42495]

So my guess is that the affix in WIL should be ङीप् , in explanation of the की f. ending.

But WIL does not indicate the fem. form here. Apte gives ठञ् ... ठक् वा here. And in VCP we rather have the same:

वैदिक [p= 4972] : वैदिक¦ पु० वेदं वेत्त्यधीते वा ठञ् । १ वेदज्ञे ब्राह्मणे वेदेषु विहितः ठक् । २ वेदोक्ते कर्मणि त्रि० । स्त्रियां ङीप् “वेदिकी तान्त्रिकी सन्ध्या यथानुक्रमयोगतः” तन्त्रम् । [L=42495]

So जिठ would be a print error in WIL

And in 1819 edition it looks like ञिठ्. It is not a simple print error, it's something far grosser.

SergeA commented 7 years ago

Oooops! I'm sorry, somehow the button "comment" was disabled, and I did something wrong, I don't know exactly what...

gasyoun commented 7 years ago

accents, as there is probably no standard way to input accent

Let's add one standard!

affix spellings I am sometimes referring to DictionaryOfSanskritGrammar_abhyankar.pdf

Abhyankar's a valid source.

adding the virAma, and considering the case to be a print error, although, of course, it seems a rather 'small' print error.

It's like everybody knows what it is so why should I care to add that virama :+1:

funderburkjim commented 7 years ago

re 309.5 Agree with ङीप् -- that's what my note above also says, but I made an error in showing a short vowel in the correction form.

@SergeA Thanks for catching!

If I understand what's going on, this ङीप् is to explain the alternate feminine form ending in long I.

funderburkjim commented 7 years ago

@SergeA Feel free to pitch in with these, now that you're done with #322.

SergeA commented 7 years ago

The text of Wilson 1832 looks horrible. It has so many print errors here and there that it seems to me it requires a total proofreading and re-editing from beginning to end! But if we do it, the resulting correct text will differ very much from the printed book. It will be another edition, another book, not Wilson's 1832!

Anyhow, while correcting the text of WIL32, it must be compared at every step with the edition of 1819, which provides a more comprehensive text. Please take a look at the examples in the doc.

https://docs.google.com/document/d/1hy3k4kIyQesTXluGwXBkumXDLkUfCE6T32V14YLYoNY/edit?usp=sharing

drdhaval2785 commented 7 years ago

Regarding जिठ: it is 'YiWa' and not 'jiWa'. http://sanskritdocuments.org/learning_tools/sarvanisutrani/4.2.115.htm

'WaY' takes I in feminine. 'YiWa' takes A in feminine.

So WIL has given these two suffixes to justify the two feminine suffixes. But vEdika is not a word in kASyAdi gaRa. bEdika is. WIL, because of b/v confusion, has applied it to vEdika.

funderburkjim commented 7 years ago

I've followed the links in Google docs and now have the 1819 pdf.

In terms of print quality, the 1819 version seems superior to the 1832 version at Cologne.

In terms of content, the two seem very closely related, with the 1819 having more literary source references.

From the examples in the Google Docs, my impression is that the 1819 may have fewer print errors - @SergeA Agree?

A preliminary suggestion is that we use the 1819 edition as a resource while working through the remaining 3-gram correction groups.

Then, when that task is done, we should revisit the larger idea of how to deal further with Wilson's text.

Incidentally, my personal reason for interest in Wilson is the Etymology (E.) sections for nominals. MW is lacking in this.

SergeA commented 7 years ago

From the examples in the Google Docs, my impression is that the 1819 may have fewer print errors - @SergeA Agree? A preliminary suggestion is that we use the 1819 edition as a resource

This is exactly what I want to say. I didn't do any large comparison, but these few words show a big difference in quality. If other articles of Wil32 have similar errors, I'd rather say the 2nd edition was itself one big error. :(

http://reader.digitale-sammlungen.de/de/fs1/object/display/bsb10522369_00005.html Perhaps their scan of wil32 will be also useful. In the last example in ज्यौत्स्नी in that scan there is a clear ी at the end, while in the Cologne scan the upper tail of ी is lost.

Incidentally, my personal reason for interest in Wilson is the Etymology (E.) sections for nominals. MW is lacking in this.

Yes, such additional information can be useful for the people who are interested in traditional Indian grammar. (But so many print errors call this usefulness into question.) On the other hand for an ordinary student all this paninian stuff is a bit complicated and not necessary, and MW holds to European grammatical views.

gasyoun commented 7 years ago

Then, when that task is done, we should revisit the larger idea of how to deal further with Wilson's text.

Leave it as it is. WIL32 and MW72 are just not worth it. Too much labor, too little value. PWG and PWK make more practical sense. That's my point of view. If by 2040 we have no idea what to do, then back to WIL32.

A preliminary suggestion is that we use the 1819 edition as a resource while working through the remaining 3-gram correction groups.

Indeed it's cleaner. The print is very clear.

Incidentally, my personal reason for interest in Wilson is the Etymology (E.) sections for nominals. MW is lacking in this.

Yes, that's the gold mine! But Macdonell has them, Apte does; @drdhaval2785, does Vacaspatyam have them for all words as well?

SergeA commented 7 years ago

@funderburkjim When the case is doubled like in 313 5=7 is it OK to just correct both cases the same way?

funderburkjim commented 7 years ago

@SergeA - Yes, I've noticed that a couple of times also. The way I've handled it is to leave the duplicate as 'no change', and to mention 'duplicate of case xxx' in the comment.

SergeA commented 7 years ago

Then I'd better also leave the second as "no change" with a comment.

funderburkjim commented 7 years ago

Batch 313, case 3. Removed the anusvAra -- it duplicates the dental nasal. Print error.

funderburkjim commented 7 years ago

Batch 313, case 24. द्यूतपैर्णिमी -> द्यूतपौर्णिमी a typo

Note on Devanagari. The first three letters in HK are dyU. To my eye, this is not rendered correctly, at least in the font I am using. When enlarged, we can see a virAma on the 'd', partially obscured by the 'U'. Incidentally, siddhanta font does a good rendering here:

SergeA commented 7 years ago

Batch 313, case 3. Removed the anusvAra -- it duplicates the dental nasal. Print error. Batch 313, case 24. द्यूतपैर्णिमी -> द्यूतपौर्णिमी a typo

Perhaps it's because of being troubled by those corrupted anubandhas... But my proofreading of WIL gives rather bad results. Two obvious cases missed. Shame on me. :(

Note on Devanagari. The first three letters in HK are dyU. To my eye, this is not rendered correctly, at least in the font I am using. When enlarged, we can see a virAma on the 'd', partially obscured by the 'U'.

I don't see any problem with the rendering in my comp.
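
Whatever a given font does, the underlying text is the same. As a quick check, this small Python sketch (not part of the correction tools; the helper name is made up) lists the code points of the conjunct dyU. The virAma is stored explicitly after 'd', and it is up to the font whether to draw a ligature or a visible virAma.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for each character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The first three letters of the word discussed above (HK: dyU).
for code, name in describe("\u0926\u094d\u092f\u0942"):
    print(code, name)
```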

funderburkjim commented 7 years ago

There were several (5) cases where the printed Wilson used the spelling नीञ् to refer to the root 'to lead'. Since he has this root spelled as णी and therein gives the anubandha spelling णीञ् , I considered the spelling नीञ् to be a print error.

Incidentally, a check of WIL1819 for अभिनय also shows नीञ् ,

funderburkjim commented 7 years ago

f. of रौप्य (silver) - रौप्यी रौप्या वा ?

Wilson shows 'I'. Word not found in WIL1819. MW, AP90 show no 'I', so 'A' is implied.

Should 'I' in Wilson be considered a print error?

Current handling - no change in WIL - leave it 'I'.

funderburkjim commented 7 years ago

Batch 15, case 17. प्रमापण .

In derivation, WIL32 shows root मीञ् (to hurt or kill), while WIL1819 shows मिञ् .

In this case WIL32 seems right (cf . roots मी मि च . )

funderburkjim commented 7 years ago

All the batches are examined and installed.

Done with Wilson for now. I have noted that there is still AS coding in Wilson. It would be desirable to remove it sometime (in favor of Unicode diacritics), but that is a tedious task.

SergeA commented 7 years ago

315 Case 19: hw=शक्रमूर्द्धन् DONE (मूर्ट्ध्वन् → मूर्द्ध्वन्)

here the headword should be changed to शक्रमूर्द्ध्वन् (rddhv)

Also in the word

मूर्द्धन् [L=30693] [p= 668] .मूर्द्धन्¦ m. (-र्द्धा) The head. E. मुह to be foolish, Un'ádi aff. कनिन्, form irr., or मुर्व्व to bind, the same aff., धङ् augment.

it should be changed to मूर्द्ध्वन् मूर्द्ध्वन् -र्द्ध्वा and ध्वङ्.

317 Case 3: hw=वर्व्वर DONE (वर्ब्वर → बर्व्वर)

Logically in the onomatopoetic reduplication there can be only two variants: bar+bar > barbbara or var+var > varvvara. Because of the bad quality of the print and scan it is hard to determine the exact letters of the words here. But when Wilson gives here four (!) different spellings it is as if he declared he does not want to distinguish b & v. And moreover, he likes to blend them!

317 Case 6: hw=शुश्रुवस् DONE (वान्-श्रुयी-वः is spelled correctly)

श्रुयी >> श्रुषी But I suppose it is wrong, and the correct form of this word is शुश्रुवस् (शुश्रुवान् शुश्रुवुषी शुश्रुवत्), see Kale §124.

320 Case 15: hw=व्यवहारविषय DONE (सम्बिट्व्यतिक्रमः → सम्विट्व्यतिक्रमः)

सम्विद्व्यतिक्रमः (dvy) Or better संविद्॰

SergeA commented 7 years ago

Also in the word
मूर्द्धन् [L=30693] [p= 668] .मूर्द्धन्¦ m. (-र्द्धा) The head.
E. मुह to be foolish, Un'ádi aff. कनिन्, form irr., or मुर्व्व to bind, the same aff., धङ् augment.
it should be changed to मूर्द्ध्वन् मूर्द्ध्वन् -र्द्ध्वा and ध्वङ्.

No, sorry, this is Ok. I did not find the proper line.

funderburkjim commented 7 years ago

315 case 19. hw change to शक्रमूर्द्ध्वन् . Done
- But in headword मूर्द्धन् did NOT make the changes suggested. Reason, WIL already has a separate मूर्द्ध्वन् headword. (I guess you realized that in second comment.)
317 Case 3: hw=वर्व्वर DONE (वर्ब्वर → बर्व्वर)
My reason for the initial correction बर्व्वर was that this shows clearly in WIL1819 under वर्व्वर . However, I now think it is better to change to बर्ब्बर since WIL has this as a separate headword, so it should be one of the variants under this hw वर्व्वर . However, the situation with v/b in Wilson is probably hopelessly tangled.
317 Case 6: hw=शुश्रुवस् . Corrected f. to -वुषी , per Kale. Consider this to be print error.
320 Case 15: hw=व्यवहारविषय . सम्बिट्व्यतिक्रमः now corrected to सम्विद्व्यतिक्रमः , per suggestion. Confirm MW . Leaving 'm' rather than anusvAra, since Wilson prints that way.
- Is there a reliable way to distinguish, in Wilson Devanagari, between ट् (HK=T) and द् (HK=d) ?

@SergeA Thanks for proofreading!

These revisions now installed.

SergeA commented 7 years ago

317 Case 6: hw=शुश्रुवस् . Corrected f. to -वुषी , per Kale. Consider this to be print error.

But look at the neuter termination, it is also given wrong.
- 1819: शुश्रुवत् (wrong) -वान् (correct) -वती (wrong) -वत् (correct)
- 1832: शुश्रवस् (wrong) -वान् (correct) -श्रुषी (wrong) -वः (wrong)
- where supposed form is शुश्रुवस् -वान् -वुषी -वत्

Looks like he did not know the correct forms of this word. Firstly he treated it as in -vat, and in 2nd ed., perhaps, by analogy with श्रेयस् (-यान् -यसी -यः) , both times wrong.

I don't know if we have the right to correct such cases as print errors. The point of digitization is to provide a trustworthy representation of the printed book, including all the peculiarities of that book. Yes, we see many errors in Wilson's dic. But they are part of that printed book. They represent the author's style, his way of thinking, his misunderstandings etc. When we correct them, we change the book; we replace the author's text with ours. Then instead of Wilson's dic (which is a historical document) the reader will get something else. Corrected by somebody, it will then be somebody's text. When such cases are few, too few to mention, it is not a big problem. But in Wilson's dic there are too many of them.

It would be great in such a situation to have two representations of the dic. The first should be a verbatim digitization with all the misprints. And the second one corrected, with highlighted corrections. Perhaps showing the corrected text with the option to see the supposedly erroneous original. So the reader could get the original text, compare, judge etc.

Of course it is just my opinion about how to do it the best way. On the other hand I don't think this dic is worth all those troubles which it gives us.

Is there a reliable way to distinguish, in Wilson Devanagari, between ट् (HK=T) and द् (HK=d) ?

In this case I see it clearly printed as द + ्. And it is impossible to get ṭ-v at the junction, because by sandhi it would give ḍv, not ṭv. But in other cases, in the middle of a word, this sandhi does not apply: in dṛṣṭvā or in khaṭvā 'a bedstead, couch' we meet the ṭv combination. (BTW, khaṭvā is wrongly spelled as khadvā in the digitization.) Between ṣ and v it should always be ṣṭv, not ṣdv. Other cases of ṭv are not very numerous, and perhaps we can recheck them all as candidates for change to dv.
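
The recheck suggested above could start from a list like the one this sketch produces. It assumes the words are available in SLP1 (where w = ṭ, z = ṣ) and simply collects every word with 'wv' that is not preceded by 'z'; the function name and the sample words are only for illustration, and the resulting list is for human review, not automatic change (khaṭvā, for instance, is correct with ṭv).

```python
import re

# SLP1: w = retroflex t, z = retroflex s. Match 'wv' not preceded by 'z'.
TV_NOT_AFTER_S = re.compile(r"(?<!z)wv")


def tv_candidates(words):
    """Return the words containing 'wv' outside the cluster 'zwv'."""
    return [w for w in words if TV_NOT_AFTER_S.search(w)]


# Sample SLP1 words taken from the cases discussed in this thread.
sample = ["dfzwvA", "KawvA", "samviwvyatikramaH", "deva"]
print(tv_candidates(sample))
```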

funderburkjim commented 7 years ago

I wondered about the neuter also; since you confirm, I'll change that to vat as well.

Will also change:

(dictionary in SLP1)
.{#KadvA#}¦ f. ({#-dvA#})
to
.{#KawvA#}¦ f. ({#-wvA#})
; Similarly
; L = 12618, hw= KawwANga -> KawvANga
53421 old .{#KawwANga#}¦ m. ({#-NgaM#}) A king of the solar line. n. ({#-NgaM#})
53421 new .{#KawvANga#}¦ m. ({#-NgaM#}) A king of the solar line. n. ({#-NgaM#})
53426 old .E. {#KadvA#} a cot, and {#aNka#} body or form.
53426 new .E. {#KawvA#} a cot, and {#aNka#} body or form.
; L= 12619, hw = KadvANgaBft -> KawvANgaBft
53428 old .{#KadvANgaBft#}¦ m. ({#-Bft#}) A name of S'IVA. {#KadvANga#} one of his
53428 new .{#KawvANgaBft#}¦ m. ({#-Bft#}) A name of S'IVA. {#KawvANga#} one of his
; L=12620, hw=KadvANgin -> KawvANgin
53431 old .{#KadvANgin#}¦ m. ({#-NgI#}) A name of SIVA.
53431 new .{#KawvANgin#}¦ m. ({#-NgI#}) A name of SIVA.
53432 old .E. {#KadvANga,#} and {#ini#} aff.
53432 new .E. {#KawvANga,#} and {#ini#} aff.
; L=12621, hw = KadvArUQa -> KawvArUQa
53434 old .{#KadvArUQa#}¦ mfn. ({#-QaH-QA-QaM#})
53434 new .{#KawvArUQa#}¦ mfn. ({#-QaH-QA-QaM#})
53439 old .E. {#KadvA#} a bed, and {#ArUQa#} mounted.
53439 new .E. {#KawvA#} a bed, and {#ArUQa#} mounted.
; L=12621, hw = KadvikA -> KawvikA
53441 old .{#KadvikA#}¦ f. ({#-kA#}) A small bedstead.
53441 new .{#KawvikA#}¦ f. ({#-kA#}) A small bedstead.
53442 old .E. {#KadvA#} a bedstead, and {#kan#} affix, fem. form.
53442 new .E. {#KawvA#} a bedstead, and {#kan#} affix, fem. form.
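
A listing in this shape is easy to read by program. The sketch below is only an illustration, not the installation script used at Cologne: it assumes each change is a pair of lines 'N old ...' / 'N new ...' where N identifies a line of the digitization, and that lines starting with ';' are comments. The function name is hypothetical.

```python
import re

CHANGE_RE = re.compile(r"^(\d+) (old|new) (.*)$")


def parse_changes(lines):
    """Return {line number: {'old': text, 'new': text}} from change lines.
    Comment lines (starting with ';') and unrecognized lines are skipped."""
    changes = {}
    for line in lines:
        if line.startswith(";"):
            continue
        match = CHANGE_RE.match(line)
        if match:
            lnum, kind, text = match.groups()
            changes.setdefault(int(lnum), {})[kind] = text
    return changes


sample = [
    "; L=12620, hw=KadvANgin -> KawvANgin",
    "53431 old .{#KadvANgin#}¦ m. ({#-NgI#}) A name of SIVA.",
    "53431 new .{#KawvANgin#}¦ m. ({#-NgI#}) A name of SIVA.",
]
for lnum, change in parse_changes(sample).items():
    print(lnum, change["old"], "->", change["new"])
```

Keeping the old text next to the new one allows a check that the digitization still matches 'old' before a change is applied.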

funderburkjim commented 7 years ago

Regarding correcting print errors, there is a file of Wilson print changes. Other dictionaries have similar files.

These files could form the basis for a version of the dictionary with the original errors in living color, along with what we think are the corrections. What is required is someone to devise data structures and programs to coordinate the two versions. It would be better if there were a small team of people with both adequate programming skills as well as interest in accomplishing such tasks. But currently there is only me, and I have to pick the battles to enter. I don't want to enter this battle now.

I think @zaaf2 made a suggestion along these lines some time ago, but I cannot find the issue.

One intermediate solution might be to simply add a flag to those records for which one or more print changes have been made; then the displays could display this flag by a small message and a link to the corresponding print change file.
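
To make the flag idea a little more concrete, here is a minimal sketch in Python. Nothing like this exists in the current displays; the record fields, the function name and the change-file location are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    lnum: str                      # record id, e.g. the L= number
    headword: str                  # headword in SLP1
    text: str                      # corrected text shown to the reader
    print_changes: list = field(default_factory=list)  # (old, new) pairs


def display(record, change_file):
    """Return the record text, with a short note when print changes exist."""
    if record.print_changes:
        return f"{record.text} [print change: see {change_file}]"
    return record.text


rec = Record("12618", "KawvA", ".{#KawvA#}¦ f. ({#-wvA#})")
rec.print_changes.append(("KadvA", "KawvA"))
print(display(rec, "wil_printchange.txt"))  # hypothetical file name
```

The flag costs one field per record and leaves the question of a full two-version display open.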

See next issue, where some similar questions (regarding preserving the author's style) arise.

I think we need to put Wilson aside for a while. My inclination is to switch to finding text Sanskrit spelling errors in the two Apte Sanskrit-English dictionaries.

gasyoun commented 7 years ago

And the second one corrected, with highlighted corrections. Perhaps showing the corrected text with the option to see the supposedly erroneous original.

Makes sense. But it requires coding power we do not have now.

On the other hand I don't think this dic is worth all those troubles which it gives us.

I agree. At least not to start with it. Wilson (both editions) and the first edition of Monier are not worth starting with. They are historical documents, not really reference works after 200 years.

What is required is someone to devise data structures and programs to coordinate the two versions. It would be better if there were a small team of people with both adequate programming skills as well as interest in accomplishing such tasks. But currently there is only me, and I have to pick the battles to enter. I don't want to enter this battle now.

Fully agree. Doing my best to get help for web coding and representation from my pupils with not much luck yet.

One intermediate solution might be to simply add a flag to those records for which one or more print changes have been made

If that's doable without stopping the rest, I would go for it.

I think we need to put Wilson aside for a while. My inclination is to switch to finding text Sanskrit spelling errors in the two Apte Sanskrit-English dictionaries.

Youngest first. And always so. The latest should be clean first.

funderburkjim commented 7 years ago

@gasyoun Regarding cleaning AP (and AP90). Do you have access to a copy of the DSAL Apte? This might be an excellent resource for corrections to our digitizations.

gasyoun commented 7 years ago

Do you have access to a copy of the DSAL Apte?

Sure, I have scraped it several times. It has several errors (I can't remember where they are documented; some pages are missing); otherwise it's used as the basis for https://sourceforge.net/projects/sandic/ . Let me give you the source files of the tool, which contain DSAL's Apte in a slightly converted form. It was years ago and I do not document much (always in a hurry :+1: ), but everything should be there. I've submitted the errors found to DSAL. At first I was in good contact with them; later the contact was lost.

Source: https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/SanDicDB.zip

This might be an excellent resource for corrections to our digitizations.

Hope at least as good as the Tirupati comparison.
