sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

MW literary source incompletenesses #219

Open funderburkjim opened 6 years ago

funderburkjim commented 6 years ago

In the current conversion of the form of the Cologne digitization of MW (1899) (#216 and #218), the coding and expansion of the literary source abbreviations need to be examined.

The system currently uses two data sources:

The reason that there are more items in the link file than in the mwauthorities file is that sometimes two different abbreviations have been judged to refer to the same mwauthorities record. For example, the abbreviations ĀṠvṠr. (4 times) and ĀṡvṠr. (744 times) are both linked to āśvalāyana-śrauta-sūtra.

This gives an idea of the current system.

funderburkjim commented 6 years ago

Incompleteness of linkauthorities

As part of the current work, I generated by program a list of all abbreviations X occurring as <ls>X</ls> instances in the MW digitization; there were 305,310 such instances, with 1001 distinct abbreviations.

Among these 1001 abbreviations, 638 matched to one of the abbreviations in the linkauthorities file. However, 363 abbreviations did NOT match, and these non-matching abbreviations account for 999 of the <ls>X</ls> instances.
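
The tally described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used: it assumes plain <ls>X</ls> elements without attributes, derives each abbreviation by the approximate rule of taking the characters of X up to and including the first period, and uses a hypothetical set `known` standing in for the linkauthorities abbreviations.

```python
import re
from collections import Counter

# Body X of each <ls>X</ls>; non-greedy so adjacent elements stay separate,
# DOTALL in case an element spans a line break.
LS_RE = re.compile(r"<ls>(.*?)</ls>", re.DOTALL)


def abbrev_of(x):
    """Approximate abbreviation: characters of X up to and including the first period."""
    i = x.find(".")
    return x[: i + 1] if i >= 0 else x


def tally(text, known):
    """Count <ls> instances per abbreviation and split them by membership in `known`."""
    counts = Counter(abbrev_of(x) for x in LS_RE.findall(text))
    matched = {a: n for a, n in counts.items() if a in known}
    unmatched = {a: n for a, n in counts.items() if a not in known}
    return counts, matched, unmatched


sample = "<ls>Hir. i, 15, 5</ls> ... <ls>Hir. i, 25, 1</ls> ... <ls>Saddh.</ls>"
counts, matched, unmatched = tally(sample, known={"Hir."})
print(len(counts), sum(unmatched.values()))  # prints: 2 1
```

The two printed numbers correspond to "distinct abbreviations" and "instances covered by non-matching abbreviations" in the paragraph above.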

Supporting file

The lsiast_all_table.txt file contains the data for those 1001 abbreviations. It is a tab-delimited text file with five fields.

The '?' character appears in the 363 cases where the abbreviation has not been matched yet.
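
Since the table is tab-delimited, the '?' rows can be pulled out with Python's csv module. This is a sketch that assumes the column order suggested by rows quoted elsewhere in this thread (abbreviation, linked abbreviation, instance count, expansion, category), e.g. `Kielhorn. Kielhorn. 1 ? ?`; the real file's layout may differ.

```python
import csv


def unmatched_rows(lines):
    """Rows of the tab-delimited table whose expansion field is '?'."""
    # QUOTE_NONE: treat '"' as ordinary text rather than a quoting character.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Assumed columns: abbreviation, linked abbreviation, count, expansion, category.
    return [row for row in reader if len(row) >= 4 and row[3] == "?"]


# Typical use (UTF-8 encoding is an assumption):
# with open("lsiast_all_table.txt", encoding="utf-8", newline="") as f:
#     rows = unmatched_rows(f)
rows = unmatched_rows([
    "SaddhP.\tSaddhP.\t146\tsaddharma-puṇḍarīka\tTitle",
    "Kielhorn.\tKielhorn.\t1\t?\t?",
])
print(len(rows), sum(int(r[2]) for r in rows))  # abbreviations, instances
```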

funderburkjim commented 6 years ago

Suggestions for improvements

There are numerous possible reasons for the unmatched abbreviations.

funderburkjim commented 6 years ago

@SergeA Do you have time to identify the TYPE 1 cases? If so, you could provide the corrections in some simple form. Then I can generate the corrections and rerun things so we can deal with the smaller number of remaining cases.
The gist file should be all that is needed for these cases.

funderburkjim commented 6 years ago

Vārtt. markup

Currently, except for one case, the markup <ab>Vārtt.</ab> is used (about 1000 instances).
Should these be changed to <ls>Vārtt.</ls> and associated with 'Vārttika of KĀTYĀYANA'?
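
If the change is adopted, the retagging itself is a plain string substitution. A minimal sketch with a hypothetical function name; it normalizes to Unicode NFC first, assuming the file may mix a precomposed ā with a + combining macron, so both spellings are caught. Note that it returns the whole text in NFC form.

```python
import unicodedata

OLD = unicodedata.normalize("NFC", "<ab>Vārtt.</ab>")
NEW = unicodedata.normalize("NFC", "<ls>Vārtt.</ls>")


def retag_vartt(text):
    """Return (new_text, count) with every <ab>Vārtt.</ab> retagged as <ls>Vārtt.</ls>."""
    # NFC so that a precomposed ā and a + combining macron compare equal.
    text = unicodedata.normalize("NFC", text)
    return text.replace(OLD, NEW), text.count(OLD)
```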

SergeA commented 6 years ago

and associated with Vārttika of KĀTYĀYANA?

Most probably. In theory MW could also use this Vārtt. somewhere to mention some other Vārttikas, but I don't think he did. So associating it with Kātyāyana will be OK.

gasyoun commented 6 years ago

So @SergeA can you pick up the gist file for checking, please?

SergeA commented 6 years ago

Yes, I'm working on it.

SergeA commented 6 years ago

List of corrections. Format: line copied from lsiast_all_table.txt >>> CorrectAbbr. (my comments). "Checked" means I've searched for the provided spelling in the MONIER.ALL file, then looked up the word entry through the MW basic interface, and then checked the scan visually; usually I found typos. In some cases I found a print error in MW. But mostly I decided without checking, so it is unknown whether it was a typo or a print error.
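
Entries in this format can be split mechanically before being applied. A sketch with a hypothetical helper `parse_correction`; it assumes one entry per line and that the optional comment is a single trailing parenthesized group. Entries that do not fit (for example, free-form remarks after the arrow) return None so they can be handled by hand.

```python
import re

# "<copied table line> >>> <correct abbreviation> (<optional comment>)"
CORRECTION_RE = re.compile(
    r"^(?P<line>.*?)\s*>>>\s*(?P<fix>[^(]*?)\s*(?:\((?P<note>.*)\))?\s*$"
)


def parse_correction(entry):
    """Return (copied_line, correct_abbreviation, comment_or_None), or None."""
    m = CORRECTION_RE.match(entry)
    if m is None or not m.group("fix"):
        return None
    return m.group("line"), m.group("fix"), m.group("note")
```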

gasyoun commented 6 years ago

Well done. It is interesting to know how many ways there are to find print errors in the canonical MW.

funderburkjim commented 6 years ago

Corrections incorporated

This file now revised: lsiast_all_table.txt

This includes @SergeA 'list of corrections' and corrections made by me in a couple of additional passes.

The number of X which appear in <ls>X</ls> but are unknown is down to 175 from the original 320; almost half have been resolved by corrections :)

Saddh.

I think I handled all the side comments/corrections in SergeA's list, except one. He pointed out that the expansion of Saddh. as sāhitya-darpaṇa is surely wrong. It now appears in the list with '?' for expansion. Also, here are the four instances:

headwords: 
ft  p. 226,1
jalaDaragarjitaGozasusvaranakzatrarAjasaMkusumitABijYa  p. 415,1
vimati p. 979,3
susaMsTita p. 1238,2 

Most unknowns have been checked.

This means that I've looked at the print for most of these, and that the print agrees with the spelling shown in lsiast gist file. I think most of the gross errors have been found.

Ideas for further improvements

The work so far in this round has definitely improved the quality of the matching of ls abbreviations. There may be a few other things we can do before we declare an end to the round.

There are probably some unmatched cases whose resolution would be as an alternate spelling of a known abbreviation. For instance, 'Saddh.' might be an alternate abbreviation for 'SaddhP. SaddhP. 146 saddharma-puṇḍarīka Title'. However, it is not clear how to confirm such a situation. One interim approach might be to make our best guess for such cases, and then add some 'mea culpa' notation to such matches to indicate the provisional nature of the match, such as Saddh. Saddh. 4 saddharma-puṇḍarīka (Cologne guess) Title.

There are a few like Kielhorn. Kielhorn. 1 ? ? which are clearly European authors;
we should make appropriate entries for these, using W. W. 8338 Horace H. Wilson Author as a guide. Maybe @gasyoun could do these.

Also, there may be a few where @SergeA or @drdhaval2785 can make a high-probability guess as to the name of a work or author, such as Yavane7s3v, Va1sisht2halP, etc.

drdhaval2785 commented 6 years ago

My suggestion for reaching these lesser-known works would be:

Find the five entries from ACC which have the least edit distance from unknown MW abbreviation.

This will help us zero in on these obscure works.
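
A sketch of that suggestion: plain Levenshtein distance plus `heapq.nsmallest`. The headword list and the SLP1 spelling of the abbreviation are assumed inputs (loading the ACC headwords is not shown), and the candidate list in the example is hypothetical.

```python
import heapq


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def closest(abbrev, headwords, k=5):
    """The k headwords with the smallest edit distance to abbrev."""
    return heapq.nsmallest(k, headwords, key=lambda h: levenshtein(abbrev, h))


# Hypothetical candidate list.
print(closest("Agamapr", ["AgamaprAmARya", "Agama", "xyz"], k=2))  # ['Agama', 'AgamaprAmARya']
```

Because an abbreviation is usually much shorter than a full title, raw distance favours short headwords (in the example, 'Agama' outranks 'AgamaprAmARya'). Comparing against each headword truncated to the abbreviation's length, or combining this with a prefix search, may rank candidates better.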

gasyoun commented 6 years ago

Find the five entries from ACC which have the least edit distance from unknown MW abbreviation.

Possible, Jim?

There are a few like Kielhorn. Kielhorn. 1 ? ? which are clearly European authors

Where is the updated list?

SergeA commented 6 years ago

I think Böhtlingk can help in many cases, as he provides more precise source references. But for me the main trouble here is the absence of a good search engine where I could input an LS and get the corresponding entries as a list.

drdhaval2785 commented 6 years ago

Let me tell you what I have in mind.

  1. Let us take one case 'Agamap' which is not decided.
  2. Open http://www.sanskrit-lexicon.uni-koeln.de/scans/ACCScan/2014/web/webtc2/index.php
  3. Enter 'Agamap' and select prefix.
  4. It showed AgamaprAmARya.
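
The same prefix lookup can be reproduced offline against a list of ACC headwords. A sketch, assuming the headwords are available as SLP1 strings (how they are loaded is not shown); the sample list is hypothetical. An abbreviation with exactly one such match is what the thread later calls a 'singleton'.

```python
def prefix_matches(abbrev, headwords):
    """Headwords beginning with the abbreviation (any trailing period dropped)."""
    stem = abbrev.rstrip(".")
    return [h for h in headwords if h.startswith(stem)]


# Hypothetical sample headwords in SLP1.
acc = ["AgamaprAmARya", "AgamasAra", "Agama"]
print(prefix_matches("Agamap", acc))  # ['AgamaprAmARya']
```
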
SergeA commented 6 years ago

This index of works is of course also very helpful, and provides the field for research. But I meant full-text search in MW, with the possibility of listing all the entries where, e.g., this LS 'Agamap' is referred to. My current method of searching manually in the txt file is very tiresome.

drdhaval2785 commented 6 years ago

Oh. You can download Notepad++, do a search, and click on 'Find all in current file'. You will get all the lines where it occurs, with line numbers.

gasyoun commented 6 years ago

'Find all in current file'.

And even extract all of them to a new file with a single click.

SergeA commented 6 years ago

Thanks for the suggestion. I had heard about Notepad++ for a long time but never had the opportunity to try it. Yes, this way is better. :)
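
For anyone who prefers a script, the 'Find all' listing can be approximated in a few lines. A sketch; the encoding in the usage comment is an assumption.

```python
def find_all(lines, needle):
    """(line_number, line) pairs for every line containing needle, like 'Find all'."""
    return [(n, line.rstrip("\n")) for n, line in enumerate(lines, 1) if needle in line]


# Usage with the MW text file (UTF-8 encoding assumed):
# with open("MONIER.ALL", encoding="utf-8") as f:
#     for n, line in find_all(f, "Agamap"):
#         print(n, line)
```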

SergeA commented 6 years ago

Errors in current LS definitions

BR. It is a reference to Böhtlingk & Roth! 073 brāhmaṇa Literary category [BR., Br.] >>> [Br.]. Checked several cases (agraha, artana, apipṛc, abhiśrī, alātṛṇa); PWG fits all. Thus: 071 Boehtlingk & Roth's Sanskrit-Woerterbuch Title [BRD.] >>> [BRD., BR.]

Kār. It's the Kārikās on Pāṇini! 189 kāraṇḍa-vyūha Title [Kāraṇḍ., Kār.] >>> [Kāraṇḍ.]

Madanap. Two lines have been merged. 244 madanavinoda Title [Madanav., Madanap.] >>> [Madanav.]

SergeA commented 6 years ago

KapSaṃh. 186 kapila-saṁhitā, kapila-purāṇa Title [KapSaṃh.] >>> 186 kapila-saṁhitā from the skanda-purāṇa. In Original: Kap(ila)Saṃh(itā, from the SkandaP.)

Matsyas. 272 matsyasūkta's śabdakalpadruma Title [Matsyas.] >>> 272 matsyasūkta in śabdakalpadruma. This is the name of a work quoted in ŚKD, through which MW quotes it in turn.

Nāḍīpr. 282 nāḍīprakāśa's śabdakalpadruma Title [Nāḍīpr.] >>> 282 nāḍīprakāśa in śabdakalpadruma. The same as the previous one.

SergeA commented 6 years ago

Śākaṭ. 383 śākaṭāyana Author [Ṡāk.] >>> [Śākaṭ.]. In Original: Śākaṭ(āyana). Also relink Ṡāk. to 386 śakuntalā Title [Ṡak., Ṡāk.]

I would also like to say that I feel uncomfortable with the MWauth list being re-edited and sometimes very different from the original text. E.g., in Original: Yājñ., Sch. (i.e. Mitākṣarā), vs. 514 vijñāneśvara's mitākṣarā, commentary on yājñavalkya-dharma-śāstra Title [unused] (it is commonly referred to as Comm. on Yājñ.).

gasyoun commented 6 years ago

MWauth list being re-edited and sometimes very different from the original text.

Do you mean there is no such line of text as Yājñ., Sch. in the HTML file, and that because of this you spend a lot of time finding something similar?

funderburkjim commented 6 years ago

Corrections made re comments above

funderburkjim commented 6 years ago

New form of working mwauth file

[Note: for convenience, images of the two pages of Works and Authors from MW1899 have been put into the MWS repository as page1 and page2.]

Refer to version 8 of lsiast_all_table.txt.

This is a revision of the previous lsiast_all_table.txt. This revision should include all corrections up to today, 2018-04-07.

This file is tab-delimited, with six fields.

funderburkjim commented 6 years ago

mwauth.txt

The mwauth.txt file is now viewed as the primary document for the literary sources in MW. The lsiast_all_table.txt file mentioned above is derived from it, together with a list of all currently known instances of literary source abbreviations.

tooltip.txt

This is derived directly from mwauth.txt. A sqlite form of tooltip.txt is the source of the literary source tooltips used in the development displays.
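
A minimal sketch of that derivation, assuming (hypothetically) that each tooltip.txt line holds an abbreviation and its tooltip text separated by a tab; the real file may carry more fields, and the table and column names are made up for illustration. Using `?` placeholders rather than building SQL strings means sqlite3 does the quoting, so an apostrophe such as the one in Roth's is stored once rather than doubled.

```python
import sqlite3


def build_tooltip_db(lines, path=":memory:"):
    """Load 'abbreviation<TAB>tooltip' lines into a sqlite table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS tooltip (abbrv TEXT PRIMARY KEY, tip TEXT)")
    rows = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    # Placeholders: sqlite3 handles quoting, so apostrophes need no manual escaping.
    conn.executemany("INSERT OR REPLACE INTO tooltip (abbrv, tip) VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = build_tooltip_db(["BRD.\tBoehtlingk & Roth's Sanskrit-Woerterbuch\n"])
print(conn.execute("SELECT tip FROM tooltip WHERE abbrv = ?", ("BRD.",)).fetchone()[0])
```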


As currently written, the abbreviation expansion uses the <expandNorm> field of mwauth, which sometimes differs from the printed text form of the LIST OF WORKS AND AUTHORS in the <expandMW> field of mwauth records.

funderburkjim commented 6 years ago

Which expansion to use: MW or Norm ?

The discrepancy regarding Yājñ.,Sch. mentioned above can now be better understood.

The <MWexpand> field has Yājñavalkya,Sch.(i.e. Mitākṣarā) and the <Normexpand> field has Vijñāneśvara's Mitākṣarā, commentary on Yājñavalkya-dharma-śāstra.

The previous version of lsiast file only showed the Normexpand field (when it was present).

The tooltips also currently show the Normexpand field in preference to the MWexpand field.

This choice could be changed to always use the MWexpand field.
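
The choice between the two fields reduces to a fallback order. A sketch that treats an mwauth record as a hypothetical Python dict keyed by the field names used above; flipping `prefer_mw` is the change being discussed.

```python
def tooltip_text(record, prefer_mw=False):
    """Pick the tooltip expansion from a record with optional MW and Norm fields."""
    mw = record.get("MWexpand")
    norm = record.get("Normexpand")
    if prefer_mw:
        return mw or norm
    return norm or mw  # current behaviour: Norm when present, else MW


yajn = {
    "MWexpand": "Yājñavalkya,Sch.(i.e. Mitākṣarā)",
    "Normexpand": "Vijñāneśvara's Mitākṣarā, commentary on Yājñavalkya-dharma-śāstra",
}
```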

funderburkjim commented 6 years ago

Correcting some Yājñ.,Sch. instances is likely hard

This abbreviation is currently not recognized. There are about 2200 Yājñ. references, some of which have a related 'Sch.'; these instances would need to be recoded in order to be recognized as Yājñavalkya,Sch.(i.e. Mitākṣarā). 213 Yājñ. records have a 'Sch.' in the same record, but not all of these 'Sch.' instances refer to Yājñ.

Type 1

<ls>Yājñ.</ls>, <ab>Sch.</ab> -> <ls>Yājñ.,Sch.</ls> is a simple change, but there are only 3 or so of these.
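
The Type 1 rewrite is a single regular-expression substitution. A sketch with a hypothetical function name, assuming the text is in NFC form and the comma may be followed by any amount of whitespace; Type 2 instances are deliberately left alone.

```python
import re

# <ls>Yājñ.</ls>, <ab>Sch.</ab>  (comma, then optional whitespace)
TYPE1_RE = re.compile(r"<ls>Yājñ\.</ls>,\s*<ab>Sch\.</ab>")


def merge_type1(text):
    """Rewrite Type 1 pairs as a single <ls>Yājñ.,Sch.</ls>; return (text, count)."""
    return TYPE1_RE.subn("<ls>Yājñ.,Sch.</ls>", text)
```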

Type 2

<ls>Yājñ. iii, 253 <ab>Sch.</ab></ls> Recognizing this more common type as abbreviation Yājñ.,Sch. would present a new problem, due to the intervening iii, 253.

Currently, the abbreviation for an <ls>X</ls> instance is found (approximately) by looking at the characters of X up to and including the first period (assuming X starts with a capital letter).

But this technique wouldn't find Yājñ.,Sch. in the example.

One solution would be to modify the digitization by providing for an optional attribute to the <ls> element. In such cases where it is hard to parse the abbreviation from X, we could set the value of this attribute to the proper abbreviation. For instance, <ls n="Yājñ.,Sch.">Yājñ. iii, 253 <ab>Sch.</ab></ls>

Should I do this?
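
A sketch of how the abbreviation finder might honour such an attribute: use `n` when present, otherwise fall back to the up-to-the-first-period rule. It assumes `n` is the only attribute ever placed on <ls>, which is just the proposal above, not an existing convention.

```python
import re

# <ls> optionally carrying n="..."; body captured non-greedily up to </ls>.
LS_RE = re.compile(r'<ls(?:\s+n="(?P<n>[^"]*)")?>(?P<body>.*?)</ls>', re.DOTALL)


def ls_abbrevs(text):
    """Abbreviation per <ls>: the n attribute if given, else X up to the first period."""
    result = []
    for m in LS_RE.finditer(text):
        if m.group("n"):
            result.append(m.group("n"))
        else:
            body = m.group("body")
            i = body.find(".")
            result.append(body[: i + 1] if i >= 0 else body)
    return result


print(ls_abbrevs('<ls n="Yājñ.,Sch.">Yājñ. iii, 253 <ab>Sch.</ab></ls> <ls>Hir. i, 15, 5</ls>'))
# ['Yājñ.,Sch.', 'Hir.']
```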

SergeA commented 6 years ago

Kār.

Kār. New entry: Hari-kārikās, the kārikās of Bhartṛ-hari, Title. I found this under the headword kArikA in MW. OK?

No. I don't know why MW fails to give a definition for this widely used abbreviation. Kār. = Kārikās, grammatical commentaries on Pāṇini in metrical form. These Kārikās are versified commentaries found in Patañjali's Mahābhāṣyam. They supposedly belong to different authors. E.g., the first occurrence of Kār.: <H3>100{kriyAyoga}3{kriyA-yoga}¦ •m. ‹the…connection…with…an…action…or…verb› ‹¯APra1t.› ‹¯Pa1n2.…1-1…,…14› ‹¯Ka1r.› This kārikā is found in the Mahābhāṣyam under Pāṇinian rule 1.1.14, and reads: ईषदर्थे क्रियायोगे मर्यादाभिविधौ च यः। एतमातं ङितं विद्याद्वाक्यस्मरणयोरङित् ॥ (it contains the word kriyāyoga, which is why it is cited there).

Śāk.

Śāk. Now an alternate abbreviation for Śākaṭ. (śākaṭāyana). OK?
    The long ā argues against this being an alternate abbreviation for Śak. (śakuntalā).

No. Śāk. = Śākuntala and has nothing to do with Śākaṭāyana. Śakuntala and Śākuntala are both correct spellings.

Proof: <H3>100{acirabhAs}3{a-cira--bhAs}¦ •f. ‹lightning› ‹¯S3a1k.› MW001206. Checking in PWG, there is a reference to Śākuntala there:

acirabhās (acira + bhās)
1) adj. of brief light.
2) f. lightning: Śāk. 166.

And the word is found in Böhtlingk´s edition of Śakuntala in verse 166 on page 99.

Next MW <H1>100{ataTa}1{a-taTa}¦ •mfn. ‹having…no…beach…or…shore…,…precipitous› ‹¯S3a1k.› PWG:

ataṭa (3. a + taṭa)
1) adj. shoreless, without a gentle slope, without an incline, precipitous: manorathānāmataṭaprapātāḥ Śāk. 137.

Found in verse 137 in the same book.

MW <H3>100{atiraMhas}3{a4ti--raMhas}¦ •mfn. ‹extremely…rapid› ‹¯S3a1k.› MW001833. PWG: atiraṁhas (ati + raṁhas) adj. of extraordinary speed, Śāk. 5. Found in verse 5. Etc.

SergeA commented 6 years ago

Which expansion to use: MW or Norm ?

It is hard to choose only one. MW is sometimes too brief, while the re-edited form (Norm) provides more info. But in other cases Norm makes the info less clear or corrupts it. E.g., MW original: T(ārānātha Tarkavācaspati's Dictionary). Norm: 460 tārānātha tarkavācaspati's vācaspatyam, Sanskrit dictionary Title [T.]. Here MW gives only the name of the author of the dictionary, while Norm also provides the name of the dictionary. That is good.

Another example: MW original: Hir(aṇyakeśin's) Gṛ(ihya-sūtra). Norm: 156 hiraṇyakeśin-gṛhya-sūtra Title [HirGṛ.]. In the MW original the info is clear: the author's name and the work's name. In Norm, for no reason, the author's name is compounded with the name of the work, which makes the info less clear. There is also a sandhi error: by Sanskrit rules, the final "n" of Hiraṇyakeśin should be dropped in a compound. So the info is not only less clear, but also gives an impossible spelling of the work's name.

In the Yājñavalkya case mentioned above: MW original: Yājñavalkya,Sch.(i.e. Mitākṣarā). Norm: Vijñāneśvara's Mitākṣarā, commentary on Yājñavalkya-dharma-śāstra. Norm adds the name of the commentary's author, Vijñāneśvara: this is good. It also provides the title of the work, "dharmaśāstra": this is good. But it compounds this with the author's name Yājñavalkya: this is bad. And the work can also be referred to by other names, such as Yājñavalkya-smṛti or Yājñavalkya's laws. So the additions of Norm are both good and bad, fifty-fifty.

SergeA commented 6 years ago

Yājñ. iii, 253 Sch.

This case is similar to Pāṇ. 1.1.4, Kār. and other commentary cases with complicated references. A common approach is needed.

funderburkjim commented 6 years ago

Revisions

Norm expansions we can edit as desired

The current data source (mwauth.txt) is divorced from the one originally prepared by Scharf et al. We are free to introduce improvements as we see fit, for instance by rewording the 'Norm' expansions for increased clarity.

gasyoun commented 6 years ago

an optional attribute to the <ls> element. In such cases where it is hard to parse the abbreviation from X, we could set the value of this attribute to the proper abbreviation

Makes sense.

For instance by rewording the 'Norm' expansions for increased clarity.

Guess we want Serge to dive deeper and it might take weeks.

In the BRD. tooltip it's now Roth''s, with a doubled apostrophe. All entries have the same issue.

indra m. ( for etym.

Do we really want to have the spacing before and after ()?

Caus. = Causal or Causative?

Is there a way to see all the compounds inside the head article listed?

funderburkjim commented 6 years ago

Further comment on the Āgamapr. abbreviation

  1. The ACC search shown above gives only one prefix match for (SLP1) Agamapr, namely AgamaprAmARya.
  2. The abbreviation Āgamapr. occurs in one entry only:
    ghaṭa-kañcuki [p= 375] : n. an immoral rite practised by tāntrikas and śāktas (in which the bodices 
    of different women are placed in a receptacle and the men present at the ceremony are allowed to
    take them out one by one and then cohabit with the woman to whom each bodice belongs), 
    Āgamapr.  [L=69243]
  3. To fully confirm that AgamaprAmARya is the expansion of the Agamapr. abbreviation, we need to find an instance of ghaṭa-kañcuki in the work AgamaprAmARya.
  4. There is a scanned edition of AgamaprAmARya at archive.org; it is about 96 pages in length.
  5. There is a searchable digitization of AgamaprAmARya via the Muktabodha Indological Research Institute.
    • There are two works with this name.
    • Choosing each, and then viewing the HK form, and searching for kaJcuk, leads to one match in the second work, and no matches with the first work.
      uSNISI kaJcukI nagno muktakezo gaNAvRtaH |
      apavitrakarozuddhaH pralapanna japetkvacit ||
      anAsanaH zayAno vA gacchan bhuJjAna eva vA |
      rathyAyAmazivasthAne na japettimirAlaye ||
      p. 57) mArjAraM kukkuTaM zvAnaM krauJcaM zUdraM kharaM kapim ||
      dRSTvAcamya japetkarNaM spRSTvA snAnaM vidhIyate |
      evaM japaniyamaH sarvatra jJeyaH | mAnase na doSakRditi ||
  6. But I don't see 'gaTa' (HK), and am unable to determine if the subject is as described by MW under ghaṭa-kañcuki.
  7. My conclusion at the moment is:
    • AgamaprAmARya may be the expansion of Agamapr -- we should note it is a likely candidate.
    • However, until we can somehow relate the MW entry to a version of the text, we should keep the question open.
funderburkjim commented 6 years ago

DevīP. == DevīBhP.

The DevīBhP. abbreviation occurs 13 times, and is in the printed index as Devī-bhāgavata-purāṇa

DevīP. abbreviation occurs 39 times, and is not in the printed index. Comparing a couple of the entries to PWG (e.g., kzema, ekaparRikA), we find PWG references to Devīpurāṇa via the Śabdakalpadruma dictionary. From internet searches, it appears that Devīpurāṇa is another name for the work Devī-bhāgavata-purāṇa.

Since DevīBhP. is an abbreviation in the printed MW index, I am going to consider DevīP. as an alternate abbreviation for Devī-bhāgavata-purāṇa.

funderburkjim commented 6 years ago

Hir. == HirGṛ.

The Hir. abbreviation occurs 121 times, but is not present in the printed list of abbreviations of works.

HirGṛ occurs 19 times and is listed as Hiraṇyakeśin's Gṛhya-sūtra.

HirP. occurs 28 times and is listed as Hiraṇyakeśin's Pitṛmedha-sūtra.

A search of PWG/PW for several of the 121 headwords where MW mentions Hir. drew a blank.

I could not find a searchable Hiraṇyakeśin's Gṛhya-sūtra digitization, but did find an English translation.

There are a few of the 121 where a specific reference is mentioned. E.g.

<s>ni-jihvika</s> ¦ <lex>mfn.</lex> = (or <ab>w.r.</ab> for) <s>nir</s>-<s>j</s>°, tongueless, 
<ls>Hir. i, 15, 5</ls>

Here the 5th item in the 15th section of the 1st praśna shows 'tongueless'.

Similarly, apiSAcaDIta can be traced down in the first item under the 25th section of the first praśna:

<s>a-piSAcaDIta</s> ¦ <lex>mfn.</lex> (<ab>prob.</ab> right reading) not drunk or 
<pb n="1314,3"/> sucked by <s1 slp1="piSAca">Piśāca</s1>s, <ls>Hir. i, 25, 1</ls>.
<info n="sup"/><info lex="m:f:n"/>


I find these examples convincing, and am introducing 'Hir.' as an alternate abbreviation for Hiraṇyakeśin's Gṛhya-sūtra.

funderburkjim commented 6 years ago

Hiraṇy. == HirGṛ.

Hiraṇy. occurs only twice as an abbreviation (hw BUrisaKa, mindAhuti). Neither has a specific reference location.

Based on the similarity to 'Hir.', and the spelling of Hiraṇyakeśin, I am marking Hiraṇy. as another alternate abbreviation for Hiraṇyakeśin's Gṛhya-sūtra.
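
The alternate-abbreviation decisions so far amount to a small alias table consulted before the expansion lookup. A sketch with hypothetical in-memory dictionaries (in the real system this data lives in mwauth.txt); the entries are the ones settled in the comments above, and only a small illustrative subset of expansions is included.

```python
import unicodedata

# Alternate abbreviation -> abbreviation in the printed list (decisions from this thread).
ALIASES = {
    "DevīP.": "DevīBhP.",
    "Hir.": "HirGṛ.",
    "Hiraṇy.": "HirGṛ.",
}
# Abbreviation -> expansion (illustrative subset only).
EXPANSIONS = {
    "DevīBhP.": "Devī-bhāgavata-purāṇa",
    "HirGṛ.": "Hiraṇyakeśin's Gṛhya-sūtra",
}


def expand(abbrev):
    """Expansion for an abbreviation, following one alias hop; None if unknown."""
    abbrev = unicodedata.normalize("NFC", abbrev)
    return EXPANSIONS.get(ALIASES.get(abbrev, abbrev))
```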

funderburkjim commented 6 years ago

PBr. , Pañcav. == PañcavBr.

PañcavBr. abbreviation occurs 252 times, and occurs in the listed works as abbreviation for PañcaviṃśaBrāhmaṇa.

PBr. abbreviation appears 142 times and Pañcav. appears 3 times.

There is evidence suggesting that PBr. abbreviation also refers to PañcaviṃśaBrāhmaṇa.

Pañcav. abbreviation occurs under headwords vidAya, viDAtrI, and vftti. However, in none of these does PWG or PW mention PañcaviṃśaBrāhmaṇa, although there are references variously to PAÑCARĀTRA and PAÑCATANTRA. Also, I could not find these words in the GRETIL digitization of PañcaviṃśaBrāhmaṇa.

Conclusions:

funderburkjim commented 6 years ago

Correction: PañcavBr. is NOT listed in MW

I stated above that PañcavBr. is an abbreviation occurring in the MW list of works and authors, but it is not! At some time, an entry for this abbreviation (pointing to PañcaviṃśaBrāhmaṇa) was entered into the mwauth file. Its code 11:13 (being in a fictitious column '11') is a reminder that it is an add-on to the MW listing.

gasyoun commented 6 years ago

DevīP. as an alternate abbreviation for Devī-bhāgavata-purāṇa

Waited two weeks for your comeback, hurray.

Its code 11:13 (being in a fictitious column '11') is a reminder that it is an add-on to the MW listing.

One needs to know what to remember. And there are only 3 people left who can read it, I guess: Jim, Peter and Thomas.

funderburkjim commented 6 years ago

Sāṃkhyas. not known

The Sāṃkhyas. abbreviation appears in entries for 10 headwords:

Based on ACC, Sāṃkhyas. abbreviation might be:

There are versions of these at GRETIL - Samkhya. [In fact @drdhaval2785 is the contributor of several transcriptions!]. However, I have thus far had no luck in finding any of those 10 words within the GRETIL editions.

Assuming that KapS. is related, I also tried to find a few of the 124 MW headwords that mention KapS., using the GRETIL version of sāṁkhyasūtra. Again, the search had no luck. I found this surprising, and it makes me wonder: in exactly what edition of Kapila's Sāṃkhya-pravacana did Monier find the headwords marked as KapS.?

funderburkjim commented 6 years ago

Tantr. abbreviation unknown

Tantr. abbreviation occurs in 37 entries. By spelling, it is similar to abbreviation 'Tantras.' which occurs in 112 entries, and also appears in the list of works as Tantrasāra. A work by this name is available in digitally searchable form at GRETIL Tantrasara.

All this illustrates how problematic the identification of particular words in current versions of works can be.

It might be that the 'Tantr.' abbreviation is an alternate of the 'Tantras.' abbreviation (there is also one instance of 'Tantra.' abbreviation, headword lokaDAtvISvarI). However, this is just speculation at this point based on the similarity of the spelling of the abbreviations, and does not provide any help regarding the context from which Monier or his colleagues drew the words.

funderburkjim commented 6 years ago

Unknown abbreviations and ACC, part 1

This implements in the most simple way the idea that @drdhaval2785 suggested above to make use of the ACC headwords in the search for expansions of abbreviations of works which MW's list of works omits. The results are represented in four files in this gist.

Preliminary conclusions:

With all of these, it is highly desirable to complete the circle by finding the given headword instances within the work in question.

funderburkjim commented 6 years ago

Cologne Addition

@gasyoun mentions above that the significance of '11' was obscure. As a way to distinguish literary source expansions that are listed in MW's printed two pages of works and authors from expansions that have been added both recently and in work of prior years, I've added the '[Cologne Addition]' phrase. This phrase is added within the base document for all the MW literary source abbreviations, which in the revised system is the mwauth.txt file; you can see the current dev version of mwauth. If you go down towards the bottom and see all the '11:xx' lines, you'll see that '[Cologne Addition]' phrase, and the phrase carries through to the tooltips of the displays.

The code appearing in the first column indicates the column number and sequence-within-column for the first 10 columns (as per the 2 pages of the scanned images of lists of works and authors). Since there is no 11th column in the print, codes with an 11 indicate material added to the printed version in the digital version. The Cologne addition phrase expresses this fact overtly.
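
The rule can be stated in a few lines. A sketch with hypothetical function names that reads the column:sequence code and decides whether the phrase applies; in the actual system the phrase is written into mwauth.txt itself rather than computed at display time.

```python
COLOGNE_ADDITION = "[Cologne Addition]"
PRINTED_COLUMNS = 10  # the two printed pages have columns 1..10


def parse_code(code):
    """Split a 'column:sequence' code such as '11:13' and flag non-printed columns."""
    col, seq = (int(part) for part in code.split(":"))
    return col, seq, col > PRINTED_COLUMNS


def with_addition_flag(code, expansion):
    """Append the Cologne Addition phrase when the code lies outside the printed columns."""
    if parse_code(code)[2]:
        return f"{expansion} {COLOGNE_ADDITION}"
    return expansion
```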

gasyoun commented 6 years ago

All this illustrates how problematic the identification of particular words in current versions of works can be.

Sure, but should so much time be spent on it? You've been lost in this for 3 weeks, and only you can code, but you are not the only one who can search. As we've seen earlier, the best searcher out there for unknown_abbrv.txt is @SergeA; how he does it I do not know, but that is why I have valued his help for the last 15 years. He might not help, sure, but if we ask, one day he might get there. Still, I do not think this is the number one priority, because 57 entries is a lot of work.

It would be quite reasonable to assert that these ACC matches are correct, and to add these expansions to the list. (See note on 'Cologne addition' in next comment.)

Agree.

Since there is no 11th column in the print, codes with an 11 indicate material added to the printed version in the digital version. The Cologne addition phrase expresses this fact overtly.

Sure, but without the note I would never figure it out.

funderburkjim commented 6 years ago

should so much time be spent on it?

Not by me. I've taken it about as far as I can for now, and hope the filtering above will provide some help when someone else, such as @SergeA or @drdhaval2785 , takes up the problem.

What is my interest now, you may ask? I've done most of what I think needs to be done for now with the MW conversion. Since Peter has a strong interest in MW, he has agreed to discuss the proposed changes of the conversion sometime in May or June, when he will be back in the US. In the meantime, there are several items regarding documenting the changes to MW that I need to finish up. Then, I will install the MW conversion to Cologne. When this is done, the meta/iast conversion of all the dictionaries will be mostly complete at long last. What then? Well, several things come to mind, and I'll try to formulate possible next steps which we can then prioritize. One thing I'm curious about is whether @drdhaval2785 still has an active interest in this project.

drdhaval2785 commented 6 years ago

I read every mail from GitHub cologne that lands in my inbox with the utmost curiosity. I have recently moved to a new place, Surat, so I will resume after a month or so. One more thing: I was waiting for these metaline conversions to be completed, so that the differences between the dictionaries would be ironed out. Now that this is done, I will resume soon.

Happy to note that someone misses me somewhere.

gasyoun commented 6 years ago

Happy to note that someone misses me somewhere.

Joking? I missed you. Every. Single. Day. Not to mention @Shalu411; it pains me not to see them around anymore.

meta/iast conversion of all the dictionaries will be mostly complete at long last.

Guess it was the biggest undertaking so far.

I'll try to formulate possible next steps which we can then prioritize.

Great.

funderburkjim commented 6 years ago

revision to abbrv2

The 'singleton' examples (formerly unknown abbreviations with exactly one matching ACC entry) have been added to the mwauth list. These changes are (also) documented by revision to abbrv2.txt.

funderburkjim commented 6 years ago

Let's close this issue, and leave the remaining open questions in #222