Open funderburkjim opened 6 years ago
As part of the current work, I generated by program a list of all abbreviations X occurring as <ls>X</ls> instances in the MW digitization; there were 305310 such instances, with 1001 distinct abbreviations.
Among these 1001 abbreviations, 638 matched to one of the abbreviations in the linkauthorities file.
However, 363 abbreviations did NOT match, and these non-matching abbreviations account for 999 of the <ls>X</ls> instances.
The lsiast_all_table.txt file contains the data for those 1001 abbreviations. This file is in CSV format, with tab character separating the five fields.
The '?' character appears in the 363 cases where the abbreviation has not been matched yet.
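A minimal sketch of how such a table could be summarised by program. The column order used here (abbreviation as found, matched abbreviation, instance count, expansion, type) is an assumption based on sample rows quoted elsewhere in this thread, and `summarize_table` is a hypothetical helper, not part of the repository.

```python
import csv

def summarize_table(lines):
    """Summarise a tab-separated abbreviation table.

    Assumed (hypothetical) column order: abbreviation as found, matched
    abbreviation, instance count, expansion, type. A '?' in the expansion
    field marks an abbreviation that has not been matched yet.
    """
    rows = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    summary = {"abbreviations": 0, "unmatched": 0, "unmatched_instances": 0}
    for row in rows:
        if len(row) != 5 or not row[2].isdigit():
            continue  # skip blank or malformed lines
        summary["abbreviations"] += 1
        if row[3] == "?":
            summary["unmatched"] += 1
            summary["unmatched_instances"] += int(row[2])
    return summary

# Three rows in the assumed layout, taken from examples in this thread.
SAMPLE = [
    "W.\tW.\t8338\tHorace H. Wilson\tAuthor\n",
    "SaddhP.\tSaddhP.\t146\tsaddharma-puṇḍarīka\tTitle\n",
    "Kielhorn.\tKielhorn.\t1\t?\t?\n",
]
print(summarize_table(SAMPLE))
```

To run it on the real file, the same function could be given an open file handle instead of `SAMPLE`, provided the file really has this layout.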
There are numerous possible reasons for the unmatched abbreviations.
@SergeA Do you have time to identify the TYPE 1 cases? If so, you could provide the corrections in
some simple form. Then I can generate the corrections and rerun things so we can deal with the
smaller number of remaining cases.
The gist file should be all that is needed for these cases.
Vārtt. markup
Currently, except for one case, the markup <ab>Vārtt.</ab> is used (about 1000 instances).
I think these should be changed to <ls>Vārtt.</ls>, and associated with Vārttika of KĀTYĀYANA ?
and associated with Vārttika of KĀTYĀYANA ?
Most probably. In theory MW could also somewhere use this Vārtt. to mention some other Vārttikas, but I don´t think he did. So with Katyayana will be ok.
So @SergeA can you pick up the gist file for checking, please?
Yes, I'm in process.
list of corrections
Format:
copied line from lsiast_all_table.txt >>> CorrectAbbr. (my comments)
"Checked" means I've searched for the given spelling in the MONIER.ALL file, then looked up the word entry through the MW basic interface, and then visually checked the scan; this usually turned up typos, and in some cases a print error in MW. But most corrections were decided without checking, so it is unknown whether the cause was a typo or a print error.
Well done. It is interesting to know how many ways there are to find print errors in the canonical MW.
This file now revised: lsiast_all_table.txt
This includes @SergeA 'list of corrections' and corrections made by me in a couple of additional passes.
The number of X which appear in <ls>X</ls>
but are unknown is down to 175 from the original 320, almost half have been resolved by corrections :)
I think I handled all the side comments/corrections in SergeA's list, except one. He pointed out that
the expansion of Saddh. as sāhitya-darpaṇa
is surely wrong. It now appears in the list with '?' for
expansion. Also, here are the four instances:
headwords:
ft p. 226,1
jalaDaragarjitaGozasusvaranakzatrarAjasaMkusumitABijYa p. 415,1
vimati p. 979,3
susaMsTita p. 1238,2
This means that I've looked at the print for most of these, and that the print agrees with the spelling shown in lsiast gist file. I think most of the gross errors have been found.
The work so far in this round has definitely improved the quality of the matching of ls abbreviations. There may be a few other things we can do before we declare an end to the round.
There are probably some unmatched cases whose resolution would be as an alternate spelling of a
known abbreviation. For instance, 'Saddh.' might be an alternate abbreviation for
'SaddhP. SaddhP. 146 saddharma-puṇḍarīka Title'. However, it is not clear how to confirm such
a situation. One interim approach might be to make our best guess for such cases,
and then add some 'mea culpa' notation to such matches to indicate the provisional nature
of the match, such as
Saddh. Saddh. 4 saddharma-puṇḍarīka (Cologne guess) Title
.
There are a few like
Kielhorn. Kielhorn. 1 ? ?
which are clearly European authors; we should make appropriate entries for these, using
W. W. 8338 Horace H. Wilson Author
as a guide.
Maybe @gasyoun could do these.
Also, there may be a few where @SergeA or @drdhaval2785 can make a high probability guess as to the name of a work or author, such as Yavane7s3v, Va1sisht2halP, etc.
My suggestion for tracking down these lesser-known works would be:
Find the five entries from ACC which have the least edit distance from the unknown MW abbreviation.
This will help us zero in on these obscure works.
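The suggestion could be sketched roughly as follows. This is only an illustration: `edit_distance` is a plain Levenshtein distance, and `nearest_titles` is a hypothetical helper. Comparing the abbreviation against a same-length prefix of each title (rather than the whole title) is my own adjustment, so that long titles are not penalised; both inputs are assumed to be in the same transliteration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def nearest_titles(abbr, titles, k=5):
    """Return the k titles whose leading characters are closest to abbr."""
    key = abbr.rstrip(".")
    # Sort by distance first, then alphabetically to keep ties stable.
    return sorted(titles,
                  key=lambda t: (edit_distance(key, t[:len(key)]), t))[:k]

# Hypothetical ACC-style headword list, for illustration only.
titles = ["AgamaprAmARya", "AgamaSAstra", "tantrasAra", "devIpurARa"]
print(nearest_titles("Agamapr.", titles, k=2))
```

A candidate found this way is only a starting point; it still has to be confirmed by locating the headword in the work itself.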
Find the five entries from ACC which have the least edit distance from unknown MW abbreviation.
Possible, Jim?
There are a few like Kielhorn. Kielhorn. 1 ? ? which are clearly European authors
Where is the updated list?
I think Böhtlingk can help in many cases, as he provides more precise source references. But for me the main trouble here is the absence of good search engine, where I could input LS and get corresponding entries as list.
Let me tell you what I have in mind.
This index of works is of course also very helpful, and provides a field for research. But I meant a full-text search in MW with the possibility of listing all the entries where, e.g., the LS 'Agamap' is referred to. My current method of manual search in a txt file is very tiresome.
Oh. You can download notepad++ and do search and click on 'Find all in current file'. You shall have all the lines where it occurs with line number.
'Find all in current file'.
And even extract all of them to a new file with a single click.
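For those who prefer a script to an editor, the same "find all" idea can be sketched in a few lines. `find_all` is a hypothetical helper; the search string has to be given in whatever encoding the text file itself uses (e.g. `Ka1r.` in the sample lines below).

```python
def find_all(lines, needle):
    """Return (line_number, line) pairs for every line containing needle."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, 1)
            if needle in line]

# Two shortened sample lines in the style of the digitization.
sample = [
    "<H3>100{kriyAyoga}3{kriyA-yoga} ... ‹¯Pa1n2.…1-1…,…14› ‹¯Ka1r.›\n",
    "<H3>100{acirabhAs}3{a-cira--bhAs} ... ‹lightning› ‹¯S3a1k.›\n",
]
for number, line in find_all(sample, "Ka1r."):
    print(number, line)
```

With a real file, `lines` would be an open file handle, and the matches could be written to a new file instead of printed.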
Thanx for suggestion. Long time heard about Notepad++ but never had opportunity to try it. Yes this way is better. :)
Errors in current LS definitions
BR.
It is reference to Böhtlingk & Roth!
073 brāhmaṇa Literary category[BR., Br.]
>>> [Br.]
Checked several cases (agraha, artana, apipṛc, abhiśrī, alātṛṇa) -- PWG fits all.
Thus:
071 Boehtlingk & Roth's Sanskrit-Woerterbuch Title[BRD.]
>>> [BRD., BR.]
Kār.
It´s Kārikas on Pāṇini!
189 kāraṇḍa-vyūha Title[Kāraṇḍ., Kār.]
>>> [Kāraṇḍ.]
kārikas Title [Kār.]
Madanap.
Two lines are melted.
244 madanavinoda Title[Madanav., Madanap.]
>>> [Madanav.]
madanapārijāta Title [Madanap.]
KapSaṃh.
186 kapila-saṁhitā , kapila-purāṇa Title[KapSaṃh.]
>>> 186 kapila-saṁhitā from the skanda-purāṇa
in Original: Kap(ila)Saṃh(itā , from the SkandaP.)
Matsyas.
272 matsyasūkta 's śabdakalpadruma Title[Matsyas.]
>>> 272 matsyasūkta in śabdakalpadruma
This is the name of work quoted in ŚKD through which it is quoted further.
Nāḍīpr.
282 nāḍīprakāśa 's śabdakalpadruma Title[Nāḍīpr.]
>>> 282 nāḍīprakāśa in śabdakalpadruma
The same as previous.
Śākaṭ.
383 śākaṭāyana Author[Ṡāk.]
>>> [Śākaṭ.]
In Original: Śākaṭ(āyana).
& relink Ṡāk.
to
386 śakuntalā Title[Ṡak., Ṡāk.]
Also I'd like to say, I'm feeling uncomfortable with the MWauth list being reedited and sometimes very different from the original text.
E.g. in Original:
Yājñ., Sch. (i.e. Mitākṣarā).
vs.
514 vijñāneśvara 's mitākṣarā , commentary on yājñavalkya-dharma-śāstra Title[unused]
(It is referred commonly with Comm. on Yājñ.)
MWauth list being reedited and sometimes very different from the original text.
You want to say there is no such line of text as Yājñ., Sch. in the HTML file? And because of that you spend a lot of time finding something similar?
[Note: for convenience, images of the two pages of Works and Authors from MW1899 have been put into the MWS repository as page1 and page2.]
Refer to version 8 of lsiast_all_table.txt.
This is a revision of the previous lsiast_all_table.txt. This revision should include all corrections up to today, 2018-04-07.
This file is in CSV format, with tab character separating the six fields.
07:29 00007 Śākaṭ. Text: Śākaṭāyana
means that the 29th item in column 7 (i.e., 2nd column of page 2) has printed abbreviation Śākaṭ., while
07:29x 00079 Śāk. ....
indicates that we have inferred that Śāk. is an alternate abbreviation.
Text: expansion
This also is in IAST form; but this form was derived from SLP spellings (for columns 1-10, anyway), so circumflexed vowels, etc. are absent. For columns 1-10, this form is always present, and should agree with the printed text except for details of IAST and parentheses. Text: (None) occurs in records '11:NN', and '?' in records '12:NN'.
Norm: expansion
This normalized form was originally developed by Scharf et al. (see the original form of literary source info).
The mwauth.txt is now viewed as the primary document for the literary sources in MW. The lsiast_all_table.txt file mentioned above is derived from it, in addition to a list of all currently known instances of literary source abbreviations.
This is derived directly from mwauth.txt. A sqlite form of tooltip.txt is the source of the literary source tooltips used in the development displays.
As currently written, the abbreviation expansion uses the <expandNorm> field of mwauth, which sometimes differs from the printed text form of the LIST OF WORKS AND AUTHORS in the <expandMW> field of mwauth records.
The discrepancy regarding Yājñ.,Sch. mentioned above can now be better understood.
The <MWexpand> field has Yājñavalkya,Sch.(i.e. Mitākṣarā) and the <Normexpand> field has Vijñāneśvara's Mitākṣarā, commentary on Yājñavalkya-dharma-śāstra.
The previous version of lsiast file only showed the Normexpand field (when it was present).
The tooltips also currently show the Normexpand field in preference to the MWexpand field.
This choice could be changed to always use the MWexpand field.
This abbreviation currently is not recognized. There are about 2200 Yājñ. references, some with a related 'Sch.'; these instances would need to be recoded in order to be recognized as Yājñavalkya,Sch.(i.e. Mitākṣarā). 213 Yājñ. records have an 'Sch.' in the same record, but not all of these 'Sch.' instances refer to Yājñ.
<ls>Yājñ.</ls>, <ab>Sch.</ab> -> <ls>Yājñ.,Sch.</ls>
is a simple change. But there are only 3 or so of these.
<ls>Yājñ. iii, 253 <ab>Sch.</ab></ls>
Recognizing this more common type as abbreviation Yājñ.,Sch. would present a new problem, due to the intervening iii, 253.
Currently, the abbreviation for an <ls>X</ls> instance is found (approximately) by looking at the characters of X up to and including the first period (assuming X starts with a capital letter).
But this technique wouldn't find Yājñ.,Sch. in the example.
One solution would be to modify the digitization by providing for an optional attribute to the <ls> element. In such cases where it is hard to parse the abbreviation from X, we could set the value of this attribute to the proper abbreviation. For instance,
<ls n="Yājñ.,Sch.">Yājñ. iii, 253 <ab>Sch.</ab></ls>
Should I do this?
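The lookup rule and the proposed override can be sketched together as follows. This is an approximation for discussion, not the code actually used in the conversion: `ls_abbreviations` is a hypothetical helper, and the `n` attribute is the proposal from this comment, which may not yet exist in the digitization.

```python
import re

# An <ls> element, with the proposed optional n="..." attribute.
LS_RE = re.compile(r'<ls(?:\s+n="([^"]*)")?>(.*?)</ls>', re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")  # nested markup such as <ab>...</ab>

def ls_abbreviations(text):
    """Return one abbreviation (or None) per <ls> instance in text."""
    found = []
    for match in LS_RE.finditer(text):
        n_attr, body = match.group(1), match.group(2)
        if n_attr:
            # Proposed rule: an explicit n attribute takes priority.
            found.append(n_attr)
            continue
        plain = TAG_RE.sub("", body).strip()
        # Current rule (approximate): text up to and including the first
        # period, provided the text starts with a capital letter.
        if plain and plain[0].isupper() and "." in plain:
            found.append(plain[: plain.index(".") + 1])
        else:
            found.append(None)
    return found

sample = ('<ls>Yājñ. iii, 253 <ab>Sch.</ab></ls> '
          '<ls n="Yājñ.,Sch.">Yājñ. iii, 253 <ab>Sch.</ab></ls> '
          '<ls>BhP. vi, 5, 8</ls>')
print(ls_abbreviations(sample))
```

Without the attribute the first instance resolves only to Yājñ.; with it, the second resolves to Yājñ.,Sch. while the displayed text stays unchanged.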
Kār.
Kār. New entry: Hari-kārikās, the kārikās of Bhartṛ-hari, Title. I found this under the headword kArikA in MW. OK?
No. I don't know why MW fails to give definition for this widely used abbr.
Kār. = Kārikās, grammatical commentaries on Pāṇini in metrical form.
These Kārikās are versed commentaries, which are found in Patañjali´s Mahābhāṣyam. They supposedly belong to different authors.
E.g. the first occurrence of Kār.:
<H3>100{kriyAyoga}3{kriyA-yoga}¦ •m. ‹the…connection…with…an…action…or…verb› ‹¯APra1t.› ‹¯Pa1n2.…1-1…,…14› ‹¯Ka1r.›
This kārika is found in Mahābhāṣyam under the Pāṇinian rule 1.1.14, and looks like:
ईषदर्थे क्रियायोगे मर्यादाभिविधौ च यः।
एतमातं ङितं विद्याद्वाक्यस्मरणयोरङित् ॥
(and contains the word kriyāyoga from which it is referred)
Śāk.
Śāk. Now alternate abbreviation to Śākaṭ. śākaṭāyana OK? The long ā argues against this being an alternate abbreviation to Śak, śakuntalā
No. Śāk. = Śākuntala and has nothing to do with Śākaṭāyana. Śakuntala and Śākuntala are both correct spellings.
Proof.
<H3>100{acirabhAs}3{a-cira--bhAs}¦ •f. ‹lightning› ‹¯S3a1k.› MW001206
Check through PWG and there is a link to Śākuntala there.
acirabhās (acira + bhās)
1) adj. von kurzem Lichte.
— 2) f. Blitz Śāk. 166.
And the word is found in Böhtlingk´s edition of Śakuntala in verse 166 on page 99.
Next MW
<H1>100{ataTa}1{a-taTa}¦ •mfn. ‹having…no…beach…or…shore…,…precipitous› ‹¯S3a1k.›
PWG:
ataṭa (3. a + taṭa)
1) adj. uferlos, ohne einen sanften Abhang, ohne Leite, jähe: manorathānāmataṭaprapātāḥ Śāk. 137.
Found in verse 137 in the same book.
MW <H3>100{atiraMhas}3{a4ti--raMhas}¦ •mfn. ‹extremely…rapid› ‹¯S3a1k.› MW001833
PWG atiraṁhas (ati + raṁhas) adj. von ausserordentlicher Geschwindigkeit Śāk. 5.
Found in verse 5.
Etc.
Which expansion to use: MW or Norm ?
It is hard to choose only one. MW is sometimes too brief, while reedited (Norm) provides more info. But in other cases Norm makes the info less clear or corrupted.
E.g.
MW original: T(ārānātha Tarkavācaspati's Dictionary).
Norm: 460 tārānātha tarkavācaspati 's vācaspatyam , Sanskrit dictionary Title[T.]
Here MW gives only the name of the author of the dictionary. While Norm provides also the name of this dictionary. It is good.
And another example:
MW original: Hir(aṇyakeśin's) Gṛ(ihya-sūtra).
Norm 156 hiraṇyakeśin-gṛhya-sūtra Title[HirGṛ.]
In the MW original the info is clear: the author's name and the work's name. In Norm, without any reason, the author's name is combined with the name of the work, which makes the info less clear. There is also a sandhi error: by Sanskrit rules the final "n" of Hiraṇyakeśin should be dropped in a compound. So the info is not only less clear but also gives an impossible spelling of the work's name.
In the mentioned Yajnavalkya´s case:
MW original Yājñavalkya,Sch.(i.e. Mitākṣarā)
Norm: Vijñāneśvara's Mitākṣarā, commentary on Yājñavalkya-dharma-śāstra.
Norm adds the name of the commentary´s author Vijñāneśvara - this is good. And also provides the title of the work "dharmaśastra" - this is good. But makes a compound with author´s name Yājñavalkya - this is bad. And the work can be mentioned also by other names as Yajñavalkya-smṛti, or Yajñavalkya´s laws etc.
So the additions of Norm are both good and bad, fifty-fifty.
Yājñ. iii, 253 Sch.
This case is similar to Pāṇ. 1.1.4, Kār.
and other commentary cases with complicated references. A common approach is needed.
Kārikās, grammatical commentaries on Pāṇini in metrical form, found in Patañjali's Mahābhāṣyam
based on your wording.
Śākuntala, the drama Abhijñāna-śakuntalā of Kālidāsa, following AP entry.
Again, your proof was useful; I was able to find atiraMhas in verse 5 in Monier-Williams edition of śakuntalā.
Vijñāneśvara's Mitākṣarā, commentary on Yājñavalkya's Dharma-śāstra
The current data source (mwauth.txt) is divorced from that originally prepared by Scharf et al. We are free to introduce improvements as we see fit, for instance by rewording the 'Norm' expansions for increased clarity.
an optional attribute to the <ls> element. In such cases where it is hard to parse the abbreviation from X, we could set the value of this attribute to the proper abbreviation
Makes sense.
For instance by rewording the 'Norm' expansions for increased clarity.
Guess we want Serge to dive deeper and it might take weeks.
In BRD. tooltip it's Roth''s now, with double '. All entries have same issue.
indra m. ( for etym.
Do we really want to have the spacing before and after () ?
Caus. = Causal or Causative?
Is there a way to see all the compounds inside the head article listed?
Agamapr, namely AgamaprAmARya.
ghaṭa-kañcuki [p= 375] : n. an immoral rite practised by tāntrikas and śāktas (in which the bodices of different women are placed in a receptacle and the men present at the ceremony are allowed to take them out one by one and then cohabit with the woman to whom each bodice belongs), Āgamapr. [L=69243]
Agamapr. abbreviation, we need to find an instance of ghaṭa-kañcuki in the work AgamaprAmARya.
kaJcuk, leads to one match in the second work, and no matches with the first work.
uSNISI kaJcukI nagno muktakezo gaNAvRtaH |
apavitrakarozuddhaH pralapanna japetkvacit ||
anAsanaH zayAno vA gacchan bhuJjAna eva vA |
rathyAyAmazivasthAne na japettimirAlaye ||
p. 57) mArjAraM kukkuTaM zvAnaM krauJcaM zUdraM kharaM kapim ||
dRSTvAcamya japetkarNaM spRSTvA snAnaM vidhIyate |
evaM japaniyamaH sarvatra jJeyaH | mAnase na doSakRditi ||
Agamapr -- we should note it is a likely candidate.
The DevīBhP. abbreviation occurs 13 times, and is in the printed index as Devī-bhāgavata-purāṇa.
The DevīP. abbreviation occurs 39 times, and is not in the printed index. Comparing a couple of the entries to PWG (e.g., kzema, ekaparRikA) we find PWG references to DevIpurāṇa via Sabdakalpadruma dictionary. From internet searches, it appears that DevIpurāṇa is another name for the work Devī-bhāgavata-purāṇa.
Since DevīBhP. is an abbreviation in the printed MW index, I am going to consider DevīP. as an alternate abbreviation for Devī-bhāgavata-purāṇa.
The Hir. abbreviation occurs 121 times, but is not present in the printed list of abbreviations of works.
HirGṛ occurs 19 times and is listed as Hiraṇyakeśin's Gṛhya-sūtra.
HirP. occurs 28 times and is listed as Hiraṇyakeśin's Pitṛmedha-sūtra.
A search of PWG/PW for several of the 121 headwords where MW mentions Hir. drew a blank.
I could not find a searchable Hiraṇyakeśin's Gṛhya-sūtra digitization, but did find an English translation.
There are a few of the 121 where a specific reference is mentioned. E.g.
<s>ni-jihvika</s> ¦ <lex>mfn.</lex> = (or <ab>w.r.</ab> for) <s>nir</s>-<s>j</s>°, tongueless,
<ls>Hir. i, 15, 5</ls>
Here the 5th item in the 15th section of the 1st praśna shows 'tongueless'.
Similarly, apiSAcaDIta can be traced down in the first item under the 25th section of the first praśna:
<s>a-piSAcaDIta</s> ¦ <lex>mfn.</lex> (<ab>prob.</ab> right reading) not drunk or
<pb n="1314,3"/> sucked by <s1 slp1="piSAca">Piśāca</s1>s, <ls>Hir. i, 25, 1</ls>.
<info n="sup"/><info lex="m:f:n"/>
I find these examples convincing, and am introducing 'Hir.' as an alternate abbreviation for Hiraṇyakeśin's Gṛhya-sūtra.
Hiraṇy. occurs only twice as an abbreviation (hw BUrisaKa, mindAhuti). Neither has a specific reference location.
Based on the similarity to 'Hir.', and the spelling of Hiraṇyakeśin, I am marking Hiraṇy. as another alternate abbreviation for Hiraṇyakeśin's Gṛhya-sūtra.
PañcavBr. abbreviation occurs 252 times, and occurs in the listed works as abbreviation for PañcaviṃśaBrāhmaṇa.
PBr. abbreviation appears 142 times and Pañcav. appears 3 times.
There is evidence suggesting that PBr. abbreviation also refers to PañcaviṃśaBrāhmaṇa.
Pañcav. abbreviation occurs under headwords vidAya, viDAtrI, and vftti. However, in none of these does PWG or PW mention PañcaviṃśaBrāhmaṇa, although there are references variously to PAÑCARĀTRA and PAÑCATANTRA. Also, I could not find these words in the GRETIL digitization of PañcaviṃśaBrāhmaṇa.
Conclusions:
I stated above that PañcavBr. is an abbreviation occurring in the MW list of works and authors, but it is not! At some time, an entry for this abbreviation (pointing to PañcaviṃśaBrāhmaṇa) was entered into the mwauth file. Its code 11:13 (being in a fictitious column '11') is a reminder that it is an add-on to the MW listing.
DevīP. as an alternate abbreviation for Devī-bhāgavata-purāṇa
Waited two weeks for your comeback, hurray.
Its code 11:13 (being in a fictitious column '11') is a reminder that it is an add-on to the MW listing.
One needs to know what to remember. And there are only 3 persons left who can read it, I guess - Jim, Peter and Thomas.
The Sāṃkhyas. abbreviation appears in entries for 10 headwords:
Based on ACC, Sāṃkhyas. abbreviation might be:
Kapila's Sāṃkhya-pravacana is in MW list of works, with abbreviation KapS.
There are versions of these at GRETIL - Samkhya. [In fact @drdhaval2785 is the contributor of several transcriptions!]. However, I have thus far had no luck in finding any of those 10 words within the GRETIL editions.
Assuming that KapS. is related, I also tried to find a few of the 124 headwords of MW which mention KapS., using the GRETIL version of sāṁkhyasūtra. Again, there was no luck with the search. I found this surprising -- making me wonder exactly in what edition of Kapila's Sāṃkhya-pravacana Monier found headwords marked as KapS.?
Tantr. abbreviation occurs in 37 entries. By spelling, it is similar to abbreviation 'Tantras.' which occurs in 112 entries, and also appears in the list of works as Tantrasāra. A work by this name is available in digitally searchable form at GRETIL Tantrasara.
eine Form der Durgā (= tripuṭā?) Kālikā-P. im Śkdr. °nyāsa Tantras. in Verz. D. Oxf. H. 93,b,25.
which may help identify the version of TantrasAra this B. and MW used.
All this illustrates how problematic the identification of particular words in current versions of works can be.
It might be that the 'Tantr.' abbreviation is an alternate of the 'Tantras.' abbreviation (there is also one instance of 'Tantra.' abbreviation, headword lokaDAtvISvarI). However, this is just speculation at this point based on the similarity of the spelling of the abbreviations, and does not provide any help regarding the context from which Monier or his colleagues drew the words.
This implements in the most simple way the idea that @drdhaval2785 suggested above to make use of the ACC headwords in the search for expansions of abbreviations of works which MW's list of works omits. The results are represented in four files in this gist.
(<ab>fr.</ab> <s>kzura/-pavi</s>), very sharp-edged, very sharp, <ls>BhP. vi, 5, 8</ls> (‘formed out of razors and thunderbolts’, <ls>Burnouf.</ls>).
I think this reference is to Burnouf's translation of BhāgavataPurāṇa. With all of these, it is highly desirable to complete the circle by finding the given headword instances within the work in question.
Cologne Addition
@gasyoun mentions above that the significance of '11' was obscure. As a way to distinguish literary source expansions that are listed in MW's printed two pages of works and authors from expansions that have been added both recently and in work of prior years, I've added the '[Cologne Addition]' phrase. This phrase is added within the base document for all the MW literary source abbreviations, which in the revised system is the mwauth.txt file; you can see the current dev version of mwauth. If you go down towards the bottom and see all the '11:xx' lines, you'll see that '[Cologne Addition]' phrase, and the phrase carries through to the tooltips of the displays.
The code appearing in the first column indicates the column number and sequence-within-column for the first 10 columns (as per the 2 pages of the scanned images of lists of works and authors). Since there is no 11th column in the print, codes with an 11 indicate material added to the printed version in the digital version. The Cologne addition phrase expresses this fact overtly.
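The meaning of these codes can be sketched as a small parser. This is an illustration under stated assumptions: `describe_code` is a hypothetical helper; the trailing `x` for an inferred alternate abbreviation follows the `07:29x` example earlier in this thread; and the five-columns-per-page layout is inferred from the remark that column 7 is the 2nd column of page 2.

```python
import re

# Two-digit column, sequence within column, optional 'x' suffix.
CODE_RE = re.compile(r"^(\d{2}):(\d+)(x?)$")

def describe_code(code):
    """Decode a 'CC:NN' or 'CC:NNx' code from the first column."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognised code: {code!r}")
    column, item = int(match.group(1)), int(match.group(2))
    info = {
        "column": column,
        "item": item,
        "alternate": match.group(3) == "x",  # inferred alternate abbreviation
        "printed": 1 <= column <= 10,        # present in the two printed pages
        "cologne_addition": column == 11,    # fictitious column 11
    }
    if info["printed"]:
        # Assumption: the 10 printed columns are split 5 + 5 over two pages.
        info["page"] = (column - 1) // 5 + 1
        info["page_column"] = (column - 1) % 5 + 1
    return info

print(describe_code("07:29"))
print(describe_code("11:13"))
```

The `cologne_addition` flag corresponds to the '[Cologne Addition]' phrase that now appears in the tooltips.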
All this illustrates how problematic the identification of particular words in current versions of works can be.
Sure, but should so much time be spent on it? 3 weeks you've been lost and only you can code, but not only you can search. As we've seen earlier the best searcher out there is @SergeA for unknown_abbrv.txt - how he does it I do not know, but that is why I value his help for the last 15 years. He might not help, sure, but if we ask, one day, he might get there. I do not think this is number one priority, still. Because 57 entries is a lot of work.
It would be quite reasonable to assert that these ACC matches are correct, and to add these expansions to the list. (See note on 'Cologne addition' in next comment.)
Agree.
Since there is no 11th column in the print, codes with an 11 indicate material added to the printed version in the digital version. The Cologne addition phrase expresses this fact overtly.
Sure, but without the note I would never figure it out.
should so much time be spent on it?
Not by me. I've taken it about as far as I can for now, and hope the filtering above will provide some help when someone else, such as @SergeA or @drdhaval2785 , takes up the problem.
What is my interest now, you may ask? I've done most of what I think needs to be done for now with the MW conversion. Since Peter has a strong interest in MW, he has agreed to discuss the proposed changes of the conversion sometime in May or June, when he will be back in the US. In the meantime, there are several items regarding documenting the changes to MW that I need to finish up. Then, I will install the MW conversion to Cologne. When this is done, the meta/iast conversion of all the dictionaries will be mostly complete at long last. What then? Well, several things come to mind, and I'll try to formulate possible next steps which we can then prioritize. One thing I'm curious about is whether @drdhaval2785 still has an active interest in this project.
I read every mail which lands in my inbox from github cologne with utmost curiosity. I have recently shifted to a new place, Surat. So will resume after a month or so. And one more thing, I was waiting for these metaline conversions to get completed, so that differences in dictionaries are ironed out. Now that it is over, will resume this soon.
Happy to note that someone misses me somewhere.
Happy to note that someone misses me somewhere.
Joking? I missed you. Every. Single. Day. Not to say about @Shalu411 - it's my pain not to see he around anymore.
meta/iast conversion of all the dictionaries will be mostly complete at long last.
Guess it was the biggest trip up to now.
I'll try to formulate possible next steps which we can then prioritize.
Great.
The 'singleton' examples (formerly unknown abbreviations with exactly one matching ACC entry) have been added to the mwauth list. These changes are (also) documented by revision to abbrv2.txt.
Let's close this issue, and leave the remaining open questions in #222
In the current conversion of the form of the Cologne digitization of MW (1899), #216 and #218, the coding and expansion of the literary source abbreviations need to be examined.
The system currently uses two data sources:
<ls>X. ---</ls>
instances in the digitization. It contains 651 entries. Each abbreviation X is matched to a code in mwauthorities.
The reason that there are more items in the link file than in the mwauthorities file is that sometimes two different abbreviations have been judged to refer to the same mwauthorities record. For example, abbreviations ĀṠvṠr. (4 times) and ĀṡvṠr. (744 times) are linked to āśvalāyana-śrauta-sūtra.
This gives an idea of the current system.