sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899

MW supplement fresh look, part 3 #96

Closed funderburkjim closed 1 year ago

funderburkjim commented 3 years ago

This issue continues #83.

The changes begin!



Once we're sure the git process works,
suggest you commit and push often, so we can comfortably follow your changes.

Andhrabharati commented 3 years ago

Now I understand why smaller corrections are preferred here- for ease of tracking by the concerned!

Let's see how it goes.

gasyoun commented 3 years ago

Let's see how it goes.

Gave additional rights.


funderburkjim commented 3 years ago

My procedure was to add @Andhrabharati to the Corrections team, and then in MWS to give write privileges to the Corrections team.

gasyoun commented 3 years ago

My procedure was to add @Andhrabharati to the Corrections team, and then in MWS to give write privileges to the Corrections team.

Mine was more radical. Hope it works.

funderburkjim commented 3 years ago

You could undo yours, and then we can see if mine works.

gasyoun commented 3 years ago

You could undo yours, and then we can see if mine works.

Undone mine. Made Andhrabharati a member.

Andhrabharati commented 3 years ago

Let's see how it goes.

Gave additional rights.


are there any 'secret-teams' here? just curious!!

funderburkjim commented 3 years ago

You can see this at This shows 'secret' beside 'owners' and 'researchers' -- I don't know significance of 'secret'.

You can see the corrections team at

I've used the corrections team recently for working with @AnnaRybakovaT and @sanskritisampada and now @Andhrabharati .

This corrections team seems to be useful so far, hope you also find it useful.

gasyoun commented 3 years ago

I don't know significance of 'secret'.


Andhrabharati commented 3 years ago

I started looking at the data all over again today.

As a first step, I started putting the file into Excel format, which makes tracing the issues much easier and more effective. [I shall be posting them one by one, if it's of any interest.]

§1. While trying to rearrange the records to tally the number of lines with the <L> count (287443 of them now), I saw that:

(a) 14 blank lines are in the file.
(b) About 2500 lines start with a space etc., at 200+ entries (listed below).

Other than splitting at a semicolon, no other logic is seen in these; they need to be looked at to combine or split appropriately. Some of these are "candidates" for compound words; some are just explanatory material to be kept within ( ). Some are of the dhātu or verbal type, and could have been split with a <div> tag, as done at many others of that category. For ease of locating these entries, I have clubbed the lines with a $ marking.
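The tallies in §1 could be reproduced with a short script along these lines. This is only a sketch: it assumes the layout of mw_iast.txt described later in this thread (a metaline starting with <L>, the entry text, then <LEND>), and the name `line_stats` is made up for illustration.

```python
def line_stats(lines):
    """Count metalines, blank lines, and lines that start with whitespace."""
    stats = {"records": 0, "blank": 0, "leading_space": 0}
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith("<L>"):
            stats["records"] += 1
        elif text.strip() == "":
            stats["blank"] += 1
        elif text[0].isspace():
            stats["leading_space"] += 1
    return stats


# Tiny invented sample, not real dictionary data.
sample = [
    "<L>1<pc>1,1<k1>a<k2>a<e>1\n",
    "<s>a</s> ¦ first meaning;\n",
    " second meaning\n",
    "<LEND>\n",
    "\n",
]
print(line_stats(sample))
```

On the real file, `line_stats(open("mw_iast.txt", encoding="utf-8"))` would give the figures to compare against the counts quoted above.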



§2. Three records have two '¦' in them.

¦ of <s1 slp1="sItA">Sītā</s1>, <ls>RāmatUp.</ls> (¦ <ab>N.</ab> of <ab>wk.</ab>)<info phwchild="116611.1"/><info lex="inh"/>

;; except the space after the opening brace ( N. of wk.), no other impact is seen in this.

¦ (sc. ¦ = <s>piśāca-bh°</s>, <ls>L.</ls><info lex="inh"/>

;; bhāṣā is missing after sc. and so is the closing brace. [(sc. bhāṣā) intending to say piśācikā bhāṣā = piśāca bhāṣā] ;; incidentally this "sc." (total count = 97) is not expanded; could it be the same as "scil."? (total count = 560+)

¦ (with ¦ with `<s>śakaṭa</s>`, <ls>MW.</ls><info lex="inh"/>

;; two 'with's are shown here.
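A check like the one in §2 can be sketched as follows. It simply reports every line containing more than one broken bar; the function name `lines_with_extra_bars` is hypothetical, and the sample lines are invented.

```python
def lines_with_extra_bars(lines, bar="¦"):
    """Return (line_number, count) for every line with more than one bar."""
    hits = []
    for number, line in enumerate(lines, start=1):
        count = line.count(bar)
        if count > 1:
            hits.append((number, count))
    return hits


sample = [
    "<s>a</s> ¦ one bar only",
    "<s>b</s> ¦ (with ¦ with something",
    "no bar at all",
]
print(lines_with_extra_bars(sample))
```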

Andhrabharati commented 3 years ago

Pushed the Excel file to the MWS\mwtranscode folder.

Though @funderburkjim may not be directly able to use this Excel file, @gasyoun and @drdhaval2785 should be in a position to do so.

Andhrabharati commented 3 years ago

Though I am supposed to do the supplement portion for now, started with other issues in the data, taking that now I am part of the TEAM.

[The actual point is that I am currently on the proofing work, so it will take some time to start posting changes in the Supplement portion.]

Andhrabharati commented 3 years ago

§3. Texts within <s>...</s>

<s>-     18508
-<s>    386

</s>-   6340
-</s>   2

To keep all these '-' within <s>...</s>.

<s>°    12171
°<s>    94

°</s>   9057
</s>°   130

To keep all these '°' within <s>...</s>.

</s>.   9619
.</s>   1

To remove the dot in this single case (which is not in the print!).
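Counts of this kind can be obtained with plain substring counting. The sketch below is an illustration only (the name `count_tag_neighbours` is made up); it counts the hyphen '-', not the '—' that also occurs in the data.

```python
PATTERNS = [
    "<s>-", "-<s>", "</s>-", "-</s>",
    "<s>°", "°<s>", "°</s>", "</s>°",
    "</s>.", ".</s>",
]


def count_tag_neighbours(text):
    """Count how often each marker sits directly beside an <s> or </s> tag."""
    return {pattern: text.count(pattern) for pattern in PATTERNS}


# Invented sample text.
sample = "<s>-ka</s>, -<s>ta</s>- and <s>bh°</s>."
counts = count_tag_neighbours(sample)
for pattern, count in counts.items():
    print(pattern, count)
```

Running it over the whole file content (read as one string) would give one number per pattern, comparable to the table above.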
Andhrabharati commented 3 years ago

§4. Badly split lines

The following 45 records need to be looked at again: either they are part of the previous line (record), or part of a group, or have a NULL string, and so on. In any case, they cannot be new records.

<L>121 <pc>1,2 <k1>aṃhaspati <k2>áṃhas—pati  <e>3 
<L>3294 <pc>16,1 <k1>atisṛjya <k2>ati-sṛjya  <e>2 A
<L>4074 <pc>20,1 <k1>adhonābham <k2>adho-nābham  <e>3 
<L>15905 <pc>90,3 <k1>arthadatta <k2>ártha—datta  <e>3 A
<L>16664 <pc>94,3 <k1>alasa <k2>a-lasa  <e>1 A
<L>19298 <pc>111,3 <k1>avyabhicāra <k2>a-vyabhicāra  <e>1 A
<L>25651 <pc>147,3 <k1>āmra <k2>āmra  <e>1 B
<L>27756 <pc>159,1 <k1>āśvatthika <k2>āśvatthika  <e>2 A
<L>35779.1 <pc>206,3 <k1>upoḍha <k2>upoḍha  <e>2 A
<L>46658 <pc>265,1 <k1>kaśika <k2>kaśika  <e>1 
<L>48456 <pc>274,1 <k1>kāya <k2>kāyá  <e>1 A
<L>59492 <pc>328,3 <k1>kṣiti <k2>kṣití <h>a <e>3 
<L>59553 <pc>329,1 <k1>kṣipra <k2>kṣiprá  <e>2 A
<L>64772 <pc>353,2 <k1>gāna <k2>gāna <h>b <e>2 
<L>64773 <pc>353,2 <k1>gāninī <k2>gāninī <h>b <e>2 
<L>64774 <pc>353,2 <k1>gānīya <k2>gānīya <h>b <e>2 
<L>75288 <pc>403,3 <k1>cyavāna <k2>cyávāna  <e>2 A
<L>75682 <pc>405,2 <k1>chandya <k2>chándya  <e>2 A
<L>81281 <pc>430,1 <k1>ṭuṇṭuka <k2>ṭuṇṭuka  <e>1 B
<L>82052.1 <pc>434,1 <k1>tad <k2>tad  <e>2 B
<L>84402 <pc>444,2 <k1>tāriṇītantra <k2>tāriṇī—tantra  <e>3 A
<L>85069 <pc>447,2 <k1>tiraskāra <k2>tirás—kāra  <e>3 A
<L>94437 <pc>487,2 <k1>durgataraṇī <k2>durgá—taraṇī  <e>3 
<L>94571.2 <pc>487,3 <k1>duṣkṛta <k2>dúṣ—kṛtá  <e>3 B
<L>102655 <pc>523,2 <k1>namuca <k2>ná—muca <h>a <e>3 
<L>106757 <pc>540,2 <k1>nirāmarṣa <k2>nir—āmarṣa  <e>3 
<L>116092 <pc>586,3 <k1>paratattva <k2>pára—tattva  <e>3 
<L>116197 <pc>587,1 <k1>parabhū <k2>pára—bhū  <e>3 
<L>116528 <pc>588,2 <k1>paramāgama <k2>paramāgama  <e>3 
<L>122469 <pc>619,3 <k1>pāraṇa <k2>pāraṇa  <e>2 B
<L>122659 <pc>620,3 <k1>pārāvati <k2>pārāvati  <e>2 A
<L>148702 <pc>747,2 <k1>bharaṇḍa <k2>bharaṇḍa  <e>2 A
<L>149286 <pc>750,1 <k1>bhaṣaṇa <k2>bhaṣaṇa  <e>2 B
<L>152428 <pc>764,2 <k1>bhūriśreṣṭhaka <k2>bhū́ri—śreṣṭhaka  <e>3 
<L>154721 <pc>775,1 <k1>maṇibandhana <k2>maṇí—bandhana  <e>3 A
<L>154994 <pc>776,1 <k1>maṇḍalipattrikā <k2>maṇḍali-pattrikā  <e>2 A
<L>164731 <pc>819,1 <k1>mīvara <k2>mīvara  <e>1 B
<L>167489 <pc>831,2 <k1>mekhalā <k2>mékhalā  <e>2 A
<L>178858 <pc>884,3 <k1>ropa <k2>ropa <h>a <e>2 
<L>179010 <pc>885,3 <k1>roḍhṛ <k2>roḍhṛ <h>a <e>2 
<L>183114 <pc>905,2 <k1>lulāpakāntā <k2>lulāpa—kāntā  <e>3 A
<L>200997 <pc>992,3 <k1>viśva <k2>víśva  <e>1 B
<L>209874 <pc>1038,1 <k1>vyāmana <k2>vyāmana  <e>2 A
<L>214007 <pc>1058,2 <k1>śarman <k2>śárman  <e>1 A
<L>257408 <pc>1273,1 <k1>syada <k2>syáda  <e>2 A
Andhrabharati commented 3 years ago

The same applies to these 159 records. [Rule: a number within a <ls> cannot be at the start of a new record; it goes with the previous one as a continuation.] I noticed that in one of these entries n. (Neuter) is taken as ii. This particular one should be a new record, appropriately changed.

<L>18212 <pc>104,3 <k1>avasa <k2>avasá  <e>2 A
<L>21051 <pc>121,1 <k1>asura <k2>ásura  <e>2 B
<L>23290 <pc>134,1 <k1>āḍhya <k2>āḍhyá  <e>1 A
<L>23368 <pc>134,2 <k1>ātapat <k2>ā-tápat  <e>2 A
<L>23750 <pc>136,2 <k1>ādaṣṭa <k2>ā-daṣṭa  <e>2 A
<L>29966 <pc>171,2 <k1>īṣā <k2>īṣā́  <e>1 A
<L>35754 <pc>206,3 <k1>upavāsa <k2>upa-vāsa  <e>2 A
<L>38569 <pc>224,2 <k1>ṛtupati <k2>ṛtú—páti  <e>3 A
<L>39058 <pc>227,1 <k1>ṛṣi <k2>ṛ́ṣi  <e>1 A
<L>39376 <pc>228,2 <k1>ekanemi <k2>éka—nemi  <e>3 A
<L>40455 <pc>234,1 <k1>aindrāgna <k2>aindrāgná  <e>2 A
<L>42290 <pc>245,1 <k1>kaṇa <k2>káṇa  <e>2 A
<L>44254 <pc>254,1 <k1>karaṇaprayoga <k2>káraṇa—prayoga  <e>3 A
<L>46139 <pc>262,3 <k1>kalpavallī <k2>kálpa—vallī  <e>3 A
<L>46142 <pc>262,3 <k1>kalpaviṭapin <k2>kálpa—viṭapin  <e>3 A
<L>46917 <pc>266,2 <k1>kāṃsya <k2>kāṃsya  <e>2 B
<L>47253 <pc>268,1 <k1>kākṣaseni <k2>kākṣaseni  <e>1 A
<L>50359 <pc>283,1 <k1>kiṃkāryatā <k2>kiṃ—kārya-tā  <e>3 A
<L>52192 <pc>291,2 <k1>kupya <k2>kupya  <e>2 B
<L>54918 <pc>305,2 <k1>kṛpāṇikā <k2>kṛpāṇikā  <e>2 B
<L>54919 <pc>305,2 <k1>kṛpāṇikā <k2>kṛpāṇikā  <e>2 B
<L>55578 <pc>308,3 <k1>kḷpta <k2>kḷptá  <e>2 A
<L>56136 <pc>311,1 <k1>keśari <k2>keśari  <e>2 A
<L>57009 <pc>315,2 <k1>kauṭasākṣya <k2>kauṭa—sākṣya  <e>3 A
<L>57121 <pc>316,1 <k1>kautukāgāra <k2>kautukāgāra  <e>3 A
<L>57187 <pc>316,2 <k1>kaupīna <k2>kaupīna  <e>2 A
<L>57458 <pc>318,1 <k1>kauśāmbī <k2>kauśāmbī  <e>1 B
<L>58113 <pc>321,2 <k1>krayya <k2>kráyya  <e>2 A
<L>58411 <pc>323,1 <k1>kreṅkāra <k2>kreṅ-kāra  <e>1 A
<L>58485 <pc>323,2 <k1>krauñca <k2>krauñcá  <e>1 B
<L>58815 <pc>325,1 <k1>kṣatra <k2>kṣatrá  <e>1 A
<L>59371 <pc>328,1 <k1>kṣitīśvara <k2>kṣitīśvara  <e>3 A
<L>59558 <pc>329,1 <k1>kṣipra <k2>kṣiprá  <e>2 B
<L>59565 <pc>329,1 <k1>kṣipra <k2>kṣiprá  <e>2 B
<L>59566 <pc>329,1 <k1>kṣipra <k2>kṣiprá  <e>2 B
<L>59657 <pc>329,2 <k1>kṣībatā <k2>kṣība—tā  <e>3 A
<L>59819 <pc>330,1 <k1>kṣīraudana <k2>kṣīraudaná  <e>3 A
<L>59820 <pc>330,1 <k1>kṣīraudana <k2>kṣīraudaná  <e>3 A
<L>60109 <pc>331,2 <k1>kṣudhālu <k2>kṣudhālu  <e>2 A
<L>60187 <pc>331,3 <k1>kṣurapra <k2>kṣurá—pra  <e>3 B
<L>60866 <pc>335,1 <k1>khacita <k2>khacita  <e>2 A
<L>60972.25 <pc>335,2 <k1>khaṭvāṅga <k2>khaṭvā—ṅga  <e>3 B
<L>61356 <pc>337,1 <k1>khara <k2>khára  <e>1 B
<L>61647 <pc>338,2 <k1>khalina <k2>khalina  <e>1 A
<L>62124 <pc>341,1 <k1>khyāti <k2>khyāti  <e>2 A
<L>62237 <pc>341,3 <k1>gaṅgādvāra <k2>gáṅgā—dvāra  <e>3 A
<L>62238 <pc>341,3 <k1>gaṅgādvāra <k2>gáṅgā—dvāra  <e>3 A
<L>62385 <pc>342,2 <k1>gajayodhin <k2>gaja—yodhin  <e>3 A
<L>62610.1 <pc>343,2 <k1>gaṇaśas <k2>gaṇá—śás  <e>3 A
<L>62658 <pc>343,3 <k1>gaṇeśvara <k2>gaṇeśvara  <e>3 A
<L>62793 <pc>344,1 <k1>gaṇḍalekhā <k2>gaṇḍa—lekhā  <e>3 A
<L>63104 <pc>345,2 <k1>gandhamādana <k2>gandhá—mādana  <e>3 A
<L>63105 <pc>345,2 <k1>gandhamādana <k2>gandhá—mādana  <e>3 A
<L>63305 <pc>346,2 <k1>gandharvanagara <k2>gandharvá—nagara  <e>3 A
<L>64368 <pc>351,2 <k1>gavāyus <k2>gavāyus  <e>3 A
<L>64390 <pc>351,2 <k1>gavaya <k2>gavayá  <e>2 A
<L>64502 <pc>352,1 <k1>gātuvid <k2>gātú—víd  <e>3 A
<L>65345 <pc>356,2 <k1>guggula <k2>guggula  <e>1 A
<L>65986 <pc>359,2 <k1>gūrti <k2>gūrtí  <e>2 A
<L>66101 <pc>359,3 <k1>gurubha <k2>gurú—bha  <e>3 A
<L>67909 <pc>368,2 <k1>gopānasī <k2>gopānasī  <e>3 A
<L>69065 <pc>374,1 <k1>grāvastut <k2>grāva—stút  <e>3 A
<L>69133 <pc>374,2 <k1>graiṣmika <k2>graiṣmika  <e>2 B
<L>69913 <pc>378,2 <k1>ghoṣila <k2>ghoṣila  <e>2 A
<L>70714 <pc>382,2 <k1>cakṣus <k2>cákṣus  <e>2 A
<L>72117 <pc>388,3 <k1>camūṣad <k2>camū́—ṣád  <e>3 A
<L>72653 <pc>391,2 <k1>calācala <k2>calācalá  <e>2 A
<L>72817 <pc>392,1 <k1>cāturāśramya <k2>cāturāśramya  <e>2 A
<L>72943 <pc>392,3 <k1>cāndrāyaṇa <k2>cāndrāyaṇa  <e>2 B
<L>74114 <pc>398,1 <k1>cintanīya <k2>cintanīya  <e>2 A
<L>74177 <pc>398,2 <k1>cipiṭaghrāṇa <k2>cipiṭa—ghrāṇa  <e>3 A
<L>74194 <pc>398,2 <k1>cipya <k2>cipya  <e>1 B
<L>74580 <pc>400,2 <k1>codaka <k2>codaka  <e>2 B
<L>74747 <pc>401,1 <k1>cūḍākarman <k2>cūḍā—karman  <e>3 
<L>75243 <pc>403,2 <k1>caula <k2>caula  <e>1 A
<L>75483 <pc>404,2 <k1>chadis <k2>chadís  <e>2 A
<L>75710 <pc>405,2 <k1>chardis <k2>chardís  <e>1 A
<L>75786 <pc>405,3 <k1>chāgaleya <k2>chāgaleya  <e>2 B
<L>76020 <pc>407,1 <k1>cheda <k2>cheda  <e>2 B
<L>76091 <pc>407,2 <k1>chorita <k2>chorita  <e>2 A
<L>76576 <pc>409,2 <k1>jaṭāla <k2>jaṭāla  <e>2 A
<L>76686 <pc>409,3 <k1>jaḍāśaya <k2>jaḍāśaya  <e>3 A
<L>76972 <pc>411,1 <k1>janiman <k2>jániman  <e>2 A
<L>76976 <pc>411,1 <k1>janiman <k2>jániman  <e>2 A
<L>77161 <pc>412,1 <k1>japahoma <k2>jápa—homa  <e>3 A
<L>77592 <pc>413,3 <k1>jayus <k2>jayús  <e>2 A
<L>77594 <pc>413,3 <k1>jayya <k2>jáyya  <e>2 A
<L>78234 <pc>416,1 <k1>jaleśvara <k2>jaleśvara  <e>3 A
<L>78257 <pc>416,1 <k1>jalaukas <k2>jalaukas  <e>3 B
<L>78419 <pc>416,3 <k1>jāṃhāgiranagara <k2>jāṃhāgira—nagara  <e>3 A
<L>78734 <pc>418,2 <k1>jātīphala <k2>jātī—phala  <e>3 A
<L>78878 <pc>419,1 <k1>jābālopaniṣad <k2>jābālopaniṣad  <e>3 A
<L>78953 <pc>419,2 <k1>jāmbudvīpaka <k2>jāmbudvīpaka  <e>1 A
<L>78954 <pc>419,2 <k1>jāmbudvīpaka <k2>jāmbudvīpaka  <e>1 A
<L>79569 <pc>422,2 <k1>jīraka <k2>jīraka  <e>3 A
<L>79649 <pc>422,3 <k1>jīvakośa <k2>jīvá—kośa  <e>3 A
<L>79748 <pc>423,1 <k1>jīvaśarman <k2>jīvá—śarman  <e>3 A
<L>80064 <pc>424,2 <k1>juhoti <k2>juhoti  <e>2 A
<L>80080 <pc>424,2 <k1>jūjuvas <k2>jūjuvás  <e>2 A
<L>80173 <pc>424,3 <k1>jeya <k2>jeya  <e>2 A
<L>80180 <pc>424,3 <k1>jenyāvasu <k2>jenyā-vasu  <e>3 A
<L>80678 <pc>427,2 <k1>jyotiḥśāstra <k2>jyotiḥ—śāstra  <e>3 A
<L>80695 <pc>427,2 <k1>jyotirgarga <k2>jyotir—garga  <e>3 A
<L>80701 <pc>427,2 <k1>jyotirnirbandha <k2>jyotir—nirbandha  <e>3 A
<L>81269 <pc>430,1 <k1>ṭīṭibhī <k2>ṭīṭibhī  <e>1 B
<L>82100 <pc>434,2 <k1>tadguṇa <k2>tád—guṇa  <e>3 B
<L>82101 <pc>434,2 <k1>tadguṇa <k2>tád—guṇa  <e>3 B
<L>82204 <pc>434,3 <k1>tanmātra <k2>tan—mātra  <e>3 B
<L>82394 <pc>435,3 <k1>tanūpāna <k2>tanū́—pā́na  <e>3 B
<L>82556 <pc>436,2 <k1>tantrin <k2>tantrin  <e>2 B
<L>82703 <pc>437,1 <k1>tapas <k2>tápas  <e>2 A
<L>82768 <pc>437,2 <k1>tapojā <k2>tapo—jā́  <e>3 A
<L>82946 <pc>438,2 <k1>tamisrapakṣa <k2>támisra—pakṣa  <e>3 A
<L>83179 <pc>439,1 <k1>tarutṛ <k2>tarutṛ́  <e>2 A
<L>83264 <pc>439,2 <k1>tarucchāyā <k2>taru—cchāyā  <e>3 A
<L>83535 <pc>440,3 <k1>talātala <k2>talātala  <e>3 A
<L>83551 <pc>440,3 <k1>talin <k2>talin  <e>2 A
<L>83622 <pc>441,1 <k1>taviṣa <k2>taviṣá  <e>2 B
<L>83650 <pc>441,2 <k1>taṣṭṛ <k2>táṣṭṛ  <e>2 A
<L>83716 <pc>441,3 <k1>tāḍāvacara <k2>tāḍāvacara  <e>3 A
<L>84019 <pc>443,1 <k1>tāmasakīlaka <k2>tāmasa—kīlaka  <e>3 A
<L>84324 <pc>444,1 <k1>tārakāmaya <k2>tārakā—maya  <e>3 A
<L>84328 <pc>444,1 <k1>tārakārāja <k2>tārakā—rāja  <e>3 A
<L>85398 <pc>448,3 <k1>tīkṣṇa <k2>tīkṣṇá  <e>1 A
<L>85560 <pc>449,1 <k1>tīrṇapratijña <k2>tīrṇa—pratijña  <e>3 A
<L>85731 <pc>449,3 <k1>tugākṣīrī <k2>tugā—kṣīrī  <e>3 A
<L>85735 <pc>449,3 <k1>tugra <k2>túgra  <e>1 A
<L>85819 <pc>450,1 <k1>tujya <k2>tújya  <e>2 A
<L>85876 <pc>450,2 <k1>tutthaka <k2>tutthaka  <e>2 A
<L>86045 <pc>451,1 <k1>turaṇyu <k2>turaṇyú  <e>3 A
<L>86254 <pc>452,1 <k1>tuviṣvan <k2>tuví—ṣván  <e>3 A
<L>87172 <pc>455,3 <k1>tomaragraha <k2>tomara—graha  <e>3 A
<L>87248 <pc>456,1 <k1>toyikā <k2>toyikā  <e>2 A
<L>87659 <pc>458,2 <k1>trigaṅga <k2>trí—gaṅga  <e>3 A
<L>87911 <pc>459,1 <k1>triparivarta <k2>trí—parivarta  <e>3 A
<L>87944 <pc>459,2 <k1>tripiṭa <k2>trí—piṭa  <e>3 A
<L>88271 <pc>460,2 <k1>triviṣṭabdhaka <k2>trí—viṣṭabdhaka  <e>3 A
<L>88611 <pc>462,1 <k1>truṭi <k2>truṭi  <e>2 A
<L>88664 <pc>462,1 <k1>traigartaka <k2>traigartaka  <e>3 A
<L>88828 <pc>463,1 <k1>tryaṅgula <k2>try—aṅgulá  <e>3 A
<L>89146 <pc>464,2 <k1>tviṣ <k2>tvíṣ  <e>2 A
<L>89305 <pc>465,1 <k1>daṃsiṣṭha <k2>dáṃsiṣṭha  <e>2 A
<L>89386 <pc>465,2 <k1>dakṣas <k2>dákṣas  <e>2 A
<L>89575 <pc>466,3 <k1>daṇḍakamaṇḍalu <k2>daṇḍá—kamaṇḍalu  <e>3 A
<L>90549 <pc>470,3 <k1>darman <k2>darmán  <e>3 A
<L>90801 <pc>471,2 <k1>davīyas <k2>dávīyas  <e>3 A
<L>90839 <pc>471,3 <k1>daśagrīva <k2>daśa—grīva  <e>3 A
<L>91100 <pc>473,1 <k1>daśā <k2>daśā  <e>1 A
<L>91946 <pc>477,1 <k1>dāsabhārya <k2>dāsá—bhārya  <e>3 
<L>92882 <pc>481,2 <k1>dīpta <k2>dīpta  <e>2 A
<L>106268 <pc>538,2 <k1>nāsikaṃdhama <k2>nāsika—ṃ-dhama  <e>3 A
<L>108172 <pc>546,2 <k1>nighuṣṭa <k2>ni-ghuṣṭa  <e>1 
<L>132910 <pc>669,2 <k1>pratiyatna <k2>prati-yatna  <e>3 A
<L>145399 <pc>732,2 <k1>bījakāṇḍaruha <k2>bī́ja—kāṇḍa-ruha  <e>3 
<L>166218 <pc>825,3 <k1>mūtrapurīṣa <k2>mū́tra—purīṣa  <e>3 A
<L>185668 <pc>917,1 <k1>vadhū <k2>vadhū́  <e>1 A
<L>205101 <pc>1012,3 <k1>vṛṣabhaṣoḍaśā <k2>vṛṣabhá—ṣoḍaśā  <e>3 B
<L>213636 <pc>1057,1 <k1>śaryaṇāvat <k2>śaryaṇā-vat  <e>2 A
<L>254890 <pc>1260,2 <k1>steyakṛt <k2>stéya—kṛ́t  <e>3 A
Andhrabharati commented 3 years ago

Like this, plenty of issues are seen in the data. Hope they would be resolved sooner (now that @Andhrabharati is there to point out those).

The interest is to make the data as much correct and uniform (consistent in structure) as possible, keeping the intent of the book (Author) in mind.

funderburkjim commented 3 years ago

@Andhrabharati I see the excel file you submitted. This makes life more difficult. sigh.

AFAIK, git treats excel files (.xlsx) as binary. This means the normal command-line tools (diff, 'git diff') are not applicable.

I'll try to find a way to make use of the file, by

  1. writing a program to first convert the .xlsx file to a tab-delimited file
  2. turning the tabs back into new lines
    • at this point, I hope the file will be comparable to mw_iast.txt
  3. Converting to slp1.
    • Then, I hope the result will be comparable to mw.txt

Then, finally, maybe, I'll be able to review and make use of the changes you have made, and will be making further.

funderburkjim commented 3 years ago

.xlsx is treated as binary file -- confirmed by viewing the commit at Github:


gasyoun commented 3 years ago

.xlsx is treated as binary file

No good, I guess?

finally, maybe, I'll be able to review and make use of the changes you have made, and will be making further.

Sounds like a big plan.

Hope they would be resolved sooner (now that @Andhrabharati is there to point out those).

Exactly, now he's part of the Cologne core team. Thanks, it's good to have you around.

The interest is to make the data as much correct and uniform (consistent in structure) as possible, keeping the intent of the book (Author) in mind.

One can't have anything against it.

n. (Neuter) is taken as ii.

Good small catch. Have not seen such cases recently.

Andhrabharati commented 3 years ago

@funderburkjim Don't spend any of your time in converting this excel file.

It is just a split of the data into meaningful parts, for better/easier identification of issues.

And there are no modifications made in it.

I was just listing the observations looking at this file.

funderburkjim commented 3 years ago

I've already spent an hour trying to convert to tsv with google docs.

Will stop.

Andhrabharati commented 3 years ago

When I actually do some modification, will be pushing the mw_iast file you prepared.

The pushing of excel file is just a trial to see the operation.

drdhaval2785 commented 3 years ago

@funderburkjim, doesn't help in getting tsv from xlsx?

Andhrabharati commented 3 years ago

I was thinking of inserting the HW endings [when time comes for it] before the gender info (about which I was talking aloud all these days), within flower brackets { }.

So looked if these characters were already used in the data. Found 4 such records.

§5. { } used extraneously: '{' in 3 records and '}' in 4 records.

<L>19258<pc>111,2<k1>avyaṅga<k2>avyaṅga<h>2<e>1 <hom>2.</hom> <s>avyaṅga</s> ¦ <lex>mn.</lex> the girdle of the <s1 slp1="maga">Maga</s1> priests, <ls>BhavP. i</ls>; (<s>viyaṅga</s> or <s>viyāṅga</s>), <ls>VarBṛS.</ls> <info lex="m:n"/>    <LEND>
;; removed the { } around ';' as per the book.

<L>40647<pc>235,2<k1>ojodā<k2>ojo—dā́<e>3   <s>ojo—dā́</s> ¦ <lex>mfn.</lex> granting power, strengthening, <ls>RV. viii, 3, 24</ls>; <ls>TS. v</ls>; <info lex="m:f:n"/>   <LEND>
;; removed the { } around ';' as per the book.

<L>90352<pc>470,1<k1>dayāvat<k2>dayā́—vat<e>3   <s>dayā́—vat</s> ¦ <lex>mfn.</lex> pitiful, taking pity on (<ab>gen.</ab> <ls>MBh. xiii</ls>; <ab>loc.</ab>, ii; <ls>R. ii</ls>)<info lex="m:f:n"/> <LEND>
;; removed the { } around ';' & changed ',' to ';' between ii and R. as per the book.

<L>98507<pc>505,1<k1>dvinārāśaṃsa<k2>dvi—nārāśaṃsa<e>3  <s>dvi—nārāśaṃsa</s> ¦ <lex>mf(<s>ī</s>)n.</lex> twice furnished with the vessels called <ab n="Nārāśaṃsa" slp1="nArASaMsa">N°</ab>, <ls>AitBr.</ls><info lex="m:f#I:n"/>   <LEND>
;; changed '}' to a ',' as per the book.
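Before reserving { } for a new purpose, a scan like the following could confirm that no stray braces remain. It is a sketch only; `lines_with_braces` is a made-up name and the sample lines are invented.

```python
def lines_with_braces(lines):
    """Return (line_number, opening_count, closing_count) for lines using { or }."""
    hits = []
    for number, line in enumerate(lines, start=1):
        opens, closes = line.count("{"), line.count("}")
        if opens or closes:
            hits.append((number, opens, closes))
    return hits


sample = [
    "<ls>BhavP. i</ls>{;} (<s>viyaṅga</s>)",
    "twice furnished with the vessels}",
    "clean line",
]
print(lines_with_braces(sample))
```

An empty result on the corrected file would mean the two characters are free to carry the HW endings.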

Now these { } are available for me to insert the HW-endings into this MW data.

The corrections phase has started now!!

I am shortly pushing the mw_iast_AB_1.txt file, with the above 4 records (as in §5) changed and also 70k+ 'double space's replaced with 'single space's.

[mw_iast_AB_0.txt file is the merging of 3 lines of each <L> record mw_iast.txt file into a single line (\r\n replaced with \t), as was done earlier in the corrections_9.txt file from you. And I thought, this may not be of any interest to give. Of course, it was posted in alternate form (MW_IAST_0.xlsx) already.]
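The merging described here (each <L> record folded into one tab-separated line) might look roughly like this. It is a sketch under the assumption that every record runs from a metaline to a line starting with <LEND>; the name `merge_records` is hypothetical, and blank lines between records are skipped, as in the tabbed file.

```python
def merge_records(lines):
    """Join each <L> ... <LEND> record into a single tab-separated line."""
    merged, current = [], []
    for line in lines:
        text = line.rstrip("\r\n")
        if not text and not current:
            continue  # blank separator between records
        current.append(text)
        if text.startswith("<LEND>"):
            merged.append("\t".join(current))
            current = []
    return merged


# Invented two-record sample with CRLF line ends.
sample = [
    "<L>1<pc>1,1<k1>a<k2>a<e>1\r\n",
    "<s>a</s> ¦ x\r\n",
    "<LEND>\r\n",
    "\r\n",
    "<L>2<pc>1,1<k1>b<k2>b<e>1\r\n",
    "<s>b</s> ¦ y\r\n",
    "<LEND>\r\n",
]
for record in merge_records(sample):
    print(record)
```

Note that any trailing lines without a closing <LEND> are silently dropped by this sketch.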

This also has the following corrections done afterwards-

There are ;; remarks where ever I felt were needed. [This is how I would be doing the rest of the work, as I go on.]

A big feast to your mind (to go through and verify) for some time!!

Andhrabharati commented 3 years ago

You may notice that I have added the actual Annexure text and Main text, as R(p,c) and O(p,c) at the ;; lines. This is to make the reference to book content 'local' (without the need to open the scan file and navigate thereupon).

With this, I now await feedback on whether this way of working is alright for you.

Now the file is available for your perusal.

I do not wish to know the tagging process (to do it myself), as I have no use of it for my other activities. Would leave it to you (and the other members) to do that piece of work (anyway you'd be reviewing those lines) in the revised/added lines, looking at my remarks.

Andhrabharati commented 3 years ago

Another variety of issue!

§6. Some examples of the extremely rare cases where the HW endings were "typed" [but very badly applied/treated!]

<L>40093.1 <pc>231,3 <k1>ema <k2>ema <e>2 E
<L>40093 <pc>231,3 <k1>ema <k2>éma <h>a <e>2
[image]

<L>40094 <pc>231,3 <k1>eman <k2>éman <e>2
[image]

The book has it thus: [image]

These two are to be grouped, as they are comma separated and also as indicated thus at: [image]

एम (Nom. एमम्), एमन् (Nom. एमन) are the two entries as a group here.

Guess this example is clear enough to prove my understanding, that these endings denote the nominative forms as well as variant forms (if comma separated).

<L>28372 <pc>162,3 <k1>āhuka <k2>āhuka <e>1 B
[image]

The book content for this is: [image]

The pl. entry should have been āhukā [even if the last s indicating the visarga (आहुकाः) is ignored, as is the case everywhere].

The word ending if in braces is to be applied to the HW invariably as seen here, which I was mentioning all along.

Andhrabharati commented 3 years ago

These were immediately identified looking at the excel file (with different ordering/sorting ways, an inherent tool here): [image]

gasyoun commented 3 years ago

The word ending if in braces is to be applied to the HW invariably as seen here, which I was mentioning all along.

I seem to start loving your love to details and your start to understand how to document them the proper way.

Andhrabharati commented 3 years ago

Looks like I should not use # to refer to my points, as it has some other purpose here at Github.

Shall be using the character § henceforth.

Andhrabharati commented 3 years ago

I seem to start loving your love to details

We strive to understand the author well enough, before venturing to work on his work (at Andhrabharati). [A simple & fair enough rule in our working.]

your start to understand how to document them the proper way

There is nothing new for me to understand "how to document" now; from my childhood I was doing this way and being appreciated by one and all.

funderburkjim commented 3 years ago

First look at mw_iast_AB_1.txt.

big difference via diff

diff mw_iast_AB.txt mw_iast_AB_1.txt | wc -l => 1167572

Same result with diff -w (ignoring white-space differences).

Conclusion: There is some systematic difference in formatting.

big difference in number of lines

wc -l mw_iast_AB_1.txt => 287500 (number of lines)

wc -l mw_iast.txt => 880079

file sizes comparable

wc -c mw_iast.txt => 53839145

wc -c mw_iast_AB_1.txt => 53747329

AB_1 has about 92,000 fewer characters (91,816 by the wc -c byte counts above). I actually would have expected more.

difference is in tabbing

Looking at mw_iast_AB_1.txt in emacs suggests it is AB's tab format, rather than the line format of mw_iast.txt. Perhaps the tab-format can be undone by a program. Will try.
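For reference, the wc figures above can be reproduced in Python. One point worth keeping in mind is that `wc -c` counts bytes, which for UTF-8 IAST text is larger than the number of characters. The helper name `wc_like` is an assumption for this sketch.

```python
def wc_like(data: bytes):
    """Return (newline count, byte count, character count) for UTF-8 data."""
    return data.count(b"\n"), len(data), len(data.decode("utf-8"))


# Invented sample; 'ṃ' and '¦' take more than one byte each in UTF-8.
sample = "<s>aṃh</s> ¦ to go\r\n<LEND>\r\n".encode("utf-8")
lines, nbytes, nchars = wc_like(sample)
print(lines, nbytes, nchars)
```

For a real comparison one would read each file with `open(path, "rb").read()` and pass the result to the function.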

Andhrabharati commented 3 years ago

mw_iast_AB_0.txt file is the merging of 3 lines of each <L> record mw_iast.txt file into a single line (\r\n replaced with \t), as was done earlier in the corrections_9.txt file from you. And I thought, this may not be of any interest to give. Of course, it was posted in alternate form (MW_IAST_0.xlsx) already.

This is my way of arrangement, which you had also seen earlier!

funderburkjim commented 3 years ago

How do I recover the original format?

Andhrabharati commented 3 years ago

You just need to do the reverse replacement, tab with crlf, in any editor or through any script.

Andhrabharati commented 3 years ago

I did not do any major correction, to have more reduction in character count.

All the corrections done are reported in my above message. You might go through it once, so as to understand the changes easily.

funderburkjim commented 3 years ago

result of changing tabs to line breaks.

There are now 862384 lines, compared to 880079 in mw_iast.txt (17,695 fewer lines than in the original).

For example, in the entry <hom>1.</hom> <s>aṃh</s>, the original mw_iast.txt has several lines between the <L>... line and the <LEND> line:

< <hom>1.</hom> <s>aṃh</s> ¦ (<ab>cf.</ab> √ <s>aṅgh</s>) <ab>cl.</ab> 1. <ab>Ā.</ab> <s>aṃhate</s>, to go, set out, commence, <ls>L.</ls>;
< <div n="to"/>to approach, <ls>L.</ls>;
< <div n="to"/> <ab>cl.</ab> 10. <ab>P.</ab> <s>aṃhayati</s>, to send, <ls>Bhaṭṭ.</ls>;
< <div n="to"/>to speak, <ls>Bhaṭṭ.</ls>;
< <div n="to"/>to shine, <ls>L.</ls><info verb="genuineroot" cp="1Ā,10P"/>

but these line breaks are lost in mw_iast_AB_1.txt -- which joins all these together:

> <hom>1.</hom> <s>aṃh</s> ¦ (<ab>cf.</ab> √ <s>aṅgh</s>) <ab>cl.</ab> 1. <ab>Ā.</ab> <s>aṃhate</s>, to go, set out, commence, <ls>L.</ls>;<div n="to"/>to approach, <ls>L.</ls>;<div n="to"/> <ab>cl.</ab> 10. <ab>P.</ab> <s>aṃhayati</s>, to send, <ls>Bhaṭṭ.</ls>;<div n="to"/>to speak, <ls>Bhaṭṭ.</ls>;<div n="to"/>to shine, <ls>L.</ls><info verb="genuineroot" cp="1Ā,10P"/>

Can you do something about this? -- e.g., when you construct your tabbed file from mw_iast.txt, you could add text <LB> where there are line breaks.
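The <LB> idea can be sketched as a reversible join: the marker stands in for each original line break, so the tabbed form can always be turned back into the exact original lines. The function names here are hypothetical, and the sketch assumes the text itself never contains the string <LB>.

```python
LB = "<LB>"


def join_body(body_lines):
    """Join the lines of one entry body, marking each original break with <LB>."""
    return LB.join(body_lines)


def split_body(joined):
    """Recover the original body lines from the marked form."""
    return joined.split(LB)


body = [
    '<s>aṃh</s> ¦ to go, <ls>L.</ls>;',
    '<div n="to"/>to approach, <ls>L.</ls>;',
    '<div n="to"/>to shine, <ls>L.</ls>',
]
joined = join_body(body)
print(joined)
print(split_body(joined) == body)
```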

funderburkjim commented 3 years ago

Also, leave the 15 blank lines in your tabbed file (such as the blank lines in mw_iast.txt preceding <L>16764.2< .)

The reason for these requests is so that I can identify what 'real' changes you are making.

Andhrabharati commented 3 years ago

I have talked about this line count difference in my §1. above clearly.

I do not understand why you need those 15 blank lines back. I would instead suggest that you remove those lines in the Koeln file(s) themselves!

For the other 17k+ lines to be reverted, two replacements are to be done:

(i) $ to be replaced with crlf (2500+ lines).
(ii) "<div" lines to be restored by inserting a crlf before each "<div" (15k+ lines).

Simple process.
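Taken together with the tab replacement mentioned earlier, the find/replace steps could be scripted roughly as below. This is a sketch, not the program actually used in this thread; it assumes that '$' and '<div' occur in the tabbed file only where a line break was removed, and `restore_lines` is a made-up name.

```python
def restore_lines(tabbed_line, newline="\r\n"):
    """Turn one tab-joined record back into the multi-line form."""
    text = tabbed_line.replace("\t", newline)        # field separators
    text = text.replace("$", newline)                # (i) '$' markers
    text = text.replace("<div", newline + "<div")    # (ii) break before each <div
    return text


# Invented record in the tabbed form.
record = '<L>1<pc>1,1<k1>a<k2>a<e>1\t<s>a</s> ¦ x;<div n="to"/>y$ z\t<LEND>'
for line in restore_lines(record, newline="\n").split("\n"):
    print(line)
```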

Looks like you are NOT reading my messages, but just trying to directly look at the files (being) sent.

funderburkjim commented 3 years ago

I'll try using your suggestions (i) and (ii) tomorrow.

Yes, I am trying to process the mw_iast_AB_1.txt file.

Recall that you said: "When I actually do some modification, will be pushing the mw_iast file you prepared."

So I was not expecting to have to do so many steps to get a file comparable to the mw_iast.txt file.

Andhrabharati commented 3 years ago

I thought the messages (detailing the changes) and the file (with incorporated changes) would go together!

Andhrabharati commented 3 years ago

And I recall that when I initially posted the change details alone, you had tried to look into the (Excel) file to "see" them.

Also there are not that "many steps" involved to get a comparable file- just three simple find/replace operations, not taking more than a couple of seconds' time (if only my messages were really "read" once)!!

gasyoun commented 3 years ago

(if only my messages were really "read" once)

It's not about reading. You have given so many files - it's rather hard to understand what is what for and what has been done in each.

Andhrabharati commented 3 years ago

It's not about reading. You have given so many files - it's rather hard to understand what is what for and what has been done in each.

Am really surprised to see this remark.

I have given just two files, one Excel file before- without any "modifications" done, which I said Jim might not be able to use; and one mw_iast_AB_1.txt file now- with "modifications" done, which is meant for Jim's perusal.

And I clearly mentioned about both the files- of what they contain.

Andhrabharati commented 3 years ago

If this remark is regarding my postings/messages, I will STOP those henceforth. (No point spending my time on such use-less/mis-leading/confusing things.)

gasyoun commented 3 years ago

If this remark is regarding my postings/messages, I will STOP those henceforth.

Please do not run so fast. Give us time to understand. What you write is of highest interest, but there are too many aspects involved at once.

funderburkjim commented 3 years ago


Have been able to derive a file comparable to mw_iast.txt from mw_iast_AB_1.txt file.

And have begun comparisons so as to understand the changes you have made. The bulk of your changes are quite reasonable.

The line counts in comments below pertain to file in format of mw_iast.txt.

good corrections noted thus far

problems in metaline

At this point I had reduced the number of unknown differences considerably, from about 100k to 1,300. Now I begin to examine some of the remaining differences.

The first thing that happened to catch my eye involves the metaline.

This is the line that begins each entry. Its format is <L>x<pc>y<k1>z<k2>w<h>u<e>v, with the <h> parameter being optional for MW.
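A metaline check along these lines could flag malformed entries. The regular expression is an assumption built only from the format stated above (fields in fixed order, <h> optional, no '<' inside a value); it is not the check that produced the count of erroneous metalines, and `parse_metaline` is a hypothetical name.

```python
import re

METALINE = re.compile(
    r"^<L>(?P<L>[^<]+)<pc>(?P<pc>[^<]+)<k1>(?P<k1>[^<]+)"
    r"<k2>(?P<k2>[^<]+)(?:<h>(?P<h>[^<]+))?<e>(?P<e>[^<]+)$"
)


def parse_metaline(line):
    """Return the metaline fields as a dict, or None if the line is malformed."""
    match = METALINE.match(line.rstrip("\r\n"))
    return match.groupdict() if match else None


good = parse_metaline("<L>19258<pc>111,2<k1>avyaṅga<k2>avyaṅga<h>2<e>1")
no_h = parse_metaline("<L>40647<pc>235,2<k1>ojodā<k2>ojo-dā<e>3")
bad = parse_metaline("<L>121<pc>1,2<k1>aṃhaspati<e>3")  # <k2> missing
print(good, no_h, bad, sep="\n")
```

Lines for which the function returns None are the candidates for a problem list.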

Several errors have been introduced in mw_iast_AB_1.txt. By my count, there are 593 erroneous metalines. A list of them is in file metaline_prob_ab_1.txt

The problems fall into 5 categories:

Would you correct these?

I hope you will make the corrections requested above to mw_iast_AB_1.txt and then push the new file up to Github. Then I will continue the initial analysis.

Andhrabharati commented 3 years ago

problems in metaline

I had already noticed & corrected these and updated the file.

Probably you happened to use the initial file.

Anyway, will update the file again with the other ones pointed out by you.

This time, I think I should start giving the file directly in your "comparable form" itself, a simple task for me, saving much of our time (yours in interpreting & mine in explaining).

funderburkjim commented 3 years ago

will update the file again


file directly in your "comparable form"

No need. If you maintain current formatting details of your 'tab' file. I now have a program that can convert your 'tab' file to the 'comparable form'. But thanks for the offer.

Andhrabharati commented 3 years ago

No need. If you maintain current formatting details of your 'tab' file.

But what about the 17k+ lines with non-<L> beginnings? Probably I should take that your (pre-processor) program handles those as well- looking at your statement "No need". So, I will stick to my format only.

Andhrabharati commented 3 years ago

Added another piece of change:

§7. &c. to be with a space before.

(a) &c. (and also &) is always to be preceded by a space (except at the start of a braced content): ~50 occurrences.
(b) &c. inside <ls>...</ls> is taken out to be after </ls>: ~3900 occurrences.
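The two §7 rules could be expressed as regular-expression replacements, roughly as follows. This is a guess at the intended transformation, not the procedure actually applied to the file: it assumes the "&c." sits at the end of the <ls> content, it treats "(" as the start of braced content, and it handles "&c." only (not a bare "&"). The name `fix_etc` is made up.

```python
import re


def fix_etc(text):
    """Apply the two '&c.' rules of §7 to a piece of entry text."""
    # (b) move a trailing '&c.' out of <ls>...</ls>
    text = re.sub(r"\s*&c\.\s*</ls>", "</ls> &c.", text)
    # (a) put a space before '&c.' unless it follows a space or '('
    text = re.sub(r"(?<=[^\s(])&c\.", " &c.", text)
    return text


# Invented examples.
print(fix_etc("<ls>MBh. &c.</ls>"))
print(fix_etc("<ls>R.</ls>&c."))
print(fix_etc("(&c. in comp.)"))
```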

Now the file is mw_iast_AB_2.txt (with this §7 correction & the corrections as mentioned by @funderburkjim implemented). [Sorry that I am progressing further, before the 1st file content is accepted.] [Reminder: §4 and §6 are yet to be taken up]