aFem Manual Processing, Part 5 (MW)

gasyoun commented 8 years ago

goṇa [L=67803] -> goṇā
goṇa [L=67804] -> goṇī

[image: gona]

gauḍa [L=68089] -> gauḍī

[image: gauda]

ghuṭika [L=69791] -> ghuṭikā

[image: ghutika]

gasyoun commented 8 years ago

@drdhaval2785 It would help if chunks like goṇa [L=67804] -> goṇī were generated automatically for the HTML list. Otherwise the HTML is of little use: it is not sorted by dictionary (I check one dictionary at a time), and it offers no ready-made chunks to copy and paste, such as goṇa [L=67804] -> goṇī. Even if the generated chunk were goṇa [L=67804] -> goṇā, fixing it would be easier than retyping it all. What's more, it could be in SLP1 here: IAST is good for humans, SLP1 for the PC. If I copy and paste from the dictionary, there is dirt such as the page number, which I have to delete every time. But I guess the task is too small to be worth optimizing.
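A rough sketch of how such a chunk could be produced in SLP1 from the IAST spelling. This is only an illustration, not the code in afem.sh: the function names `iast_to_slp1` and `correction_chunk` are made up here, the table covers lowercase IAST only, and "ai"/"au" are always read as diphthongs, which would be wrong for a genuine a+i or a+u hiatus.

```python
import unicodedata

# IAST -> SLP1 table. Two-letter sequences are tried before single letters.
_TABLE = {
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "ṃ": "M", "ṁ": "M", "ḥ": "H",
    "ṅ": "N", "ñ": "Y", "ṭ": "w", "ḍ": "q", "ṇ": "R", "ś": "S", "ṣ": "z",
}
# Normalise the keys so that lookups match NFC-normalised input.
IAST_TO_SLP1 = {unicodedata.normalize("NFC", k): v for k, v in _TABLE.items()}


def iast_to_slp1(text):
    """Transliterate lowercase IAST to SLP1; unknown characters pass through."""
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in IAST_TO_SLP1:          # aspirates and diphthongs
            out.append(IAST_TO_SLP1[pair])
            i += 2
        else:                             # single letter, or unchanged
            out.append(IAST_TO_SLP1.get(text[i], text[i]))
            i += 1
    return "".join(out)


def correction_chunk(old_iast, L, new_iast):
    """Build a copy-paste chunk in the 'old [L=...] -> new' shape, in SLP1."""
    return f"{iast_to_slp1(old_iast)} [L={L}] -> {iast_to_slp1(new_iast)}"


print(correction_chunk("goṇa", "67804", "goṇī"))
```

With the hypothetical helper above, `correction_chunk("gauḍa", "68089", "gauḍī")` would give a chunk with `gOqa` and `gOqI`, so the proposed headword could be pasted without retyping.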

gasyoun commented 8 years ago

ghurghuraka [L=69835] -> ghurghurikā (OCR error)

[image: ghur]

cukrā* mla [p= 400,1] [L=74528] -> cukrAmlA

[p= 400,1] -> [p= 399,3] [L=74528] (the neuter word starts on the page before the one stated)

[image: cukra] [image: cukra2]

cukraka [p= 400,1] [L=74534] -> cukrikA

[image: cukraka]

funderburkjim commented 8 years ago

@gasyoun Are there going to be more items under this issue for MW? If not, I'll install, otherwise, I'll wait.

gasyoun commented 8 years ago

Midway upon the journey of our life, I found myself within a forest dark, For the straightforward pathway had been lost. So I continue.

funderburkjim commented 8 years ago

@gasyoun My interpretation of your poem is that this is NOT ready for installation yet.

gasyoun commented 8 years ago

mw:cauraka,75221:caurikA:t:

[image: cauraka]

gasyoun commented 8 years ago

mw:chinna,75979:chinnA:t:

[image: chinna]

gasyoun commented 8 years ago

mw:janaSruta,76811:janaSrutA:t: ~~76811~~ 76813 (ejf)

[image: janasruta]

gasyoun commented 8 years ago

mw:jantuka,77030:jantukA:t:

[image: jantuka]

82 more lines left. Slow it is.

drdhaval2785 commented 8 years ago

Shouldn't it be cOrikA?

On 19 Dec 2015 03:03, "Marcis Gasuns" notifications@github.com wrote:

mw:cauraka,75221:caurikA:t:

[image: cauraka] https://cloud.githubusercontent.com/assets/80761/11907806/182aa926-a5e8-11e5-8244-363b33382a56.PNG

— Reply to this email directly or view it on GitHub https://github.com/sanskrit-lexicon/CORRECTIONS/issues/174#issuecomment-165902319 .

drdhaval2785 commented 8 years ago

CinnA?

On 19 Dec 2015 03:05, "Marcis Gasuns" notifications@github.com wrote:

mw:chinna,75979:chinnA:t:

[image: chinna] https://cloud.githubusercontent.com/assets/80761/11907851/5d86c9d2-a5e8-11e5-8947-f5eda34e68b6.PNG

— Reply to this email directly or view it on GitHub https://github.com/sanskrit-lexicon/CORRECTIONS/issues/174#issuecomment-165902641 .

gasyoun commented 8 years ago

I do not think it's an f.; it's a cf.

jambha [L=77208] f. cf. ku-, tapur., tigma-, tṛṣṭa-, vīlu- -> cf. ku-, tapur., tigma-, tṛṣṭa-, vīlu-
jambha [L=77209] f. su-jambha and antar-jambha ([cf. γαμφηλαί .]) -> cf. su-jambha and antar-jambha ([cf. γαμφηλαί .])

[image: jambha]

Sarcophagidae commented 8 years ago

Let's move [L=78931] f. = °bavī L. (=jāmbavatabavī) under jāmbavatī.

[image: jamba]

evsyukov commented 8 years ago

ḍimbha [L=81435] f. cf. toya-. -> ḍimbha [L=81435] cf. toya-.

[image: qimba]

Sarcophagidae commented 8 years ago

[L=81666] f. cf. a-, ut- -> cf. a-, ut-
[L=81667] f. pura-taṭī. -> cf. pura-taṭī.

[image: tawa]

evsyukov commented 8 years ago

Maybe it would be better to move

tapana--tanaya [L=82675] f. = °pantī W.
tapana--tanaya [L=82676] f. = °pasvī*ṣṭā L.

TO

tapana--tanayā [L=82674] f. = -sutā L.

[image: tapanatanaya]

evsyukov commented 8 years ago

taraṁgaka [L=83064] f. cf. nārī-. -> taraṁgaka [L=83064] cf. nārī-.

[image: taramgaka]

gasyoun commented 8 years ago

Maybe taraṁgikā [L=83064] cf. nārī-., @drdhaval2785 ?

drdhaval2785 commented 8 years ago

taraNgikA is the correct word. And I welcome new friends.

Sarcophagidae commented 8 years ago

I'm glad to meet you too)

Let's move [L=264297] f. = svarṇakṣīrī L. under hemāhvā.

I think [L=264297] is part of the hemāhvā [L=264296] article.

[image: hemahva]

Sarcophagidae commented 8 years ago

Let's move [L=260121] f. w.r. for haṁsa-padā under haṁsapādā. I think [L=260121] is part of the haṁsapādā [L=260120] article.

[image: hamsapada]

evsyukov commented 8 years ago

Nice to meet you :)

Maybe tarjana [L=83436] f. = °nikā Hcat. ii, 1. is part of tarjanī [L=83435] f. "threatening finger", the fore-finger Kathās. xvii, 88 KātyŚr. Sch.?

[image: tarjana]

gasyoun commented 8 years ago

@evsyukov tarjanA and tarjanI are already there as real entries. What is the correction?

evsyukov commented 8 years ago

Maybe tāpa [L=83891] f. cf. paścāt-. is part of tāpī a [L=83890] f. the Tapti river ("also the yamunā river" L.) Hariv. ii, 109, 30 BhP. v, 19, 18 ; x, 79, 20?

[image: tapa]

Sarcophagidae commented 8 years ago

Let's move L=256618 under L=256617. I think [L=256618] f. (a word of unknown meaning) Hariv. 10243. is part of the sparśā [L=256617] f. an unchaste woman L. article.

[image: sparsa]

funderburkjim commented 8 years ago

Re goṇa [L=67803] -> goṇā

This L corresponds to (ifc. after numerals °णि) Pāṇ. 1-2, 50 Kāś.

Since the preceding and following parts are feminines ending in 'I', this one can't end in 'A'. I think it is saying that when goRI is used at the end of adjective compounds, it is spelled 'goRi', as in daSagoRi and paYcagoRi. PWG under 'goRI' makes the same point, as does the Panini reference. So probably the best thing is to make the headword 'goRI'.

funderburkjim commented 8 years ago

cukrā* mla [p= 400,1] [L=74528] -> cukrAmlA

74528 is "vinegar made of the Garcinia fruit L.", so 74528 is OK. The wrong ones are:

74530: = °kra-caṇḍikā
74531: = °kra-vedhaka

funderburkjim commented 8 years ago

mw:cauraka,75221:caurikA:t:

Agree, and also,

mw:cOrakayA,75222:cOrikayA:t: since instrumental of preceding cOrikA

funderburkjim commented 8 years ago

I do not think it's an f.; it's a cf. (under jamBa)

Agree. There are headwords kujamBa, etc. in MW.

Here, right after a jamBA and a jamBI headword, the dictionary seems to be talking about compounds ending in jamBa. I'll make the change to say `<lex>m.</lex>` rather than `f.`

funderburkjim commented 8 years ago

(=jāmbavatabavī) (under jAmbavata)

Hi, @Sarcophagidae Welcome!

Here's the standard form change --- I think it agrees with your suggestion:

mw:jAmbavata,78931:jAmbavatI:t:

A secondary point: I was confused by jāmbavatabavī. I think that = °bavī instead implies jAmbavI (which is a headword).

funderburkjim commented 8 years ago

ḍimbha [L=81435] f. cf. toya-. -> ḍimbha [L=81435] cf. toya-.

@evsyukov Greetings. Thanks for contributing!

At first, I thought you meant -> ḍimbhā.

But there is a headword 'toyaqimBa' (short 'a'); so this seems like the jamBa (77208) example, where the text is reverting to the initial headword in pointing out a compound. So the error is the f. gender. Agree?

funderburkjim commented 8 years ago

[L=81666] f. cf. a-, ut- -> cf. a-, ut-
[L=81667] f. pura-taṭī. -> cf. pura-taṭī.

I agree with 81666, since there are headwords atawa and uttawa, so the headword for 81666 should be tawa and the error is in the gender.

However, similar reasoning would lead me to think that, since a feminine pura-tawI is being mentioned, the headword of 81667 should be tawI.
@Sarcophagidae Agree?

gasyoun commented 8 years ago

I was confused by jāmbavatabavī. I think that = °bavī instead implies jAmbavI (which is a headword).

@drdhaval2785 I was thinking that °bavī could be jāmbavatabavī just because there is no such headword.

gasyoun commented 8 years ago

@funderburkjim Many of the samples point to a pattern of common mistakes. In MW entries the common forms come first, then the f. forms, and after those the cf. words for comparison. So most cf. lines carry the f. attribute but should have none. Agree?

funderburkjim commented 8 years ago

@gasyoun Your suggestion is a good working hypothesis. It may help us understand some of the cases represented in this issue.

gasyoun commented 8 years ago

The thing is, there are 80 more similar cases. So a batch update on Jim's side would not hurt, before we do needless manual work.
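To make the idea of a batch pass concrete, here is a small sketch that only flags candidates for human review. It assumes each record is available as an `(L, body)` pair with the body starting at the gender label, which is an assumption about the extract rather than a description of the real MW data files; the function name is hypothetical too.

```python
import re

# A body that opens with the gender label "f." immediately followed by "cf."
CF_AFTER_F = re.compile(r"^f\.\s+cf\.")


def split_cf_candidates(records):
    """Split (L, body) pairs into 'f. cf. ...' candidates and everything else."""
    candidates, others = [], []
    for L, body in records:
        target = candidates if CF_AFTER_F.match(body.strip()) else others
        target.append((L, body))
    return candidates, others


# Bodies copied from the cases quoted earlier in this issue.
sample = [
    ("77208", "f. cf. ku-, tapur., tigma-, tṛṣṭa-, vīlu-"),
    ("81435", "f. cf. toya-."),
    ("83436", "f. = °nikā Hcat. ii, 1."),
    ("83891", "f. cf. paścāt-."),
]
candidates, others = split_cf_candidates(sample)
print([L for L, _ in candidates])
```

The pattern is deliberately narrow: a line such as 77209, where the "cf." sits inside brackets later in the body, would not be caught and would still need a manual look.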

funderburkjim commented 8 years ago

Re taraMgaka,83064

Following the reference nArI-X in MW we find nArI-taraMgaka, m. a libertine, catamite

So, by the logic of the other 'cf' cases, I agree that the correction is to remove the 'f.', as @evsyukov suggests.

funderburkjim commented 8 years ago

@gasyoun Where is the list of words this issue is working from?

funderburkjim commented 8 years ago

@gasyoun tarjana [L=83436] f. = °nikā Hcat. ii, 1. For this one, the headword needs to be corrected to tarjanI, as @evsyukov suggests.

funderburkjim commented 8 years ago

MB, tāpa [L=83891] f. cf. paścāt-.

Since the reference is to paścā́t-tāpa m., I think the correction is to change 'f' to 'm': tāpa [L=83891] m. cf. paścāt-.

drdhaval2785 commented 8 years ago

https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/afem/afem.sh is the code and https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/afem/afem.txt is the word list, Jim.

funderburkjim commented 8 years ago

Thanks, @drdhaval2785. I'm in the midst of trying another approach; the list has 161 cases. I am aiming to make a useful extract of MW to simplify the work.

funderburkjim commented 8 years ago

Here are some additional working materials for this study.

afem1mw_standard.txt This has 161 cases for mw where the headword ends in 'a' and there is an indication of feminine gender in the entry.
- The records are in the 'standard form' for corrections
- There is a notation on the record of 'TODO' or 'DONE'. The ones marked as 'DONE' are those presented earlier in the comments of this issue.
- The records are in dictionary order (increasing 'L')
- For those records identified as a 'cf', there is an indication (CF).
- Similarly, there is an (IFC) indication (at the end of compounds) where this is present. Probably these do NOT need to be corrected.
afem1mw_disp.md is a display for each of the cases.
- There is a link to the scanned image page (or pages)
- There is the record from the above standard-form file
- There is a textual form of the MW display for the headword. Within this the line in question is in bold.
- The Sanskrit is rendered in IAST form.
afem1mw_disp.org is a similar display, but formatted for the Emacs Org mode.
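For anyone scripting against the standard-form records, a minimal parsing sketch follows. The field names (`dictionary`, `old_key`, `L`, `new_key`, `code`, `note`) are labels chosen here for illustration; the layout is inferred only from the records quoted in this issue, and where the TODO/DONE notation sits in afem1mw_standard.txt is not assumed.

```python
from decimal import Decimal
from typing import NamedTuple


class Correction(NamedTuple):
    dictionary: str   # e.g. "mw"
    old_key: str      # current headword, SLP1
    L: str            # record id, kept as text (may be "12103.41")
    new_key: str      # proposed headword, SLP1
    code: str         # one-letter code as written in the record (t, p, n)
    note: str         # free text after the last colon, may be empty


def parse_correction(line):
    """Parse 'dict:old,L:new:code:note' into a Correction."""
    dictionary, old_and_L, new_key, code, note = line.strip().split(":", 4)
    old_key, L = old_and_L.split(",", 1)
    return Correction(dictionary, old_key, L, new_key, code, note.strip())


def in_dictionary_order(records):
    """Sort by increasing L; Decimal keeps ids like 2169.1 and 12103.41 exact."""
    return sorted(records, key=lambda r: Decimal(r.L))


lines = [
    "mw:aBimarza,12103.51:aBimarza:t: f -> mfn.",
    "mw:ajihvaka,2169.1:ajihvaka:t:",
    "mw:tApa,83891:tApa:t: (CF) f -> m.",
    "mw:aBimarza,12103.41:aBimarza:t: ifc. f#A",
]
records = in_dictionary_order(parse_correction(s) for s in lines)
for r in records:
    flag = "CF" if "(CF)" in r.note else "IFC" if "(IFC)" in r.note else "-"
    print(r.L, r.old_key, "->", r.new_key, flag)
```

Keeping `L` as text and converting only for sorting avoids rewriting ids such as 12103.41 when records are written back out.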

For any of us working on this, I suggest we go through the cases in the order of these files and post batches of corrections, in standard form, as comments on this issue. We'll have to be careful not to trip over each other if several people are working on this.

Comments?

funderburkjim commented 8 years ago

Cases 1-6:

1 mw:ajihvaka,2169.1:ajihvaka:t: This is complex. It is a parenthetical headword from within the jihvika headword. Solution: make two records:
  - 2169.1 a-jihvaka : mfn., »jihvaka 79497, " tongueless "
  - 2169.2 a-jihvikA : f., N. of a rākṣasī [NEW RECORD]
2 mw:anavadAnIya,5295.1:anavadAnIya:p: f. -> mfn. compare PD
3 mw:aBijAtIya,11386:aBijAtIya:n: (IFC) false positive.
4 aBimarza. The preceding record needs to have 'ifc. f#A' added, and the current record needs f. -> mfn.
  - mw:aBimarza,12103.41:aBimarza:t: ifc. f#A
  - mw:aBimarza,12103.51:aBimarza:t: f -> mfn.
5 mw:alaka,16523:alakA:t:
6 mw:alaka,16524:alakA:t:

funderburkjim commented 8 years ago

Cases 7-10.

7 mw:avasaBa,18261:avasaBa:p: change to mfn, compare mw72.
8 mw:ekaDana,39344:ekaDanA:t: Also, 'f. pl.' -> 'f. pl. (As)'
9 mw:kalANgala,45904:kalANgala:n: (IFC)
10 mw:kusita,53567:kusitA:t:

Also, the preceding record has the wrong H-code:
- mw:kusitA,53566:kusitA:t: H-code wrong. Change H2B to H1B.

funderburkjim commented 8 years ago

Cases 11-17

11 mw:kzipaka,59511:kzipakA:t:
12 mw:gira,65073:gira:n: (IFC)

The next 5 were discussed above

13 mw:goRa,67803:goRI:t: (IFC)
14 mw:goRa,67804:goRI:t: (CF)
15 mw:gOqa,68089:gOqI:t:
16 mw:Guwika,69791:GuwikA:t:
17 mw:GurGuraka,69835:GurGurikA:t:

funderburkjim commented 8 years ago

Cases 18-25

18 mw:GrARapuwaka,70259:GrARapuwaka:n: (IFC)
19 mw:cakravAka,70504:cakravAkI:t:
20 mw:caturTa,71351:caturTI:t:
21 mw:caturTa,71352:caturTI:t:

The next 4 were discussed above

22 mw:cukrAmla,74530:cukrAmlA:t:
23 mw:cukrAmla,74531:cukrAmlA:t:
24 mw:cukraka,74534:cukrikA:t:
25 mw:cOraka,75221:cOrikA:t:
25a mw:cOrakayA,75222:cOrikayA:t: (instrumental of cOrikA)

funderburkjim commented 8 years ago

Cases 26 - 32

26 mw:Candaska,75624:Candaska:n: (IFC)
27 mw:Cinna,75979:CinnA:t: (CF) ** discussed above **
28 mw:jaNgama,76400:jaNgama:n: (IFC)

29-32 were discussed above

29 mw:janaSruta,76813:janaSruta:t: f.->m. (CF)
Changed from the above discussion:
- (76811) (H3) jana-śruta [p= 410] : m.
- (76812) (H3B) jana-śrutā : f.
- (76813) (H3B) jana-śruta : f. cf. jānaśruti.
- jānaśruti [p= 418] : m. patr. fr. jana-śruta
Hence, 76813 refers to 76811 jana-śruta m.
30 mw:jantuka,77030:jantukA:t:
31 mw:jamBa,77208:jamBa:t: (CF) f -> m.
32 mw:jamBa,77209:jamBa:t: (CF) f -> m.

funderburkjim commented 8 years ago

Cases 33 - 45

33 mw:jAMhAgira,78417:jAMhAgira:t: 18 f. is not feminine, but abbreviation for 'following'
34 mw:jAmbavata,78931:jAmbavatI:t: ** discussed above **
35 mw:jyApiRqa,80647:jyApiRqa:t: f. is not feminine, but abbreviation for 'following'
36 mw:jyApiRqaka,80648:jyApiRqaka:t: f. is not feminine, but abbreviation for 'following'
37 mw:JillIka,81136:JillIkA:t:

38-45 were discussed above

38 mw:qimBa,81435:qimBa:t: (CF) f -> m.
39 mw:tawa,81666:tawa:t: (CF) f -> m.
40 mw:tawa,81667:tawI:t: since cf pura-tawI
41 mw:tapanatanaya,82675:tapanatanayA:t:
42 mw:tapanatanaya,82676:tapanatanayA:t:
43 mw:taraMgaka,83064:taraMgaka:t: (CF) f -> m.
44 mw:tarjana,83436:tarjanI:t:
45 mw:tApa,83891:tApa:t: (CF) f -> m.

funderburkjim commented 8 years ago

Cases 46-50:

46 mw:tAma,83986:tAmI:t:
47 mw:tiktaPala,84788:tiktaPalA:t:
48 mw:tiktaPala,84789:tiktaPalA:t:
49 mw:tilaparRika,85262:tilaparRika:t: (CF) f. -> n.
50 mw:tuRqika,85855:tuRqikA:t:
