sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

aFem Manual Processing, Part 5 (MW) #174

Closed gasyoun closed 8 years ago

gasyoun commented 8 years ago

gona

gauda

ghutika

gasyoun commented 8 years ago

@drdhaval2785 An automatically generated chunk like goṇa [L=67804] -> goṇī for each item in the HTML list would help. Otherwise the HTML is of little use: it is not sorted by dictionary (I check one dictionary at a time) and it offers no ready chunks to copy and paste, like goṇa [L=67804] -> goṇī. Even if the chunk came out as goṇa [L=67804] -> goṇā, that is an easier fix than retyping it all. What's more, it could be in SLP1 here: IAST is good for humans, SLP1 for the PC. When I copy and paste from the dictionary there is dirt like the page number, which I delete every time. But I guess the task is too small to optimize.
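A minimal sketch of the kind of chunk generation requested here, assuming the input is an SLP1 headword, an L number, and a proposed SLP1 replacement. The names slp1_to_iast and correction_chunk are hypothetical, and the table maps to plain IAST (ṃ, ś), not to the ṁ / ṡ variants that the MW display uses in places.

```python
# Hypothetical helpers: render an SLP1 correction as an IAST copy-paste chunk.
# SLP1 uses one character per sound, so a character-by-character map is enough.
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(text: str) -> str:
    """Transliterate SLP1 to IAST; unknown characters pass through unchanged."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


def correction_chunk(old_slp1: str, lnum: int, new_slp1: str) -> str:
    """Build a human-readable chunk such as 'goṇa [L=67804] -> goṇī'."""
    return f"{slp1_to_iast(old_slp1)} [L={lnum}] -> {slp1_to_iast(new_slp1)}"


print(correction_chunk("goRa", 67804, "goRI"))  # goṇa [L=67804] -> goṇī
```

If the generated guess is wrong (goRA instead of goRI, say), only the last field needs retyping.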

gasyoun commented 8 years ago

ghur

[p= 400,1] -> [p= 399,3] [L=74528] (the neuter word starts on the page before the one stated)

cukra cukra2

cukraka

funderburkjim commented 8 years ago

@gasyoun Are there going to be more items under this issue for MW? If not, I'll install, otherwise, I'll wait.

gasyoun commented 8 years ago

Midway upon the journey of our life, I found myself within a forest dark, For the straightforward pathway had been lost. So I continue.

funderburkjim commented 8 years ago

@gasyoun My interpretation of your poem is that this is NOT ready for installation yet.

gasyoun commented 8 years ago

mw:cauraka,75221:caurikA:t:

cauraka

gasyoun commented 8 years ago

mw:chinna,75979:chinnA:t:

chinna

gasyoun commented 8 years ago

mw:janaSruta,76811:janaSrutA:t: 76811 76813 (ejf)

janasruta

gasyoun commented 8 years ago

mw:jantuka,77030:jantukA:t: jantuka

82 more lines left. Slow it is.

drdhaval2785 commented 8 years ago

Shouldn't it be cOrikA? On 19 Dec 2015 03:03, "Marcis Gasuns" wrote:

mw:cauraka,75221:caurikA:t:

[image: cauraka] https://cloud.githubusercontent.com/assets/80761/11907806/182aa926-a5e8-11e5-8244-363b33382a56.PNG

— Reply to this email directly or view it on GitHub https://github.com/sanskrit-lexicon/CORRECTIONS/issues/174#issuecomment-165902319 .

drdhaval2785 commented 8 years ago

CinnA? On 19 Dec 2015 03:05, "Marcis Gasuns" wrote:

mw:chinna,75979:chinnA:t:

[image: chinna] https://cloud.githubusercontent.com/assets/80761/11907851/5d86c9d2-a5e8-11e5-8947-f5eda34e68b6.PNG


gasyoun commented 8 years ago

I do not think it's an f.; it's a cf.

jambha

Sarcophagidae commented 8 years ago

Let's move [L=78931] f. = °bavī L. (=jāmbavatabavī) under jāmbavatī.

jamba

evsyukov commented 8 years ago

ḍimbha [L=81435] f. cf. toya-. -> ḍimbha [L=81435] cf. toya-.

qimba

Sarcophagidae commented 8 years ago

[L=81666] f. cf. a-, ut- -> cf. a-, ut-
[L=81667] f. pura-taṭī. -> cf. pura-taṭī.

tawa

evsyukov commented 8 years ago

Maybe it would be better to move

tapana--tanaya [L=82675] f. = °pantī W.
tapana--tanaya [L=82676] f. = °pasvī*ṣṭā L.

to

tapana--tanayā [L=82674] f. = -sutā L.

tapanatanaya

evsyukov commented 8 years ago

taraṁgaka [L=83064] f. cf. nārī-. -> taraṁgaka [L=83064] cf. nārī-.

taramgaka

gasyoun commented 8 years ago

Maybe taraṁgikā [L=83064] cf. nārī-., @drdhaval2785 ?

drdhaval2785 commented 8 years ago

taraNgikA is the correct word. And I welcome new friends.

Sarcophagidae commented 8 years ago

I'm glad to meet you too)

Let's move [L=264297] f. = svarṇakṣīrī L. under hemāhvā.

I think [L=264297] is part of the hemāhvā [L=264296] article.

hemahva

Sarcophagidae commented 8 years ago

Let's move [L=260121] f. w.r. for haṁsa-padā under haṁsapādā. I think [L=260121] is part of the haṁsapādā [L=260120] article.

hamsapada

evsyukov commented 8 years ago

Nice to meet you ) MB, tarjana [L=83436] f. = °nikā Hcat. ii, 1. is part of tarjanī [L=83435] f. " threatening finger ", the fore-finger Kathās. xvii, 88 KātyṠr. Sch.?

tarjana

gasyoun commented 8 years ago

@evsyukov tarjanA and tarjanI are already there as real entries. What is the correction?

evsyukov commented 8 years ago

MB, tāpa [L=83891] f. cf. paścāt-. is part of tāpī a [L=83890] f. the Tapti river (" also the yamunā river " L. ) Hariv. ii, 109, 30 BhP. v, 19, 18 ; x, 79, 20?

tapa

Sarcophagidae commented 8 years ago

Let's move L=256618 under L=256617. I think [L=256618] f. (a word of unknown meaning) Hariv. 10243. is part of the sparśā [L=256617] f. an unchaste woman L. article.

sparsa

funderburkjim commented 8 years ago

Re goṇa [L=67803] -> goṇā

This L corresponds to (ifc. after numerals °णि) Pāṇ. 1-2, 50 Kāṡ.

Since the preceding and following parts are feminines ending in 'I', this one can't end in 'A'. I think it is saying that when goRI is used at the end of adjective compounds then it is spelled 'goRi', such as daSagoRi and paYcagoRi. PWG under 'goRI' makes the same point, as does the Panini reference. So, probably the best thing is to make the headword 'goRI',

funderburkjim commented 8 years ago

cukrā* mla [p= 400,1] [L=74528] -> cukrAmlA

74528 is "vinegar made of the Garcinia fruit L.", so 74528 is OK. The wrong ones are

funderburkjim commented 8 years ago

mw:cauraka,75221:caurikA:t:

Agree, and also,

mw:cOrakayA,75222:cOrikayA:t:

since it is the instrumental of the preceding cOrikA.

funderburkjim commented 8 years ago

I do not think it's an f.; it's a cf. (under jamBa)

Agree. There are headwords kujamBa, etc. in MW.

Here, right after a jamBA and a jamBI headword, the dictionary seems to be talking about compounds ending in jamBa. I'll make the change to say <lex>m.</lex> rather than f.

funderburkjim commented 8 years ago

(=jāmbavatabavī) (under jAmbavata)

Hi, @Sarcophagidae Welcome!

Here's the standard form change --- I think it agrees with your suggestion:

mw:jAmbavata,78931:jAmbavatI:t:

A secondary point: I was confused by jāmbavatabavī. I think that = °bavī instead implies jAmbavI (which is a headword).
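The standard form lines used in this thread, such as mw:jAmbavata,78931:jAmbavatI:t:, can be read mechanically. The sketch below is only an illustration. The field names and the function parse_correction are hypothetical, and the meaning of the one-letter code (t here) is not spelled out in this issue, so it is kept as an opaque string.

```python
from typing import NamedTuple


class Correction(NamedTuple):
    # Field names are assumptions made for this sketch, not an official schema.
    dictionary: str    # e.g. "mw"
    old_headword: str  # headword as it currently stands (SLP1)
    lnum: int          # the L number of the record
    new_headword: str  # proposed headword (SLP1)
    code: str          # one-letter code, kept opaque ("t" in this thread)
    note: str          # anything after the final colon, possibly empty


def parse_correction(line: str) -> Correction:
    """Parse 'dict:old,L:new:code:note' into its parts."""
    parts = line.strip().split(":", 4)
    if len(parts) != 5:
        raise ValueError(f"not a standard form line: {line!r}")
    dictionary, key, new_headword, code, note = parts
    # Split on the last comma so the L number is always the final piece.
    old_headword, lnum = key.rsplit(",", 1)
    return Correction(dictionary, old_headword, int(lnum),
                      new_headword, code, note.strip())


print(parse_correction("mw:jAmbavata,78931:jAmbavatI:t:"))
```

Trailing remarks, as in mw:janaSruta,76811:janaSrutA:t: 76811 76813 (ejf), end up in the note field.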

funderburkjim commented 8 years ago

ḍimbha [L=81435] f. cf. toya-. -> ḍimbha [L=81435] cf. toya-.

@evsyukov Greetings. Thanks for contributing!

At first, I thought you meant -> ḍimbhā.

But there is a headword 'toyaqimBa' (short 'a'); so this seems like the jamBa (77208) example, where the text is reverting to the initial headword in pointing out a compound. So the error is the f. gender. Agree?

funderburkjim commented 8 years ago

[L=81666] f. cf. a-, ut- -> cf. a-, ut-
[L=81667] f. pura-taṭī. -> cf. pura-taṭī.

I agree with 81666, since there are headwords atawa and uttawa, so the headword for 81666 should be tawa and the error is in the gender.

However, similar reasoning would lead me to think that, since a feminine pura-tawI is being mentioned, the headword of 81667 should be tawI.
@Sarcophagidae Agree?

gasyoun commented 8 years ago

I was confused by jāmbavatabavī I think that = °bavī instead implies jAmbavI (which is a headword).

@drdhaval2785 I was thinking that °bavī could be jāmbavatabavī just because there is no such headword.

gasyoun commented 8 years ago

@funderburkjim Many of the samples point to a pattern of common mistakes. In MW entries the common forms come first, then the f. forms, and after them the cf. words for comparison. So most cf. lines carry an f. attribute but should have none. Agree?

funderburkjim commented 8 years ago

@gasyoun Your suggestion is a good working hypothesis. It may help us understand some of the cases represented in this issue.
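As a rough illustration of that working hypothesis, the sketch below flags lines of the shape "headword [L=n] f. cf. ..." and proposes the same line without the f. The input format is assumed from the examples quoted in this issue, and the names FEM_CF and propose_fix are hypothetical. It only lists candidates: whether to drop the f., change the gender, or change the headword still has to be decided case by case.

```python
import re
from typing import Optional

# Assumed line shape, taken from the examples in this issue:
#   <headword> [L=<number>] f. cf. <rest>
FEM_CF = re.compile(r"^(?P<head>\S+ \[L=\d+\]) f\. (?P<rest>cf\. .+)$")


def propose_fix(entry: str) -> Optional[str]:
    """Return 'old -> new' with the 'f.' removed, or None if the line
    does not have the 'f. cf.' shape."""
    entry = entry.strip()
    match = FEM_CF.match(entry)
    if match is None:
        return None
    return f"{entry} -> {match.group('head')} {match.group('rest')}"


samples = [
    "ḍimbha [L=81435] f. cf. toya-.",
    "tarjana [L=83436] f. = °nikā Hcat. ii, 1.",
]
for sample in samples:
    print(propose_fix(sample))
```

The first sample yields a candidate correction; the second has no cf. after the f. and is left alone.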

gasyoun commented 8 years ago

The point is that there are 80 more similar cases. So a batch update on Jim's side would not hurt before we do some foolish work.

funderburkjim commented 8 years ago

Re taraMgaka,83064

Following the reference nArI-X in MW we find nArI-taraMgaka, m. a libertine, catamite

So, by the logic of the other 'cf' cases, I agree that the correction is to remove the 'f.', as @evsyukov suggests.

funderburkjim commented 8 years ago

@gasyoun Where is the list of words this issue is working from?

funderburkjim commented 8 years ago

@gasyoun tarjana [L=83436] f. = °nikā Hcat. ii, 1. For this one, the headword needs to be corrected to tarjanI, as @evsyukov suggests.

funderburkjim commented 8 years ago

MB, tāpa [L=83891] f. cf. paścāt-.

Since the reference is to paścā́t-tāpa m., I think the correction is to change 'f' to 'm': tāpa [L=83891] m. cf. paścāt-.

drdhaval2785 commented 8 years ago

https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/afem/afem.sh is the code and https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/afem/afem.txt is the word list, Jim.

funderburkjim commented 8 years ago

Thanks, @drdhaval2785. I'm in the midst of trying another approach; the list has 161 cases. I am aiming to make a useful extract of MW to simplify the work.

funderburkjim commented 8 years ago

Here are some additional working materials for this study.

For any of us working on this, I suggest we go through the cases in the order of these files and post batches of standard form corrections as comments on this issue. We'll have to be careful not to trip over each other if several people are working on this.

Comments?

funderburkjim commented 8 years ago

Cases 1-6:

funderburkjim commented 8 years ago

Cases 7-10.

funderburkjim commented 8 years ago

Cases 11-17

The next 5 were discussed above

funderburkjim commented 8 years ago

Cases 18-25

The next 4 were discussed above

funderburkjim commented 8 years ago

Cases 26 - 32

29-32 were discussed above

funderburkjim commented 8 years ago

Cases 33 - 45

38-45 were discussed above

funderburkjim commented 8 years ago

Cases 46-50: