sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

INM corrections drawn from faultfinder suggestions #95

Closed funderburkjim closed 9 years ago

funderburkjim commented 9 years ago

73 cases were found by faultfinder for the headwords of INM (Index to the Names in the Mahabharata).

These comprise headwords of INM that occur only in INM, have spelling patterns that don't occur among the MW headwords, and whose spelling patterns are not of the 'rxx' variety.

The comments are generated by program and contain links to the Cologne displays and scans for each case. [Note: Where the headword is wrong, the link to the Cologne display will likely fail once the correction is made, since the link is based on the old, incorrect spelling.]

The cases are listed in page order, which should facilitate examination of the cases.

funderburkjim commented 9 years ago

001 AcAryO -> headword AcAryO --- page 004-1


002 ardracarman -> headword ardracarman --- page 067-1

funderburkjim commented 9 years ago

003 asaYjYa -> headword asaYjYa --- page 092-2


004 OrvopAKyAna -> headword OrvopAKyAna --- page 101-1


005 Otanka -> headword Otanka --- page 101-2


006 bAhudaNtaka -> headword bAhudaNtaka --- page 104-1


007 banDanaasurendrARAM -> headword banDanaasurendrARAM --- page 112-1

funderburkjim commented 9 years ago

008 BfgostIrTaM -> headword BfgostIrTaM --- page 148-2


009 bisastEnyopAKyAna -> headword bisastEnyopAKyAna --- page 153-1


010 cedipungava -> headword cedipungava --- page 175-1


011 Satrujetf -> headword Satrujetf --- page 198-2


012 SiKAprokta -> headword SiKAprokta --- page 202-2

funderburkjim commented 9 years ago

013 Sizwezwa -> headword Sizwezwa --- page 203-1


014 SyAmAyAASrama -> headword SyAmAyAASrama --- page 223-2


015 devAsuravinirmAwr -> headword devAsuravinirmAwr --- page 239-1


016 DArzWadyumna -> headword DArzWadyumna --- page 243-2


017 drasWAtman -> headword drasWAtman --- page 256-1

funderburkjim commented 9 years ago

018 etAvarRO -> headword etAvarRO --- page 285-2


019 gaNgAyamunayostirTam -> headword gaNgAyamunayostirTam --- page 299-2


020 gocfNga -> headword gocfNga --- page 309-1


021 goptrAtman -> headword goptrAtman --- page 312-1


022 hanUmadBImasamvAda -> headword hanUmadBImasamvAda --- page 317-2

funderburkjim commented 9 years ago

023 jyezTA -> headword jyezTA --- page 364-2


024 kAmakroDO -> headword kAmakroDO --- page 378-2


025 kOberya -> headword kOberya --- page 398-2


026 KaqgotpattikaTana -> headword KaqgotpattikaTana --- page 404-2


027 kIrtyAvAsa -> headword kIrtyAvAsa --- page 409-1

funderburkjim commented 9 years ago

028 koRvaSira -> headword koRvaSira --- page 410-1


029 ksetrajYa -> headword ksetrajYa --- page 430-1


030 kUSm -> headword kUSm --- page 433-1


031 kunqala -> headword kunqala --- page 435-1


032 lohitOdaDi -> headword lohitOdaDi --- page 446-2

funderburkjim commented 9 years ago

033 mftyuprajApatisamvAda -> headword mftyuprajApatisamvAda --- page 490-1


034 mUrtO -> headword mUrtO --- page 491-2


035 nAgavatmAn -> headword nAgavatmAn --- page 494-1


036 napumsaka -> headword napumsaka --- page 504-1


037 pAScimAnUpaka -> headword pAScimAnUpaka --- page 522-2

funderburkjim commented 9 years ago

038 padaanuttama -> headword padaanuttama --- page 523-1


039 pARqoqrAjO -> headword pARqoqrAjO --- page 534-1


040 pARdu -> headword pARdu --- page 535-1


041 parvasaNgraha -> headword parvasaNgraha --- page 543-2


042 parvasaNgrahaparvan -> headword parvasaNgrahaparvan --- page 543-2

funderburkjim commented 9 years ago

043 pativratAyAAKyAna -> headword pativratAyAAKyAna --- page 545-2


044 piNgAyAASrama -> headword piNgAyAASrama --- page 551-1


045 pitAmahasurta -> headword pitAmahasurta --- page 551-2


046 pUzRodantaBid -> headword pUzRodantaBid --- page 572-1


047 pUzRodantavinASa -> headword pUzRodantavinASa --- page 572-1

funderburkjim commented 9 years ago

048 pUzRodantavinASana -> headword pUzRodantavinASana --- page 572-1


049 rAmAyaROpAKyAna -> headword rAmAyaROpAKyAna --- page 592-1


050 raRezvagnimuKa -> headword raRezvagnimuKa --- page 593-2


051 rtA -> headword rtA --- page 602-1


052 rtasyakartf -> headword rtasyakartf --- page 602-1

funderburkjim commented 9 years ago

053 samskfti -> headword samskfti --- page 613-1


054 saNgrahADyAya -> headword saNgrahADyAya --- page 615-2


055 saNgrAmajit -> headword saNgrAmajit --- page 615-2


056 saRkarzaRa -> headword saRkarzaRa --- page 619-1


057 sANKyAtman -> headword sANKyAtman --- page 619-2

funderburkjim commented 9 years ago

058 saNkzeptf -> headword saNkzeptf --- page 619-2


059 sannyastapAda -> headword sannyastapAda --- page 620-1


060 saptEDas -> headword saptEDas --- page 620-1


061 sarasvatItArkzyasamvAda -> headword sarasvatItArkzyasamvAda --- page 623-1


062 sraztf -> headword sraztf --- page 648-2

funderburkjim commented 9 years ago

063 suCattra -> headword suCattra --- page 652-2


064 surAriGRa -> headword surAriGRa --- page 661-2


065 susaNkzepa -> headword susaNkzepa --- page 665-1


066 svarAztra -> headword svarAztra --- page 669-1


067 tAmrOzWa -> headword tAmrOzWa --- page 673-2

funderburkjim commented 9 years ago

068 trElokyeSa -> headword trElokyeSa --- page 679-1


069 triBuvanaSfezWa -> headword triBuvanaSfezWa --- page 679-2


070 uttarAazAQAH -> headword uttarAazAQAH --- page 698-1


071 vallyaH -> headword vallyaH --- page 706-2


072 vicitravIryoparama -> headword vicitravIryoparama --- page 727-1

funderburkjim commented 9 years ago

073 vudvudA -> headword vudvudA --- page 757-2

funderburkjim commented 9 years ago

@drdhaval2785
The extraction of INM records from AllvsMW.txt is here

An attempt was made to generate fuzzy suggestions.

This inm-fuzzyalpha.txt file has 3 lines for each of the cases shown above:

Here is the workflow I have used:

Let me know if the approach I've set up for you with INM is good. If so, I'll do similar things for the other dictionaries that remain to be examined via faultfinder.

drdhaval2785 commented 9 years ago

Is the colon necessary, or only when we want to write some 'notes'? E.g., should I write

001 AcAryatanaya
001 AcAryO ->  NO CHANGE :
001 acintya

or

001 AcAryatanaya
001 AcAryO ->  NO CHANGE
001 acintya
drdhaval2785 commented 9 years ago
Step 2: Once fuzzyalpha is finished, go back and copy/paste answers into comments in this Github page

Can't we automate this work as well? E.g., inm-fuzzyalpha.txt could have a fourth line with the following detail:

001 AcAryatanaya
001 AcAryO ->  
001 acintya
headword <a target='_INMword' href='http://www.sanskrit-lexicon.uni-koeln.de/scans/INMScan/2013/web/webtc/indexcaller.php?input=slp1&output=deva&key=AcAryO'>AcAryO</a> ---  page <a target='_INMpage' href='http://www.sanskrit-lexicon.uni-koeln.de/scans/INMScan/2013/web/webtc/servepdf.php?page=004'>004-1</a>
<hr/>

In this situation, after correcting inm-fuzzyalpha.txt we can directly post it on Github. e.g. 001 AcAryatanaya 001 AcAryO ->
001 acintya headword AcAryO --- page 004-1


What say?

drdhaval2785 commented 9 years ago

This way we would save some 10-15 minutes of copy-paste, and also avoid manual errors that might creep in while copying and pasting, such as pasting in the wrong place.

gasyoun commented 9 years ago

15 minutes is not a small amount, if we want to spend only an hour per day, I agree. Sometimes this word may also be spelled wrong! - so maybe let's have 3-5 words?

funderburkjim commented 9 years ago

Re 'is the colon necessary': the short answer is NO. It should be OK to omit the colon, as long as nothing else is on the line. A correction line in fuzzyalpha is recognized as one containing '->'.
Suppose 'nnn hw -> newhw : some comment or other' is such a line of fuzzyalpha. It is parsed first by splitting the line on the colon character and taking the first piece (here, 'nnn hw -> newhw'); further parsing is done from there. If there is no colon, this logic still works. So that's the longer explanation of why the colon is not necessary.
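The parsing rule described above can be sketched roughly as follows. This is only an illustration of the "split on colon, take the first piece" logic, not the actual program; the function name parse_correction_line and the returned dictionary keys are made up for the example.

```python
def parse_correction_line(line):
    """Parse 'nnn hw -> newhw : comment' into parts; return None otherwise.

    Hypothetical helper, written only to illustrate the rule described
    in the comment above; it is not taken from the real program.
    """
    # Split on the first colon and keep the first piece; the rest is a note.
    head, _, note = line.partition(':')
    if '->' not in head:
        return None  # not a correction line
    left, _, new_hw = head.partition('->')
    case, old_hw = left.split()
    return {
        'case': case,
        'old': old_hw,
        'new': new_hw.strip(),
        'note': note.strip(),
    }


with_colon = parse_correction_line("001 AcAryO ->  NO CHANGE :")
without_colon = parse_correction_line("001 AcAryO ->  NO CHANGE")
print(with_colon == without_colon)
```

With or without the trailing colon, the two 'NO CHANGE' lines parse to the same result, which is why the colon can be omitted when there is no comment.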

funderburkjim commented 9 years ago

re after correcting inm-fuzzyalpha.txt we can directly post it on Github.

I don't know how to do this with the Github api. If you don't want to do this step, I'll do it.

I'm not sure why you want a 4th line in fuzzyalpha.

When I said 'copy and paste from fuzzyalpha to github', all I meant is to copy/paste the answer. Typically this is one word per case. So, if case 001 for inm has answer 'NO CHANGE', you would edit the Github comment for case 001, copy 'NO CHANGE' from fuzzyalpha, and paste it after the '-> ' in the Github comment for case 001. If you have comments (after the ':' in fuzzyalpha), you can copy the answer and comments in one step.

drdhaval2785 commented 9 years ago

@funderburkjim If you can show how you bring in these page numbers, I would be able to show you what I mean. I am not talking about the Github API. I just want to have some code in place by which we can prepare text that is ready to be pasted on Github. Once we have already laboured on the .txt file, writing the same on Github again is not fun, even if it is only one word per case.

gasyoun commented 9 years ago

I agree with Dhaval.

funderburkjim commented 9 years ago

Both inm-fuzzyalpha.txt and the github comments are created from the same input file, inm-only-notrxx-page.txt.

Here is the fragment of githubpost.py that constructs the links for headword and page: Is this what you need?

import re

def page_link(pagecol):
    """Return an HTML anchor linking to the scanned INM image for 'pagecol'.
    pagecol has the form 'page-column', e.g. '004-1'.
    """
    d = "INM"
    y = "2013"
    base = "http://www.sanskrit-lexicon.uni-koeln.de/scans"
    url = "%s/%sScan/%s/web/webtc/servepdf.php" % (base, d, y)
    # For INM, only the page part is sent to servepdf.php; the column is unused.
    (page, col) = re.split('-', pagecol)
    parms = "page=%s" % page
    href = "%s?%s" % (url, parms)
    # The link text shows the full page-column value.
    ans = "<a target='_INMpage' href='%s'>%s</a>" % (href, pagecol)
    return ans

def headword_link(hw):
    """Return an HTML anchor linking to the basic display for INM headword hw.
    Use this form, which GitHub accepts, so that link opens in same
    tab always.
    """
    d = "INM"
    y = "2013"
    base = "http://www.sanskrit-lexicon.uni-koeln.de/scans"
    url = "%s/%sScan/%s/web/webtc/indexcaller.php" % (base, d, y)
    # hw is an SLP1 key; the display is requested in Devanagari.
    parms = "input=slp1&output=deva&key=%s" % hw
    href = "%s?%s" % (url, parms)
    ans = "<a target='_INMword' href='%s'>%s</a>" % (href, hw)
    return ans
drdhaval2785 commented 9 years ago
001 AcAryatanaya
001 AcAryO ->  
001 acintya
headword <a target='_INMword' href='http://www.sanskrit-lexicon.uni-koeln.de/scans/INMScan/2013/web/webtc/indexcaller.php?input=slp1&output=deva&key=AcAryO'>AcAryO</a> ---  page <a target='_INMpage' href='http://www.sanskrit-lexicon.uni-koeln.de/scans/INMScan/2013/web/webtc/servepdf.php?page=004'>004-1</a>
------------------------------------------------------------------------

This is what I want in inm-fuzzyalpha.txt
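A record of this shape could be produced along these lines. This is a rough sketch, not the program that was actually used: the names BASE, link_line and fuzzy_record are invented for illustration, and the URL pattern is copied from the sample record above.

```python
# URL prefix taken from the sample record; the names here are hypothetical.
BASE = "http://www.sanskrit-lexicon.uni-koeln.de/scans/INMScan/2013/web/webtc"


def link_line(hw, pagecol):
    """Build the fourth line: links to the headword display and the scan."""
    page = pagecol.split('-')[0]  # '004-1' -> '004'
    hw_href = "%s/indexcaller.php?input=slp1&output=deva&key=%s" % (BASE, hw)
    page_href = "%s/servepdf.php?page=%s" % (BASE, page)
    return ("headword <a target='_INMword' href='%s'>%s</a> ---  "
            "page <a target='_INMpage' href='%s'>%s</a>"
            % (hw_href, hw, page_href, pagecol))


def fuzzy_record(case, prev_hw, hw, next_hw, pagecol):
    """Return one record: three fuzzyalpha lines, the link line, a rule."""
    return "\n".join([
        "%s %s" % (case, prev_hw),
        "%s %s ->  " % (case, hw),   # the answer is typed after the arrow
        "%s %s" % (case, next_hw),
        link_line(hw, pagecol),
        "<hr/>",
    ])


print(fuzzy_record("001", "AcAryatanaya", "AcAryO", "acintya", "004-1"))
```

Once the answer has been typed after the arrow, the whole record can be pasted into a Github comment as it stands.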

funderburkjim commented 9 years ago

Here is sample from modified inm-fuzzyalpha.txt.

001 AcAryatanaya
001 AcAryO -> (AcArya,AcAryA,AcAryI)
001 acintya
001 headword AcAryO --- page 004-1

002 ArdrA
002 ardracarman -> (akrUrakarman,aGorakarman,aNkacarman,acCacarman,adArakarman,aDarakarman,anyacarman,aparakarman,arTakarman,arTavarman,astrakarman)
002 arGABiharaRa
002 headword ardracarman --- page 067-1

003 asaNga
003 asaYjYa -> (asajYa,asaMjYa,asaYjYA)
003 asaNKyeya
003 headword asaYjYa --- page 092-2

gasyoun commented 9 years ago

(akrUrakarman,aGorakarman,aNkacarman,acCacarman,adArakarman,aDarakarman,anyacarman,aparakarman,arTakarman,arTavarman,astrakarman) - add a space after ,?

funderburkjim commented 9 years ago

Will add space after commas for next iteration for next dict.

drdhaval2785 commented 9 years ago

Corrections are as below


002 ArdrA 002 ardracarman -> Ardracarman : Print error 002 arGABiharaRa 002 headword ardracarman --- page 067-1


005 Ozija 005 Otanka -> OtaNka : Print error 005 OttAnapAda 005 headword Otanka --- page 101-2


006 bahudAmA 006 bAhudaNtaka -> bAhudantaka : Smudge 006 bahuDAnindita 006 headword bAhudaNtaka --- page 104-1


007 banDana 007 banDanaasurendrARAM -> banDana asurendrARAM : Space in between 007 bAnDava 007 headword banDanaasurendrARAM --- page 112-1


010 cedipati 010 cedipungava -> cedipuNgava 010 cedirAj 010 headword cedipungava --- page 175-1


014 SyAmA 014 SyAmAyAASrama -> SyAmAyA ASrama : Space 014 SyAmAyana 014 headword SyAmAyAASrama --- page 223-2


015 devAsuravaraprada 015 devAsuravinirmAwr -> devAsuravinirmAtf : Print error 015 devAsureSvara 015 headword devAsuravinirmAwr --- page 239-1


016 DarzaRAtman 016 DArzWadyumna -> DArzwadyumna : Print error 016 DArtarAzwra 016 headword DArzWadyumna --- page 243-2


017 divyAtman 017 drasWAtman -> drazwAtman : Not sure. Needs further investigation. 017 drORaputra 017 headword drasWAtman --- page 256-1


019 gaNgAvataraRa 019 gaNgAyamunayostirTam -> gaNgAyamunayostIrTam : Capital I 019 gANgeya 019 headword gaNgAyamunayostirTam --- page 299-2


020 goSabdAtmaja 020 gocfNga -> goSfNga : Also sorted wrong in dictionary 020 godAvarI 020 headword gocfNga --- page 309-1


021 goptf 021 goptrAtman -> (gaBIrAtman) 021 goputra 021 headword goptrAtman --- page 312-1


023 jyezWa 023 jyezTA -> jyezWA 023 jyezWapuzkara 023 headword jyezTA --- page 364-2


024 kAmAKya 024 kAmakroDO -> (kAmakroDa) 024 kAmakft 024 headword kAmakroDO --- page 378-2


029 kzemya 029 ksetrajYa -> kzetrajYa : Print error 029 kzetrAtman 029 headword ksetrajYa --- page 430-1


031 kuRqala 031 kunqala -> kuRqala 031 kuRqalA 031 headword kunqala --- page 435-1


032 lohita 032 lohitOdaDi -> lohita udaDi : Space 032 lohitAkza 032 headword lohitOdaDi --- page 446-2


034 murmurA 034 mUrtO -> mUrtO hi te ... sarve vE devatAH : A sentence 034 mUrtiSAstra 034 headword mUrtO --- page 491-2


035 nagAtmajA 035 nAgavatmAn -> nAgavatman 035 nAgendra 035 headword nAgavatmAn --- page 494-1


036 naptf 036 napumsaka -> napuMsaka 036 nara 036 headword napumsaka --- page 504-1


038 pAda 038 padaanuttama -> pada anuttama 038 pAdANga 038 headword padaanuttama --- page 523-1


039 paRqitaka 039 pARqoqrAjO -> pARqoqrarAjO 039 pARqu 039 headword pARqoqrAjO --- page 534-1


040 pARqu 040 pARdu -> pARqu 040 pARqu 040 headword pARdu --- page 535-1


043 pativratAmAhAtmyaparvan 043 pativratAyAAKyAna -> pativratAyA AAKyAna : Space 043 pativratAyAmAhAtmyasAvitryAH 043 headword pativratAyAAKyAna --- page 545-2


044 piNgatIrTa 044 piNgAyAASrama -> piNgAyA ASrama 044 piNgeSa 044 headword piNgAyAASrama --- page 551-1


045 pitAmahasyasaras 045 pitAmahasurta -> pitAmahasuta 045 pitAmahasuta 045 headword pitAmahasurta --- page 551-2


046 puzkariRI 046 pUzRodantaBid -> pUzRo dantaBid : Space 046 pUzRodantavinASa 046 headword pUzRodantaBid --- page 572-1


047 pUzRodantaBid 047 pUzRodantavinASa -> pUzRo dantavinASa : Space 047 pUzRodantavinASana 047 headword pUzRodantavinASa --- page 572-1


048 pUzRodantavinASa 048 pUzRodantavinASana -> pUzRo dantavinASana : Space 048 puzpa 048 headword pUzRodantavinASana --- page 572-1


049 rAmAyaRa 049 rAmAyaROpAKyAna -> rAmAyaRa upAKyAna : Space 049 ramBA 049 headword rAmAyaROpAKyAna --- page 592-1


051 fta 051 rtA -> ftA : Print error 051 ftaDAman 051 headword rtA --- page 602-1


052 ftaDAman 052 rtasyakartf -> ftasya kartf 052 ftavaHzat 052 headword rtasyakartf --- page 602-1


053 saMsaptaka 053 samskfti -> saMskfti 053 saMsTAna 053 headword samskfti --- page 613-1


056 sANkASya 056 saRkarzaRa -> saNkarzaRa : Print error 056 saNkarzaRAnuja 056 headword saRkarzaRa --- page 619-1


062 sraja 062 sraztf -> srazwf 062 sruvahasta 062 headword sraztf --- page 648-2


064 surAri 064 surAriGRa -> surAriGna 064 surArihan 064 headword surAriGRa --- page 661-2


066 svarAj 066 svarAztra -> svarAzwra 066 svaravyaYjanaBUzaRa 066 headword svarAztra --- page 669-1


069 tretAyuga 069 triBuvanaSfezWa -> triBuvanaSrezWa 069 triBuvanaviBu 069 headword triBuvanaSfezWa --- page 679-2


070 uttarAgni 070 uttarAazAQAH -> uttarA AzAQAH 070 uttarAHkuravaH 070 headword uttarAazAQAH --- page 698-1


drdhaval2785 commented 9 years ago

The correct entries are as below

001 AcAryatanaya 001 AcAryO -> NO CHANGE 001 acintya 001 headword AcAryO --- page 004-1

003 asaNga 003 asaYjYa -> NO CHANGE 003 asaNKyeya 003 headword asaYjYa --- page 092-2

004 OrvaAKyAna 004 OrvopAKyAna -> NO CHANGE 004 OzadaSvi 004 headword OrvopAKyAna --- page 101-1

008 Bramara 008 BfgostIrTaM -> NO CHANGE 008 Bfgu 008 headword BfgostIrTaM --- page 148-2

009 bindusaras 009 bisastEnyopAKyAna -> NO CHANGE 009 boDa 009 headword bisastEnyopAKyAna --- page 153-1

011 Satruhan 011 Satrujetf -> NO CHANGE 011 Satrujit 011 headword Satrujetf --- page 198-2

012 SiKanditanaya 012 SiKAprokta -> NO CHANGE 012 SiKAvarta 012 headword SiKAprokta --- page 202-2

013 Sizwakft 013 Sizwezwa -> NO CHANGE 013 Sita 013 headword Sizwezwa --- page 203-1

018 eraka 018 etAvarRO -> NO CHANGE 018 gabDakAlI 018 headword etAvarRO --- page 285-2

022 haMsikA 022 hanUmadBImasamvAda -> NO CHANGE : The next entry shows hanUmat as alternative for hanumat. 022 hanumat 022 headword hanUmadBImasamvAda --- page 317-2

025 kObera 025 kOberya -> NO CHANGE 025 kOSala 025 headword kOberya --- page 398-2

026 Kaqgin 026 KaqgotpattikaTana -> NO CHANGE 026 Kaga 026 headword KaqgotpattikaTana --- page 404-2

027 kIrtivarman 027 kIrtyAvAsa -> NO CHANGE 027 kizkinDA 027 headword kIrtyAvAsa --- page 409-1

028 koNkana 028 koRvaSira -> NO CHANGE 028 kopavega 028 headword koRvaSira --- page 410-1

030 kuSikottama 030 kUSm -> NO CHANGE 030 kuhana 030 headword kUSm --- page 433-1

033 mftyupA 033 mftyuprajApatisamvAda -> NO CHANGE 033 mucukunda 033 headword mftyuprajApatisamvAda --- page 490-1

037 pASASinI 037 pAScimAnUpaka -> NO CHANGE : The explanation compares it with paScimAnUpaka. So maybe an intentional reading. 037 pASin 037 headword pAScimAnUpaka --- page 522-2

041 parvAnukramaRI 041 parvasaNgraha -> NO CHANGE 041 parvasaNgrahaparvan 041 headword parvasaNgraha --- page 543-2

042 parvasaNgraha 042 parvasaNgrahaparvan -> NO CHANGE 042 parvata 042 headword parvasaNgrahaparvan --- page 543-2

050 raRapriya 050 raRezvagnimuKa -> NO CHANGE 050 raRotkawa 050 headword raRezvagnimuKa --- page 593-2

054 saNgraha 054 saNgrahADyAya -> NO CHANGE 054 saNgrAmajit 054 headword saNgrahADyAya --- page 615-2

055 saNgrahADyAya 055 saNgrAmajit -> NO CHANGE 055 saNgrAmajit 055 headword saNgrAmajit --- page 615-2

055 saNgrAmajit 055 saNgrAmajit -> NO CHANGE 055 sanIya 055 headword saNgrAmajit --- page 615-2

057 sANKyayogapravartin 057 sANKyAtman -> NO CHANGE 057 sANKyarzi 057 headword sANKyAtman --- page 619-2

058 sANkftya 058 saNkzeptf -> NO CHANGE 058 sannateyu 058 headword saNkzeptf --- page 619-2

059 sannivAsa 059 sannyastapAda -> NO CHANGE 059 santa 059 headword sannyastapAda --- page 620-1

060 saptagodAvara 060 saptEDas -> NO CHANGE 060 saptakft 060 headword saptEDas --- page 620-1

061 sarasvatI 061 sarasvatItArkzyasamvAda -> NO CHANGE 061 sArasvatya 061 headword sarasvatItArkzyasamvAda --- page 623-1

063 sucetas 063 suCattra -> NO CHANGE 063 sUcI 063 headword suCattra --- page 652-2

065 susAman 065 susaNkzepa -> NO CHANGE 065 susaNkula 065 headword susaNkzepa --- page 665-1

067 tAmrAruRA 067 tAmrOzWa -> NO CHANGE 067 tAmravatI 067 headword tAmrOzWa --- page 673-2

068 trElokyarAja 068 trElokyeSa -> NO CHANGE 068 trEpura 068 headword trElokyeSa --- page 679-1

071 vallaBa 071 vallyaH -> NO CHANGE 071 vAlmIka 071 headword vallyaH --- page 706-2

072 vicitravIryasutotpatti 072 vicitravIryoparama -> NO CHANGE 072 viSAKa 072 headword vicitravIryoparama --- page 727-1

073 vfttAvfttakara 073 vudvudA -> NO CHANGE 073 vyaSva 073 headword vudvudA --- page 757-2

drdhaval2785 commented 9 years ago

https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/fuzzyalpha/github_issues_post.php is the program

https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/fuzzyalpha/inm-fuzzyalpha.txt is the corrected file

gasyoun commented 9 years ago

Amazing productivity.

funderburkjim commented 9 years ago

In processing the changes (from your fuzzyalpha file), some questions arose on several of the cases.
These cases can be divided into a few groups, here's the first smaller group.

mUrtO -> mUrtO  -   did you intend this to be NO CHANGE?
65786 old <HI>{@Mu1rtau hi te…sarve vai devata1h2@}¦ = Çiva
;  DArzWadyumna -> DArzwadyumna  
34696 old <HI>{@Dha1rsht2hadyumna@}¦ or {@Dha1rsht2adyumni@} (“son of
The author clearly gives both forms  Wa and wa, the wa form appears in several dictionaries.
I don't think there is a typo, but could make the hw1.py program generate the 'key1' headwords 
to be the 'wa' form.
This one looks correct as goptrAtman , so I think it should be NO CHANGE
;  goptrAtman -> (gaBIrAtman) 
;43273 old <HI>{@Goptra1tman@}¦ = Kr2shn2a: XII, 1659.
funderburkjim commented 9 years ago

The cases of second and third groups of questions involve the issue of 'headwords' in INM which appear as phrases (multiple words) in the text. The second group also has spelling errors that need correcting. I'm sure the spelling errors should be corrected, but am not sure what the 'key1' headword should be.

; 04/15/2015 gaNgAyamunayostirTam -> gaNgAyamunayostIrTam   scan err
41685 old <HI>{@Gan3ga1-Yamunayos tirtham,@}¦ a ti1rtha. § 733{%p%} (A1nuça1-
41685 new <HI>{@Gan3ga1-Yamunayos ti1rtham,@}¦ a ti1rtha. § 733{%p%} (A1nuça1-
;
; 04/15/2015  rtasyakartf -> ftasya 
79592 old <HI>{@Rtasya kartr2@}¦ = Skanda: III, 14644.
79592 new <HI>{@R2tasya kartr2@}¦ = Skanda: III, 14644.
funderburkjim commented 9 years ago

In the third group, there is no spelling error, but the text has a headword phrase.

; 04/15/2015  banDanaasurendrARAM -> banDana
18206 old <HI>{@Bandhana(h2) Asurendra1n2a1m2@}¦ = Çiva (1000 names^2).

; 04/15/2015  SyAmAyAASrama -> SyAmAyA 
32180 old <HI>{@Çya1ma1ya1 a1çrama(h2),@}¦ a ti1rtha. § 733{%m%} (Citraku1t2a):

; 04/15/2015  lohitOdaDi -> lohita 
60110 old <HI>{@Lohita(h2) udadhi(h2)@}¦ (“the bloody ocean”). § 498

; 04/15/2015  padaanuttama -> pada 
69736 old <HI>{@Pada(m) anuttama(m2)@}¦ = Vishn2u (1000 names).

; 04/15/2015  pativratAyAAKyAna -> pativratAyA 
72542 old <HI>{@Pativrata1ya1 a1khya1na(m2)@}¦ (“the story of the faithful

; 04/15/2015  piNgAyAASrama -> piNgAyA 
73259 old <HI>{@Pin3ga1ya1(h2) a1çrama(h2),@}¦ a ti1rtha. § 733{%y%} (Ujja1naka):

; 04/15/2015  pUzRodantaBid -> pUzRo 
75847 old <HI>{@Pu1shn2o dantabhid@}¦ (“destroyer of Pu1shan's teeth”)

; 04/15/2015  pUzRodantavinASa -> pUzRo  
75849 old <HI>{@Pu1shn2o dantavina1ça(h2)@}¦ (do.) = Çiva: VII, 9541.

; 04/15/2015  pUzRodantavinASana -> pUzRo  
75850 old <HI>{@Pu1shn2o dantavina1çana(h2)@}¦ (do.) = Çiva: XII, 10423

; 04/15/2015  rAmAyaROpAKyAna -> rAmAyaRa 
78355 old <HI>{@Ra1ma1yan2a(m) upa1khya1na(m2)@}¦ (“the episode relating

; 04/15/2015  uttarAazAQAH -> uttarA CHECK 
91669 old <HI>{@Uttara1(h2) Asha1d2ha1h2,@}¦ v. Asha1d2ha1.

I agree that the current key1 headword is wrong in these cases, but am not sure that the corrections are what we should use for key1.

As a side note, there are 385 cases in INM where the headword from the text is given as two or more words.

Currently, the headword generation step (hw1.py) derives key1 in these cases as follows:

The result is key1.

Here are some examples (these are in Anglicized Sanskrit coding - conversion to SLP1 is separate step):

Abhimanyor bha1rya1   --> abhimanyorbha1rya1
Açvamedhika(m2) parva(n)  --> as4vamedhikaparva
Aparimita, Aparinirmita, Aparinindita  --> aparimita

So I raise these questions:
1a. Should key1 for the cases banDanaasurendrARAM, etc., from this faultfinder study be changed per Dhaval's original suggestion? (Dhaval's principle is to use the first word, I think.)
1b. Should the same be done for the two in the first group (of previous comment)?
2a. Should we attend now to the rest of the 385 cases of phrase headwords?
2b. If so, should we change the key1 algorithm to use the 'first word' (ignoring parentheses) in these cases?
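To make the alternatives concrete, here is a small sketch of the two derivations under discussion. The rules are inferred only from the three examples given above and from the wording of 2b; they are an assumption, not the logic of hw1.py, and the function names are hypothetical.

```python
import re


def strip_parens(text):
    """Drop parenthesized endings such as '(h2)' or '(m)'."""
    return re.sub(r'\([^)]*\)', '', text)


def normalize(text):
    # Lowercase and recode c-cedilla as 's4', as in 'as4vamedhikaparva'.
    return text.lower().replace('ç', 's4')


def key1_joined(headword_text):
    """Assumed current behaviour: first alternative, words run together."""
    first_alt = headword_text.split(',')[0]
    return normalize(strip_parens(first_alt)).replace(' ', '')


def key1_first_word(headword_text):
    """Alternative raised in 2b: first word only, ignoring parentheses."""
    first_alt = headword_text.split(',')[0]
    return normalize(strip_parens(first_alt)).split()[0]


for text in ["Abhimanyor bha1rya1",
             "Açvamedhika(m2) parva(n)",
             "Aparimita, Aparinirmita, Aparinindita"]:
    print(text, "-->", key1_joined(text), "|", key1_first_word(text))
```

Both functions return Anglicized Sanskrit coding, as in the examples; conversion to SLP1 would remain a separate step.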

gasyoun commented 9 years ago

1a. First word, if there are not several homonyms present, is a good approach.
1b. I guess so.
2a. I guess so, because they are headwords and we are deep enough in the headword woods. Now is the right time.
2b. I guess so.

But there is one BIG issue left. SLP1 provides no way to distinguish capital letters at the beginning of a word. That is a MAJOR issue. And you do know I rarely use capital letters in words, so I use them here only to show how much I worry about it. What can we do, @funderburkjim? It means we are drifting away from the book without any need at all. Could we add some additional markup, so that we get the best of both worlds: SLP1, plus capitalized words shown as capitalized in the list view?

drdhaval2785 commented 9 years ago

@funderburkjim This is the first time I am working with text files for corrections rather than HTML. So yes, some of them are my errors.

Let me respond to your queries.

034 mUrtO -> mUrtO hi te ... sarve vE devatAH : A sentence

I didn't intend it to be NO CHANGE. The dictionary has a whole sentence as headword, rather than a single word or phrase. See capture

;  DArzWadyumna -> DArzwadyumna  
34696 old <HI>{@Dha1rsht2hadyumna@}¦ or {@Dha1rsht2adyumni@} (“son of
The author clearly gives both forms  Wa and wa, the wa form appears in several dictionaries.
I don't think there is a typo, but could make the hw1.py program generate the 'key1' headwords 
to be the 'wa' form.

It is a print error which has to be corrected. The author has clubbed both Wa and wa version. Otherwise in the explanation he should have given wa version. See capture

This one looks correct as goptrAtman , so I think it should be NO CHANGE
;  goptrAtman -> (gaBIrAtman) 
;43273 old <HI>{@Goptra1tman@}¦ = Kr2shn2a: XII, 1659.

Yes. My error. It is NO CHANGE

drdhaval2785 commented 9 years ago
1a. Should key1 for the cases banDanaasurendrARAM , etc., from this faultfinder study be changed per
Dhaval's original suggestion? (Dhaval's principle is to use to first word, I think.)

NO. My suggestion is to use the whole phrase as it is in the dictionary. First word only doesn't make any sense. e.g. my suggestion is piNgAyAASrama -> piNgAyA ASrama i.e. to incorporate the whole phrase (both words) as a separate headword.

As the answer to 1a is in the negative, questions 1b, 2a and 2b don't survive.

drdhaval2785 commented 9 years ago

Let's correct the errors and close this issue.

Headword phrases / headword sentences are a huge issue in themselves, fit to be tackled separately at https://github.com/sanskrit-lexicon/CORRECTIONS/issues/97

funderburkjim commented 9 years ago

Good idea to separate headword phrase question to separate issue.

Errors corrected.

Closing issue.