sanskrit-lexicon / csl-corrections

Replacement for sanskrit-lexicon/CORRECTIONS. User corrections to sanskrit-lexicon/csl-orig
GNU General Public License v3.0

AE English word errors #19

Closed funderburkjim closed 4 years ago

funderburkjim commented 4 years ago

This comment is devoted to correcting some English word errors in the Apte English-Sanskrit Dictionary.

In #14, a list of potential spelling errors for en is in file ae_error.txt.

ae_error.txt flags possible misspellings of one or more words in each of 598 headwords.

ae_error1.txt

Preliminary examination shows that many of the candidate words in ae_error are likely false-positives:

A program was used to weed out these likely false positives. The program reads the AE digitization (ae.txt) and the ae_error.txt file, and constructs a revised ae_error1.txt file; for reference, the words discarded from ae_error are shown in ae_other.txt.

Here are links:

The program files and the script are in this zip archive:

The script is designed to run in a (temporary) subdirectory of csl-orig/v02/ae/.
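
For readers who want a feel for the filtering step, here is a minimal sketch, not the program in the zip archive. The thread does not state the criteria used to judge a candidate a likely false positive, so the rule below (a word that occurs several times in the digitization is probably not a typo) is purely an assumption, as are the function name `split_candidates`, the `min_count` parameter, and the `record:headword:word` line layout for ae_error.txt.

```python
import re
from collections import Counter


def split_candidates(candidate_lines, digitization_text, min_count=2):
    """Partition candidate lines into (kept, discarded).

    ASSUMPTION: each candidate line looks like 'record:headword:word'.
    ASSUMPTION: a word seen at least `min_count` times in the
    digitization is treated as a likely false positive. The real
    program's rule is not given in the thread.
    """
    counts = Counter(re.findall(r"[A-Za-z]+", digitization_text))
    kept, discarded = [], []
    for raw in candidate_lines:
        line = raw.strip()
        if not line:
            continue
        word = line.split(":")[-1]
        if counts[word] >= min_count:
            discarded.append(line)  # would go to ae_other.txt
        else:
            kept.append(line)       # would go to ae_error1.txt
    return kept, discarded
```

The two returned lists correspond to the roles of ae_error1.txt (still to be examined) and ae_other.txt (set aside for reference).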

funderburkjim commented 4 years ago

proposed workflow

Two files need to be downloaded to a local machine:

I anticipate that @sanskritisampada will examine and correct these in the same way that the Macdonell errors were examined and corrected.

gasyoun commented 4 years ago

As usual

8723:refer:Mallinatha

Some are ok

7227:object:dosired

Some are not.

Thanks @sanskritisampada for all this work!

sanskritisampada commented 4 years ago

Happy to do it!

funderburkjim commented 4 years ago

Corrections processed

All done in one batch from Sampada

65:accident:adc:adv
229:aim:detimite:definite
392:annex:Eritish:British
475:applaud:rese:rose
477:apply:yoar:your
;another error <shoulde> corrected to <shoulder>
544:arrest:mountam:mountain
557:as:orinstr:or instr.
571:aspect:eheerful:cheerful
636:attend:maidPage:maid
650:audible:sobi:sob
;I deleted the <i> after <sob> please check  
; ejf:  Agree
677:autonomy:mous:OK
1431:capacity:thec:the c
1492:carte:blanche:OK
1840:cloth:taltered:tattered
1968:comfort:alcep:a c. sleep
2164:construe:refferring:referring
2668:degree:hightest:highest
2698:demand:deside:OK
; deside. is abbrev. for desiderative
2793:despite:preserce:presence
3189:down:prepwith:prep with
3615:equal:daf:dat
3758:exempt:oftby:oft- by
; ejf  changed to oft. by   (Print unclear. Think 'oft.' is abbrev for 'often')
3764:exhaust:feele:feel e. ed
3769:exigence:eant:OK
4056:fit:ili:ill
4120:floor:havaing:having
4133:flourish:Panini:OK
4149:flux:Fluxion:OK
4169:follow:cesid:desid
; ejf desid.   (abbreviation)
4398:game:GamblePage:Gamble
; ejf Gamble- -> Gamble.
4775:hang:lotuslike:lotus-like
4819:have:withgen:with gen.
4835:heart:dv:adv
5005:hostile:constanth:constant
; ejf  constanth -> constant h.  (= constant hostility)
5013:hour:evilh:evil h.
5036:humble:vII:VII
5237:impress:vIII:VIII
;one more correction (R. VI. 66)  
5609:instead:Purushot:OK
5617:instrument:wasi:was i.
5777:radiate:Irradiance:OK
;error in headword, corrected to <Irradiate>  
;ejf GOOD CATCH!
5874:join:jproprietor:j. proprietor
6067:latch:postern:OK
6193:lie:liest:OK
6394:madam:addresing:addressing
6595:meet:hattle:battle
6903:most:lative:OK
;word <superlative> separated by line break
6904:mote:seest:OK
7227:object:dosired:desired
7682:participle:pasPage:pas
7690:pass:throughts:OK
7916:phlegm:phlegmagogue:OK
8248:prepare:leason:lesson
8382:proscribe:Jivasiddhi:OK
8654:reason:somer:some r.
8723:refer:Mallinatha:OK
8805:remove:geneology:OK
8888:result:exby:ex- by
8937:rib:staken:r.s taken
; ejf r. s   (the r. s taken together) = (the ribs taken together)
9268:sear:dahanena:{#dahanena#}
; ejf.  Yes -- Devanagari is coded in ae.txt with {#...#}
9837:stand:obtamed:obtained
10033:suffice:bv:by
; ejf See note below.  After doing this, can work with the last 3.
11286:word:cx:ex
; ejf
11340:yon:pran:pron
; ejf  abbreviation of 'pronominal'
11343:young:hery:her y
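
The list above follows a simple layout that is easy to read mechanically: four colon-separated fields (a record number, the headword, the flagged word, and either the replacement or `OK`), with lines starting in `;` serving as comments. A minimal sketch of a reader for it follows; the `Correction` class and `parse_corrections` function are hypothetical names introduced here, and the interpretation of the fields is inferred from the list itself rather than from any documented specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Correction:
    record_id: int
    headword: str
    old: str
    new: Optional[str]  # None when the entry is marked OK (no change)


def parse_corrections(lines: Iterable[str]) -> List[Correction]:
    """Parse 'record:headword:old:new' lines, skipping ';' comments."""
    result = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or reviewer comment
        # Split at most 3 times so any later colon stays in the last field.
        parts = line.split(":", 3)
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # not a correction line
        record_id, headword, old, new = parts
        result.append(
            Correction(int(record_id), headword, old,
                       None if new == "OK" else new)
        )
    return result
```

Entries whose `new` is `None` are the ones judged correct as printed; the rest are the changes to apply to ae.txt.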

ejf:  the downloaded file 'ae.txt'  from Sampada ends with

<L>10655<pc>477<k1>tunnel<k2>tunnel
{@Tunnel,@}¦ {%s.%} {#kulyA, suruM#} ({#raM#}) {#gA, gUQa-AMta-#}
<div n="lb"/>{#-rBOma-mArgaH#} {%-v. t.%} {#utKan#} 1 P, {#suzirI

Last line number is 84710.   There is thus some missing.
I pasted the rest of the current csl-orig/v02/ae/ae.txt after this in
Sampada's download.
So, Sampada's ae.txt now has the same number of lines (89606).
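
The repair just described can be sketched as follows. This is an illustration under assumptions, not the procedure actually used: it assumes the corrected file and the original are line-aligned (corrections change text within lines but never add or remove lines), and it treats the final line of the truncated file as possibly cut off, so that line is taken from the original. The function name `restore_tail` is hypothetical.

```python
from typing import List, Sequence


def restore_tail(truncated: Sequence[str], original: Sequence[str]) -> List[str]:
    """Complete a truncated copy using the tail of the original file.

    ASSUMPTION: both files are line-aligned. The last line of the
    truncated copy may be incomplete, so it is replaced by the
    original's line at the same position.
    """
    if not truncated:
        return list(original)
    if len(truncated) >= len(original):
        return list(truncated)  # nothing is missing
    cut = len(truncated) - 1    # index of the possibly incomplete line
    return list(truncated[:cut]) + list(original[cut:])
```

One consequence of this choice is that a correction made on the final, cut-off line would be lost, so that line is worth checking by hand after the merge.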

gasyoun commented 4 years ago

Last line number is 84710. There is thus some missing.

Nothing will ever run away from you.