BHS English word corrections

sanskrit-lexicon / csl-corrections

Replacement for sanskrit-lexicon/CORRECTIONS. User corrections to sanskrit-lexicon/csl-orig

GNU General Public License v3.0

BHS English word corrections #80

Closed: funderburkjim closed this issue 3 years ago

funderburkjim commented 3 years ago

We deal with the list of possible English word errors in bhs.txt (Buddhist Hybrid Sanskrit Dictionary). The list is provided in bhs_error.txt at commit 7b27d61e4d.

funderburkjim commented 3 years ago

bhs_error.txt is too large (4075 items) to examine each individually.

Work was done (see above commit) to separate the list into 2 parts:

bhs_error_ok.txt (3895 items): assume these don't need to be examined now.
bhs_error_todo.txt (180 items): this is the list to examine individually.

@sanskritisampada will examine the todo list.
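
For readers who want to reproduce this kind of triage, here is a minimal sketch of splitting a candidate list into an "ok" part and a "todo" part. The criterion actually used in the commit is not stated in this thread, so the `needs_review` predicate is a hypothetical placeholder supplied by the caller, and the one-word-per-entry input is an assumption.

```python
from typing import Callable, Iterable, List, Tuple


def split_candidates(
    words: Iterable[str],
    needs_review: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Partition candidate words into (ok, todo) lists.

    `needs_review` is a hypothetical predicate; the real rule used to
    build bhs_error_ok.txt and bhs_error_todo.txt is not given here.
    """
    ok: List[str] = []
    todo: List[str] = []
    for raw in words:
        word = raw.strip()
        if not word:
            continue  # skip blank entries
        if needs_review(word):
            todo.append(word)
        else:
            ok.append(word)
    return ok, todo


if __name__ == "__main__":
    # Made-up sample entries, not taken from bhs_error.txt.
    sample = ["alpha1", "recieve", "", "beta2"]
    ok, todo = split_candidates(sample, lambda w: w == "recieve")
    print(len(ok), len(todo))
```

Every non-blank entry lands in exactly one of the two lists, so the two counts should add up to the size of the original list, as they do above (3895 + 180 = 4075).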

funderburkjim commented 3 years ago

@sanskritisampada finished.

Corrections installed (see above commit to csl-orig)

70 print changes also noted (see above commit to this repository).

i's dotted and t's crossed.

gasyoun commented 3 years ago

> bhs_error_ok.txt 3895 Assume these don't need to be examined now.

So all done?

funderburkjim commented 3 years ago

Yes, this task is considered done. Those 3895 'error_ok' cases are believed to be non-English.

gasyoun commented 3 years ago

> 3895 'error_ok' cases are believed to be non-English.

Should we mark them as such in the code?

funderburkjim commented 3 years ago

As I recall, many are Tibetan words, as the text indicates.
Someone versed in the specialty of this dictionary might benefit from adding markup. I don't think we should undertake such markup in the foreseeable future.

gasyoun commented 3 years ago

> such markup in the foreseeable future.

At least that way we would not try to treat them as English words in the future when using AI approaches.
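
Short of adding markup to bhs.txt, one lightweight way to get this effect would be to reuse the 'error_ok' list as an exclusion set in any later English spell-check pass. The sketch below assumes, without confirmation from this thread, that each line of such a list starts with the word itself; the function names are hypothetical and not part of any existing csl-corrections script.

```python
from typing import Iterable, List, Set


def load_known_non_english(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-separated token of each non-blank line.

    Assumption: the word is the first token on the line; the real layout
    of bhs_error_ok.txt is not described in this thread.
    """
    known: Set[str] = set()
    for line in lines:
        parts = line.split()
        if parts:
            known.add(parts[0])
    return known


def english_candidates(words: Iterable[str], known_non_english: Set[str]) -> List[str]:
    """Keep only words that are not already classified as non-English."""
    return [w for w in words if w not in known_non_english]


if __name__ == "__main__":
    # Made-up entries for illustration only.
    known = load_known_non_english(["foo1 12\n", "\n", "bar2\n"])
    print(english_candidates(["foo1", "teh", "bar2", "adn"], known))
```

With a real file, the lines could come from `open(path, encoding="utf-8")`, with the path and encoding adjusted to match the repository.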