Closed funderburkjim closed 3 years ago
bhs_error.txt is too large (4075 items) to examine each item individually.
Work was done (see above commit) to separate the list into two parts: a todo list to be examined, and bhs_error_ok.txt.
@sanskritisampada will examine the todo list.
@sanskritisampada finished.
Corrections installed (see above commit to csl-orig)
70 print changes also noted (see above commit to this repository).
i's dotted and t's crossed.
bhs_error_ok.txt (3895 items): assume these don't need to be examined now.
So all done?
Yes, this task is considered done. Those 3895 'error_ok' cases are believed to be non-English.
Should we mark them as such in the code?
As I recall, many are Tibetan words, as the text indicates.
Someone versed in the specialty of this dictionary might benefit from adding markup.
I don't think we should undertake such markup in the foreseeable future.
At least we would not try to treat them as English words in the future, using AI approaches.
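For what it's worth, here is a minimal sketch of how a list like bhs_error_ok.txt might be used as an exclusion set, so that a later English spell-check pass skips those words. It is only an illustration: the function name `split_candidates`, the sample words, and the idea of one word per entry are all assumptions, not a description of the actual file format or of any existing script in this repository.

```python
def split_candidates(candidates, known_ok):
    """Partition candidate words into (todo, skipped).

    candidates: iterable of words flagged as possible English errors.
    known_ok:   iterable of words already judged non-English (hypothetical
                stand-in for the contents of an 'error_ok' list).
    """
    # Build the exclusion set once; ignore blank entries.
    ok = {w.strip() for w in known_ok if w.strip()}
    todo, skipped = [], []
    for word in candidates:
        word = word.strip()
        if not word:
            continue
        # Words on the exclusion list are set aside, the rest need review.
        (skipped if word in ok else todo).append(word)
    return todo, skipped


# Made-up sample data, NOT taken from bhs_error.txt.
candidates = ["teh", "dgra", "recieve", "bcom"]
known_ok = ["dgra", "bcom"]

todo, skipped = split_candidates(candidates, known_ok)
print(todo)     # ['teh', 'recieve']
print(skipped)  # ['dgra', 'bcom']
```

Keeping the exclusion list as plain data, rather than adding markup to bhs.txt, would fit the view above that markup is not planned for the foreseeable future.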
We deal with the list of possible English word errors in bhs.txt (Buddhist Hybrid Sanskrit Dictionary). The list is provided in bhs_error.txt at commit 7b27d61e4d.