sanskrit-lexicon / csl-corrections

Replacement for sanskrit-lexicon/CORRECTIONS. User corrections to sanskrit-lexicon/csl-orig
GNU General Public License v3.0

IEG diestruck (non-Sanskrit headwords) #69

Open drdhaval2785 opened 2 years ago

drdhaval2785 commented 2 years ago

IEG diestruck is an English word. Whether to give some additional markup to non-Sanskrit headwords is a debatable issue.

Andhrabharati commented 2 years ago

Good point raised.

Otherwise the display shows incomprehensible text such as दिएस्त्रुच्क्.
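A minimal sketch of why this happens, assuming the display converts the SLP1 headword key to Devanagari character by character (a simplification; the actual Cologne conversion code is not part of this thread). The function `slp1_to_deva` and its tables are hypothetical and cover only the basic SLP1 letters, without accents or Vedic signs. Reading the English spelling as SLP1 reproduces the garbled form above: `e` after a vowel becomes the independent vowel ए, `ck` becomes the conjunct च्क्, and the final consonant gets a virama.

```python
# Hypothetical, simplified SLP1 -> Devanagari converter (no accents, no Vedic signs).
VOWELS = {"a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
          "f": "ऋ", "F": "ॠ", "x": "ऌ", "X": "ॡ",
          "e": "ए", "E": "ऐ", "o": "ओ", "O": "औ"}
# Dependent vowel signs used after a consonant; inherent "a" has no sign.
MATRAS = {"a": "", "A": "ा", "i": "ि", "I": "ी", "u": "ु", "U": "ू",
          "f": "ृ", "F": "ॄ", "x": "ॢ", "X": "ॣ",
          "e": "े", "E": "ै", "o": "ो", "O": "ौ"}
CONSONANTS = {"k": "क", "K": "ख", "g": "ग", "G": "घ", "N": "ङ",
              "c": "च", "C": "छ", "j": "ज", "J": "झ", "Y": "ञ",
              "w": "ट", "W": "ठ", "q": "ड", "Q": "ढ", "R": "ण",
              "t": "त", "T": "थ", "d": "द", "D": "ध", "n": "न",
              "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
              "y": "य", "r": "र", "l": "ल", "v": "व",
              "S": "श", "z": "ष", "s": "स", "h": "ह"}
OTHER = {"M": "ं", "H": "ः", "~": "ँ", "'": "ऽ"}
VIRAMA = "्"


def slp1_to_deva(text: str) -> str:
    out = []
    prev_consonant = False  # True if the last output was a bare consonant
    for ch in text:
        if ch in CONSONANTS:
            if prev_consonant:
                out.append(VIRAMA)  # consonant cluster: kill the inherent vowel
            out.append(CONSONANTS[ch])
            prev_consonant = True
        elif ch in VOWELS:
            # vowel sign after a consonant, independent vowel otherwise
            out.append(MATRAS[ch] if prev_consonant else VOWELS[ch])
            prev_consonant = False
        else:
            if prev_consonant:
                out.append(VIRAMA)
            out.append(OTHER.get(ch, ch))
            prev_consonant = False
    if prev_consonant:
        out.append(VIRAMA)  # word-final consonant without a vowel
    return "".join(out)
```

For example, `slp1_to_deva("rAma")` gives राम, while `slp1_to_deva("diestruck")` gives दिएस्त्रुच्क्.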


This is not a Sanskrit word.


There are quite a few non-Sanskrit words in this work, and they should be treated according to their language of origin.

The URL interface could also offer "Head word" and "Text (Body) word" input options (instead of the present "Sanskrit word" and "Text word" options), either of which could be in any language or script.

gasyoun commented 2 years ago

Whether to give some additional markup to non-Sanskrit headwords is a debatable issue

Makes sense. Otherwise we get monster output for rare words.

funderburkjim commented 2 years ago

A lot of code might be involved in handling this anomaly in a better way.

First, determine the scope of the problem by getting a list of non-Sanskrit headwords in IEG.

Also, are there any other dictionaries with the anomaly?

Andhrabharati commented 2 years ago

@funderburkjim

I just did a quick workout with the IEG text.

There are 3 English words (in single quotes) and ~350 South Indian words, out of 7096 <L> entries, that do not fit SLP1 encoding. (The <L> count is not 7097, because <L>58 is just a part (<P>) of <L>57 but is wrongly marked as a separate entry.)
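A check of this kind could be sketched as follows, under stated assumptions: each entry is assumed to begin with a csl-orig-style metaline of the form `<L>N<pc>...<k1>headword<k2>...`, and any `k1` character outside a simplified SLP1 set is taken as a sign of a non-Sanskrit headword. The names `scan`, `SLP1_CHARS` and `METALINE` are hypothetical. A character check alone cannot catch English words such as diestruck, whose letters are all valid SLP1, so those still need separate handling. The entry count reflects the raw file, so <L>58 would still be counted on its own until it is merged into <L>57.

```python
import re

# Simplified SLP1 inventory (accent marks omitted); "'" is the avagraha.
SLP1_CHARS = set("aAiIuUfFxXeEoOMH~kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshL'")

# Assumed metaline shape: <L>57<pc>...<k1>headword<k2>...
METALINE = re.compile(r"^<L>([^<]+)<pc>[^<]*<k1>([^<]+)<k2>")


def scan(lines):
    """Return (entry_count, suspects), where suspects lists (L, k1) pairs
    whose key contains a character outside SLP1_CHARS."""
    count = 0
    suspects = []
    for line in lines:
        m = METALINE.match(line)
        if not m:
            continue  # body text, not a metaline
        count += 1
        lnum, k1 = m.group(1), m.group(2)
        if not set(k1) <= SLP1_CHARS:
            suspects.append((lnum, k1))
    return count, suspects


if __name__ == "__main__":
    # Invented sample lines, not taken from the IEG file.
    sample = [
        "<L>1<pc>1<k1>rAja<k2>rAja",
        "body text",
        "<L>2<pc>1<k1>p\u0115ru<k2>p\u0115ru",
        "<L>3<pc>1<k1>diestruck<k2>diestruck",
    ]
    print(scan(sample))
```

On the invented sample, only entry 2 is flagged (because of the short-e letter ĕ); entry 3 passes even though it is English.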

Andhrabharati commented 2 years ago

There are also 216 grouped headword entries in the IEG text that could be split into separate entries (retaining the group information), as in MW etc., resulting in 236 additional entries.

Andhrabharati commented 2 years ago

Coming to the second query: BHS is probably another candidate for this non-Sanskrit-word anomaly, since it is supposed to contain some Pali and Prakrit words with short e and o vowels, which are absent in Sanskrit and thus in SLP1. [Need to check this!!]

Andhrabharati commented 2 years ago

I have also seen that this IEG text contains some entries in 'foreign' languages, such as Greek and Persian, that defy SLP1 encoding.

Andhrabharati commented 2 years ago

Also, are there any other dictionaries with the anomaly?

@funderburkjim PE seems to be one such work: https://github.com/sanskrit-lexicon/GreekInSanskrit/issues/36#issuecomment-993539973

@drdhaval2785 had also identified this in a different context some time back: https://github.com/sanskrit-lexicon/csl-corrections/issues/70#issue-957224860