Open drdhaval2785 opened 2 years ago
Good point raised.
Otherwise the display shows incomprehensible text such as दिएस्त्रुच्क्.
This is not a Sanskrit word.
There are quite a few non-Sanskrit words in this work, and they should be treated according to their native language.
And the url interface could also have "Head word" and "Text (Body) word" input options [instead of the present "Sanskrit word" and "Text word" options], either of which could be any language/script.
Whether to give some additional markup to non-Sanskrit headwords is a debatable issue.
Makes sense. Otherwise we get monster output for rare words.
A lot of code might be involved in handling this anomaly in a better way.
First, determine the scope of the problem by getting a list of non-Sanskrit headwords in IEG.
Also, are there any other dictionaries with the anomaly?
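A rough sketch of how such a list might be drawn up, in case it helps scope the problem. It flags a headword if it contains a character outside an assumed SLP1 alphabet, or if two vowels stand side by side (which SLP1 Sanskrit normally avoids within a word). The character sets, the hiatus heuristic and the function name `suspect_reason` are all assumptions for illustration, not anything from the existing csl code; note that a pure character check would miss "diestruck", since every one of its letters happens to be a valid SLP1 character.

```python
# Assumed SLP1 inventory; accent marks and other extensions are left out.
SLP1_VOWELS = set("aAiIuUfFxXeEoO")
SLP1_CONSONANTS = set("kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshL")
SLP1_OTHER = set("MH~'|")  # anusvara, visarga, anunasika, avagraha, Vedic lha
SLP1_ALPHABET = SLP1_VOWELS | SLP1_CONSONANTS | SLP1_OTHER


def suspect_reason(headword):
    """Return a short reason if the headword looks non-Sanskrit, else None."""
    # 1. Any character that SLP1 does not define at all.
    for ch in headword:
        if ch not in SLP1_ALPHABET:
            return f"character {ch!r} outside SLP1"
    # 2. Two adjacent vowels (heuristic only; a few genuine words have hiatus).
    for a, b in zip(headword, headword[1:]):
        if a in SLP1_VOWELS and b in SLP1_VOWELS:
            return f"vowel hiatus {a + b!r}"
    return None


def list_suspects(headwords):
    """Collect (headword, reason) pairs for headwords that need a manual look."""
    return [(hw, r) for hw in headwords if (r := suspect_reason(hw)) is not None]
```

Calling `list_suspects(["rAjan", "diestruck"])` would flag only "diestruck" (for the "ie" sequence). The output is a candidate list for manual review, not a definitive classification: English or South Indian words that happen to look like valid SLP1 will slip through.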
@funderburkjim
I just did a quick check of the IEG text.
There are 3 English words (in single quotes) and ~350 South Indian words, out of 7096 <L> entries, that do not fit SLP1 encoding.
------------
The <L> count is not 7097, as <L>58 is just a part (<P>) of <L>57, but wrongly marked as another entry.
And there are 216 grouped headword (HW) entries in the IEG text that could be split into separate entries (with group info), as in MW etc., resulting in 236 additional entries.
Coming to the 2nd query, BHS is probably another candidate for this non-Sanskrit words anomaly: it is supposed to contain some Pali and Prakrit words, which have the short vowels e and o that are absent in Sanskrit and thus in SLP1. [Need to check this!!]
There are also some entries in 'foreign' languages, such as Greek and Persian, that defy SLP1 encoding in this IEG text.
Also, are there any other dictionaries with the anomaly?
@funderburkjim PE seems to be one such work: https://github.com/sanskrit-lexicon/GreekInSanskrit/issues/36#issuecomment-993539973
@drdhaval2785 had also identified this in a different context some time back: https://github.com/sanskrit-lexicon/csl-corrections/issues/70#issue-957224860
IEG diestruck is an English word. Whether to give some additional markup to non-Sanskrit headwords is a debatable issue.