Open Andhrabharati opened 2 years ago
@drdhaval2785
Even VCP is heavily affected by this change.
@vvasuki Do you have any solution which will not produce undesirable side effects of accents?
> @vvasuki Do you have any solution which will not produce undesirable side effects of accents?
Yes - Don't use slp1_accented on dictionaries which don't have accents! Garbage in - garbage out.
Going back to a more basic design issue: why do you keep SLP1 encoding in the dicts in the first place? Maybe back in the day the Unicode devanAgarI standard was not popular, so they had to do such monkey tricks. But in 2022, one can save devanAgarI data directly in devanAgarI Unicode.
Can you specify which dictionaries have accents and which do not? I will make modifications accordingly.
It is not at all feasible to keep data in Devanagari Unicode without unnecessary hassles. So SLP1 is going to stay for a long time. I really look forward to a day when Devanagari Unicode would emulate Sanskrit consonants and vowels more naturally. It
> Can you specify which dictionaries have accents and which do not? I will make modifications accordingly.
Shabdakalpadruma (शब्दकल्पद्रुमः) and Vachaspatya (वाचस्पत्यम्). There may be many others as well; those who know should say. Only select dictionaries mark accents.
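The per-dictionary choice being discussed could be sketched roughly as below. This is only an illustration: `source_scheme` and the `accented_codes` set are hypothetical names, and the set has to be filled in by someone who knows which dictionaries really carry accents. The code `demo_accented` is a placeholder, not a real dictionary.

```python
def source_scheme(dict_code, accented_codes):
    """Pick the SLP1 variant to transliterate from for one dictionary.

    accented_codes: lowercase codes of dictionaries known to mark accents
    (an assumption supplied by the caller, not something derived here).
    """
    if dict_code.lower() in accented_codes:
        return "slp1_accented"
    return "slp1"


# Placeholder set for illustration only.
ACCENTED = frozenset({"demo_accented"})

print(source_scheme("SKD", ACCENTED))
print(source_scheme("demo_accented", ACCENTED))
```

Keeping the list in one place means a dictionary wrongly treated as accented can be fixed by editing the set rather than the conversion script.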
> It is not at all feasible to keep data in Devanagari Unicode without unnecessary hassles. So SLP1 is going to stay for a long time. I really look forward to a day when Devanagari Unicode would emulate Sanskrit consonants and vowels more naturally. It
Sentence is broken in the middle?
Anyway, use SLP1 or ISCII or ... for internal processing as needed however much you like. You don't need to store textual data in it - that's what leads to avoidable problems such as this.
EDIT: If you digitized SKD or VSP, you would use devanAgarI unicode! (as you know from your kosha project)
@drdhaval2785
Incidentally, even the tags and English text in those places (within the body matter) got converted to Devanagari, in those dictionaries.
This point also needs to be addressed.
I would appreciate examples
In VCP,
<फ्> for <P>
<ःई> for <HI>
<एदित् त्य्पे="ह्ट्"꣡> for <edit type="hw"/>
<छ्[०-९]+ for <C[0-9]+
<फिच्तुरे> for <Picture>
In SKD,
<ॠ> for <F>
<꣡ॠ> for </F>
<फिच्तुरे> for <Picture>
Also, SKD (in contrast) has quite a few <H> lines that remained in SLP1, unconverted to Devanagari.
And interestingly KRM has no such tag conversion issue.
As these are not related to the accent mark, I guess they need special attention even with plain slp1 conversion!
Surprisingly, even MW has fallen victim to this "tag conversion"!
<स्र्स्꣡> for <srs1>
<स्होर्त्लोन्ग्꣡> for <shortlong1>
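A quick way to audit how widespread such damaged tags are could look like the sketch below. It assumes a tag counts as damaged when any Devanagari character appears between `<` and `>`; `find_damaged_tags` is a hypothetical helper, not part of any existing script. The character ranges cover the main Devanagari block and the Devanagari Extended block, which contains the "꣡" (U+A8E1) seen in the examples above.

```python
import re

# Devanagari (U+0900-U+097F) plus Devanagari Extended (U+A8E0-U+A8FF).
_DEVA = "\u0900-\u097F\uA8E0-\uA8FF"

# A tag-like span containing at least one Devanagari character.
_DAMAGED_TAG = re.compile("<[^<>]*[" + _DEVA + "][^<>]*>")


def find_damaged_tags(text):
    """Return every <...> span that contains Devanagari characters."""
    return _DAMAGED_TAG.findall(text)


sample = "<फ्>rAma<P> <फिच्तुरे>"
print(find_damaged_tags(sample))
```

Note that this only catches tags that still have their closing `>`; patterns like the unclosed `<छ्[०-९]+` case would need a separate check.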
These tag issues could be because @drdhaval2785's scripts are not passing some toggler arguments (which are no longer set by default) - https://github.com/indic-transliteration/indic_transliteration_py/blob/1ba2688d235eccc0c5ac629c46ac9df83ef331f7/indic_transliteration/sanscript/__init__.py#L189 . Also, suitable togglers can be used to leave non-svara-encoding / marks alone.
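Independently of the package's own toggler options, the idea of leaving markup untouched can be sketched by splitting the text on tags and converting only what lies between them. This is a library-independent illustration, not the package's mechanism: `convert_outside_tags` is a hypothetical helper, and `convert` is a stand-in for whatever transliteration call the scripts actually use (`str.upper` is used here only so the example runs without the package).

```python
import re

# One capturing group, so re.split keeps the tags in the result list.
_TAG = re.compile(r"(<[^<>]*>)")


def convert_outside_tags(text, convert):
    """Apply convert() to text between tags; return tags unchanged."""
    parts = _TAG.split(text)
    # With a single capturing group, odd-indexed parts are the tags.
    return "".join(
        part if index % 2 else convert(part)
        for index, part in enumerate(parts)
    )


# str.upper stands in for the real transliteration function.
print(convert_outside_tags('<P>rAma<edit type="hw"/>', str.upper))
```

In a real script, `convert` would presumably wrap the package's transliteration call for the chosen source and target schemes; that wiring is an assumption and is left out here.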
yes, I understand it.
I was informing him of these tags, so that they can be marked suitably, like the many other tags that are outside the purview of transliteration.
I am not aware of when the indic_transliteration package started to require explicit togglers. I never had a similar problem earlier. Maybe some version update introduced this artefact.
Will correct soon.
@drdhaval2785
just fyi (if you didn't notice it earlier)--
this indic_transliteration package can generate IAST output as well, in addition to various other scripts (apart from Devanagari).
@drdhaval2785
Just noticed that this new transliteration code has unwanted effects mainly in BOR & SKD, where no accent is involved and the slash is simply meant as an ordinary slash.