sanskrit-lexicon / csl-devanagari

Convert SLP1 data from csl-orig into Devanagari for easy proofreading.

slp1 to slp1_accented in transliteration-- Side effects!! #38

Open Andhrabharati opened 2 years ago

Andhrabharati commented 2 years ago

@drdhaval2785

I just noticed that the new transliteration code has unwanted effects, mainly in BOR and SKD. These dictionaries carry no accents, so a slash there is meant as an ordinary slash, not as an accent mark.

Andhrabharati commented 2 years ago

@drdhaval2785

Even VCP is heavily affected by this change.

drdhaval2785 commented 2 years ago

@vvasuki Do you have any solution which will not produce undesirable side effects of accents?

vvasuki commented 2 years ago

> @vvasuki Do you have any solution which will not produce undesirable side effects of accents?

Yes - Don't use slp1_accented on dictionaries which don't have accents! Garbage in - garbage out.

Going back to a more basic design issue - why do you keep SLP1 encoding in the dicts in the first place? Maybe back in the day, unicode devanAgarI standard was not popular, so they had to do such monkey tricks. But in 2022, one can save devanAgarI data directly using devanAgarI unicode.

drdhaval2785 commented 2 years ago

Can you specify which dictionaries have accents and which do not? I will make modifications accordingly.
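A minimal sketch of what such a per-dictionary modification could look like, assuming each dictionary is identified by its short code. The set below contains only the dictionaries reported in this thread as having no accents (BOR, SKD, VCP). The function name `source_scheme` is hypothetical, and the strings `"slp1"` and `"slp1_accented"` are assumed to match the scheme names the conversion script passes to the transliteration library.

```python
# Dictionaries reported in this thread as carrying no accent marks.
# This list is an assumption drawn from the discussion, not a complete survey.
UNACCENTED = {"BOR", "SKD", "VCP"}


def source_scheme(dict_code: str) -> str:
    """Return the input scheme name to use for a given dictionary code."""
    if dict_code.upper() in UNACCENTED:
        # A slash is an ordinary slash here, so use the plain scheme.
        return "slp1"
    # Assumed default for every other dictionary.
    return "slp1_accented"


print(source_scheme("skd"))
```

Since only a few dictionaries mark accents, inverting this into an explicit list of accented dictionaries, with plain `"slp1"` as the default, may be the safer design once that list is known.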

It is not at all feasible to keep data in Devanagari unicode without unnecessary hassles. So SLP1 is going to stay for a long time. I really look forward to a day when Devanagari Unicode would emulate Sanskrit consonants and vowels more naturally. It

vvasuki commented 2 years ago

> Can you specify which dictionaries have accents and which do not? I will make modifications accordingly.

Śabdakalpadruma and Vācaspatya. There may well be many others; those who know should say. Only a select few dictionaries show accents.

> It is not at all feasible to keep data in Devanagari unicode without unnecessary hassles. So SLP1 is going to stay for a long time. I really look forward to a day when Devanagari Unicode would emulate Sanskrit consonants and vowels more naturally. It

Sentence is broken in the middle?

Anyway, use SLP1 or ISCII or ... for internal processing as needed however much you like. You don't need to store textual data in it - that's what leads to avoidable problems such as this.

EDIT: If you were digitizing SKD or VCP today, you would use devanAgarI unicode! (as you know from your kosha project)

Andhrabharati commented 2 years ago

@drdhaval2785

Incidentally, even the tags and English text in those places (within the body matter) got converted to Devanagari, in those dictionaries.

This point also needs to be addressed.

drdhaval2785 commented 2 years ago

I would appreciate examples

Andhrabharati commented 2 years ago

In VCP:
<फ्> for <P>
<ःई> for <HI>
<एदित् त्य्पे="ह्ट्"꣡> for <edit type="hw"/>
<छ्[०-९]+ for <C[0-9]+
<फिच्तुरे> for <Picture>

In SKD:
<ॠ> for <F>
<꣡ॠ> for </F>
<फिच्तुरे> for <Picture>

In contrast, SKD also has quite a few <H> lines that remained in SLP1, unconverted to Devanagari.

And interestingly KRM has no such tag conversion issue.

As these are not related to the accent mark, I guess they need special attention even with plain slp1 conversion!

Andhrabharati commented 2 years ago

Surprisingly, even MW has fallen victim to this "tag conversion"! <स्र्स्꣡> for <srs1> <स्होर्त्लोन्ग्꣡> for <shortlong1>

vvasuki commented 2 years ago

These tag issues could be because @drdhaval2785's scripts are not passing some toggler arguments (which are no longer set by default) - https://github.com/indic-transliteration/indic_transliteration_py/blob/1ba2688d235eccc0c5ac629c46ac9df83ef331f7/indic_transliteration/sanscript/__init__.py#L189 . Also, suitable togglers can be used to leave alone any / marks that do not encode a svara.

Andhrabharati commented 2 years ago

Yes, I understand that.

I was pointing these tags out to him so that they can be marked suitably, like the many other tags that are already kept out of the purview of transliteration.
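One way to keep such tags out of transliteration, independent of any library option, is to split each line on tags and convert only the text between them. This is only a sketch under stated assumptions: `transliterate_outside_tags` and the `convert` callable are hypothetical names, and `convert` stands in for whatever SLP1-to-Devanagari function the conversion script already uses.

```python
import re
from typing import Callable

# Matches one complete markup tag such as <P>, </F> or <edit type="hw"/>.
TAG = re.compile(r"(<[^<>]*>)")


def transliterate_outside_tags(line: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to the text between tags and leave the tags untouched."""
    # Splitting on a capturing group keeps the tags in the result:
    # even indices hold plain text, odd indices hold tags.
    parts = TAG.split(line)
    return "".join(
        part if index % 2 else convert(part)
        for index, part in enumerate(parts)
    )


# Toy stand-in for a real transliteration function, for illustration only.
print(transliterate_outside_tags('<P>rAma <edit type="hw"/>x', str.upper))
```

This does not cover English text inside the body, and a tag without a closing `>` is treated as ordinary text, so both cases would still need separate handling.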

drdhaval2785 commented 2 years ago

I was not aware that the indic_transliteration package had started to require explicit togglers. I never had a similar problem earlier. Maybe some version update introduced this artefact.

Will correct soon.

Andhrabharati commented 2 years ago

@drdhaval2785

Just FYI, in case you didn't notice it earlier:

This indic_transliteration package can generate IAST output as well, in addition to various other scripts (apart from Devanagari).