indic-transliteration / indic_transliteration_py

Python package for indic script transliteration
MIT License

Reversibility loss between slp1_accented and devanagari #72

Closed drdhaval2785 closed 2 years ago

drdhaval2785 commented 2 years ago

indic-transliteration version 2.3.17, Python version 3.9.1

>>> from indic_transliteration import sanscript
>>> a = '1M/2'
>>> b = sanscript.transliterate(a, 'slp1_accented', 'devanagari')
>>> c = sanscript.transliterate(b, 'devanagari', 'slp1_accented')
>>> print(a)
1M/2
>>> print(c)
1/M2
>>> exit()

Note the difference between '1M/2' and '1/M2': after the round trip, the M and the / have swapped places.

The round trip therefore introduces a difference where none existed before.
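A check like the one in the session above can be run over many strings at once. The following is only a minimal sketch: `round_trip_failures` is a hypothetical helper, not part of indic-transliteration, and it takes the two conversion directions as plain callables so that it makes no assumptions about the library's behaviour.

```python
from typing import Callable, Iterable, List, Tuple


def round_trip_failures(
    samples: Iterable[str],
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (original, round_tripped) pairs that do not survive the round trip.

    Hypothetical helper for illustration; `forward` and `backward` are
    whatever conversion functions the caller supplies.
    """
    failures = []
    for original in samples:
        restored = backward(forward(original))
        if restored != original:
            failures.append((original, restored))
    return failures


# Possible wiring to the calls used in this report (left as comments so the
# sketch runs without the package installed):
#   from indic_transliteration import sanscript
#   forward = lambda s: sanscript.transliterate(s, 'slp1_accented', 'devanagari')
#   backward = lambda s: sanscript.transliterate(s, 'devanagari', 'slp1_accented')
#   round_trip_failures(['1M/2', 'rUpam'], forward, backward)
```

With the behaviour reported here, '1M/2' would be expected to appear in the returned list paired with '1/M2'.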

drdhaval2785 commented 2 years ago

This unusual data comes from https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=VCP&page=0991

(1/1) dfSyam 63 . atra rUpam 1 rASiM prakalpya
(1M/2) BAgAn SezAm SezAdapAsya aTavA BAgA-
(2M/9) pavAhaviDinA savarRite jAtam . (7/60)
(1M/4) anena 1 izwaguRite 63 dfzwe . Bakte jAtaM
(6M/10) dravyapramARam . 540 . idaM vilomasUtre-
RApi siDyati . aTa viSlezajAtyudAharaRam . “paYcAM-

drdhaval2785 commented 2 years ago

Related issue: https://github.com/sanskrit-lexicon/csl-orig/issues/746. If issue 746 is resolved, this request will no longer be needed.

vvasuki commented 2 years ago

It's a case of garbage in, garbage out.

Data which is NOT slp1 should not be claimed to be slp1 and passed to this module. The calling code should filter out such data or use the toggle options (see https://github.com/indic-transliteration/indic_transliteration_py/blob/47a618ec9914ba1eef6878fdc7444a4a7488dae3/indic_transliteration/sanscript/__init__.py#L189).
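One way the calling code could do that filtering is sketched below. This is an illustration only: `FRACTION_LIKE` and `transliterate_except_fractions` are hypothetical names, the pattern is a guess based on the tokens quoted in this thread ((1/1), (1M/2), (7/60)), and the conversion function is passed in by the caller rather than assumed.

```python
import re
from typing import Callable

# Assumed pattern for the fraction-like tokens quoted in this thread,
# e.g. 1/1, 1M/2, 7/60. Real data may need a different pattern.
FRACTION_LIKE = re.compile(r"\d+M?/\d+")


def transliterate_except_fractions(text: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to everything except fraction-like tokens.

    Hypothetical helper: fraction-like tokens are copied through unchanged,
    and the text between them is converted piece by piece.
    """
    parts = []
    pos = 0
    for match in FRACTION_LIKE.finditer(text):
        parts.append(convert(text[pos:match.start()]))  # ordinary text
        parts.append(match.group(0))                    # token kept verbatim
        pos = match.end()
    parts.append(convert(text[pos:]))                   # trailing text
    return "".join(parts)


# Possible wiring (left as a comment so the sketch runs without the package):
#   from indic_transliteration import sanscript
#   convert = lambda s: sanscript.transliterate(s, 'slp1_accented', 'devanagari')
#   transliterate_except_fractions('(1M/2) BAgAn', convert)
```

Because the text is converted in separate pieces, any conversion rule that depends on characters across a piece boundary could behave differently from a single call, so the result should be checked on real data.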

drdhaval2785 commented 2 years ago

I agree.