lcnetdev / scriptshifter


Use "HindiMarathiRomanLoCFix" parameter for additional Devanagari scripts #132

Open tventimi opened 1 month ago

tventimi commented 1 month ago

I was recently discussing ScriptShifter and Parallelogram with Ellen Ambrosone (Princeton's South Asian Studies Librarian) and shared a few examples to get her feedback. She pointed out one transliteration issue with Devanagari that I wanted to pass along. As I am not familiar with these scripts myself, I am copying her comments verbatim:

The Library of Congress Romanization systems for Sanskrit and Hindi (and Marathi, Nepali, etc) are different with respect to the retroflex sibilant. See the retroflex sa for Hindi (ष sha) vs. the very same letter for Sanskrit (ष ṣa). Because of this, ScriptShifter is running into problems both with Romanization and converting Roman to script – part of the issue is the “Devanagari” option in the Language dropdown – it would be intuitive for someone to want to select it to convert all the Devanagari-based languages, but it currently privileges the Sanskrit Romanization, so it’s not a good idea to do so. In testing ScriptShifter, please note the following:

  • When the correct LC-Romanization for Hindi (e.g. rāshṭrīya) is converted to Hindi (Devanagari), the output is incorrect (र्āष्ṭर्īय).
  • When the correct LC-Romanization for Hindi (e.g. rāshṭrīya) is converted to Devanagari, the output is incorrect (रास्ह्ट्रीय).
  • When the Sanskrit LC-Romanization is applied (e.g. rāṣṭrīya) and is converted to Hindi (Devanagari), the output is incorrect (र्āṣṭर्īय). – this makes sense.
  • When the Sanskrit LC-Romanization is applied (e.g. rāṣṭrīya) and is converted to Devanagari, the output is correct (राष्ट्रीय).
  • When the correct Hindi (Devanagari) script (राष्ट्रीय) is Romanized, the output is correct (rāshṭrīya).
  • When just Devanagari is selected in the Language drop-down and राष्ट्रीय is Romanized, the output is incorrect (rāṣṭrīya) for Hindi Romanization, but correct for Sanskrit Romanization.
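
To make the difference concrete, here is a minimal Python sketch of the sibilant mismatch described in the list above. It is not ScriptShifter or Aksharamukha code: the function names are hypothetical, and it assumes that every "sh" in Hindi-style input stands for ष, which ignores a genuine स् + ह sequence and all-caps input. It only shows why a Sanskrit-oriented table misreads Hindi-style input unless "sh" is first mapped to "ṣ".

```python
import unicodedata

# Retroflex sibilant ष: Hindi-style LC romanization writes "sh",
# Sanskrit-style writes "ṣ" (U+1E63; capital form U+1E62).
SANSKRIT_LOWER = "\u1e63"
SANSKRIT_UPPER = "\u1e62"


def hindi_to_sanskrit_sibilant(text: str) -> str:
    """Rewrite Hindi-style 'sh' as Sanskrit-style 'ṣ'.

    Simplifying assumption: every 'sh' is the retroflex sibilant.
    """
    text = unicodedata.normalize("NFC", text)
    return text.replace("Sh", SANSKRIT_UPPER).replace("sh", SANSKRIT_LOWER)


def sanskrit_to_hindi_sibilant(text: str) -> str:
    """Rewrite Sanskrit-style 'ṣ' as Hindi-style 'sh'."""
    text = unicodedata.normalize("NFC", text)
    return text.replace(SANSKRIT_UPPER, "Sh").replace(SANSKRIT_LOWER, "sh")


if __name__ == "__main__":
    print(hindi_to_sanskrit_sibilant("r\u0101sh\u1e6dr\u012bya"))
    print(sanskrit_to_hindi_sibilant("r\u0101\u1e63\u1e6dr\u012bya"))
```

A real fix belongs in the transliteration tables or the Aksharamukha options rather than in a string replacement like this one.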

I know that ScriptShifter uses the service https://www.aksharamukha.com/ for many of the Devanagari scripts, including Marathi and Nepali. As I am sure you are aware, there is a parameter for Aksharamukha named "HindiMarathiRomanLoCFix", which addresses the distinction Ellen is talking about. This parameter is set in the marathi_devanagari.yml file itself, but apparently there are other Devanagari languages that it should be used for. One of these is Nepali - Ellen provided me with the following examples:
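
As a rough illustration of what I mean, the choice could be expressed as a per-language lookup like the Python sketch below. Everything except the option name is an assumption on my part: the language keys and the function are hypothetical and do not reflect ScriptShifter's actual configuration format, and Nepali is only marked as pending until the specialists confirm it.

```python
# Option name as it appears in this issue.
LOC_FIX = "HindiMarathiRomanLoCFix"

# Hypothetical language keys, not ScriptShifter's real identifiers.
ENABLED = {"marathi_devanagari"}          # already set, per this issue
PENDING_REVIEW = {"nepali_devanagari"}    # suggested, awaiting specialist review


def pre_options(language: str, include_pending: bool = False) -> list[str]:
    """Return the pre-processing options to pass for a given language."""
    active = (ENABLED | PENDING_REVIEW) if include_pending else ENABLED
    return [LOC_FIX] if language in active else []


if __name__ == "__main__":
    for lang in ("marathi_devanagari", "nepali_devanagari", "sanskrit"):
        print(lang, pre_options(lang), pre_options(lang, include_pending=True))
```

Keeping the pending languages in a separate set would make it easy to switch them on once the review is done, without touching the languages that are already confirmed.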

See, for example, “rāshṭriya” in Nepali in this DLC record from 2020: https://catalog.princeton.edu/catalog/SCSB-14492900 (MARC: https://hollis.harvard.edu/primo-explore/sourceRecord?vid=HVD2&docId=01HVD_ALMA212601402370003941)

And see the same for “rāshṭriya” in Hindi in this DLC record from 2022: https://catalog.princeton.edu/catalog/SCSB-14564380 (MARC: https://clio.columbia.edu/catalog/17703977/librarian_view)

She recommended that LOC's South Asian specialists review which languages would require this parameter to be set. So, I just wanted to pass that along.

scossu commented 1 month ago

Thanks for the detailed report. You are correct: Devanagari has some specific options that I am not familiar with, as I am not a language expert. The only language I felt confident enabling this for was Marathi, which the Aksharamukha documentation explicitly mentions in this regard.

I will forward this info to the catalogers and follow their suggestion for a fix.