Thank you for reporting this. Unfortunately I don't really know about the transliteration Biber stuff and @plk is a bit snowed under at the moment, so I can't promise a quick fix.
As far as I know Biber uses an external library to do the transliteration (`Lingua::Translit`), so this could be either a bug in the external library or a bug in how Biber handles the returned results for sorting.
I don't suppose you could check that the `Lingua::Translit` library returns the expected values?
```latex
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{aka, title = {aka}}
@misc{aṃka, title = {aṃka}}
@misc{aca, title = {aca}}
@misc{aṃca, title = {aṃca}}
@misc{aṭa, title = {aṭa}}
@misc{aṃṭa, title = {aṃṭa}}
@misc{ata, title = {ata}}
@misc{aṃta, title = {aṃta}}
@misc{apa, title = {apa}}
@misc{aṃpa, title = {aṃpa}}
@misc{aya, title = {aya}}
@misc{aṃya, title = {aṃya}}
@misc{ara, title = {ara}}
@misc{aṃra, title = {aṃra}}
@misc{ala, title = {ala}}
@misc{aṃla, title = {aṃla}}
@misc{ava, title = {ava}}
@misc{aṃva, title = {aṃva}}
@misc{aśa, title = {aśa}}
@misc{aṃśa, title = {aṃśa}}
@misc{aṣa, title = {aṣa}}
@misc{aṃṣa, title = {aṃṣa}}
@misc{asa, title = {asa}}
@misc{aṃsa, title = {aṃsa}}
@misc{aha, title = {aha}}
@misc{aṃha, title = {aṃha}}
@misc{aḥka, title = {aḥka}}
@misc{aḥca, title = {aḥca}}
@misc{aḥṭa, title = {aḥṭa}}
@misc{aḥta, title = {aḥta}}
@misc{aḥpa, title = {aḥpa}}
@misc{aḥya, title = {aḥya}}
@misc{aḥra, title = {aḥra}}
@misc{aḥla, title = {aḥla}}
@misc{aḥva, title = {aḥva}}
@misc{aḥśa, title = {aḥśa}}
@misc{aḥṣa, title = {aḥṣa}}
@misc{aḥsa, title = {aḥsa}}
@misc{aḥha, title = {aḥha}}
@misc{Agnipurāṇa, title = {Agnipurāṇa}}
@misc{Agniveśyagṛhyasūtra, title = {Agniveśyagṛhyasūtra}}
@misc{Atharvavedapariśiṣṭa, title = {Atharvavedapariśiṣṭa}}
@misc{Abhayapaddhati, title = {Abhayapaddhati}}
@misc{Amoghapāśakalparāja, title = {Amoghapāśakalparāja}}
@misc{Arthaśāstra, title = {Arthaśāstra}}
@misc{Alaṃkārakārikā, title = {Alaṃkārakārikā}}
@misc{Īśānaśivagurudevapaddhati, title = {Īśānaśivagurudevapaddhati}}
@misc{Ṛgvidhāna, title = {Ṛgvidhāna}}
@misc{Kalyāṇakāmadhenu, title = {Kalyāṇakāmadhenu}}
@misc{Kiraṇatantra, title = {Kiraṇatantra}}
@misc{Kubjikāmatatantra, title = {Kubjikāmatatantra}}
@misc{Kuṭṭanīmata, title = {Kuṭṭanīmata}}
@misc{Kṛṣṇayamāritantrapañjikā, title = {Kṛṣṇayamāritantrapañjikā}}
@misc{Guhyasamājatantra, title = {Guhyasamājatantra}}
@misc{Guhyasamājamaṇḍalavidhi, title = {Guhyasamājamaṇḍalavidhi}}
@misc{Guhyasiddhi, title = {Guhyasiddhi}}
@misc{Caṇḍamahāroṣaṇatantra, title = {Caṇḍamahāroṣaṇatantra}}
@misc{Caṇḍamahāroṣaṇatantrapañjikā, title = {Caṇḍamahāroṣaṇatantrapañjikā Padmāvatī}}
@misc{Chandaḥsaṃgraha, title = {Chandaḥsaṃgraha}}
@misc{Chandaḥsāra, title = {Chandaḥsāra}}
@misc{Jayākhyasaṃhitā, title = {Jayākhyasaṃhitā}}
@misc{Jñānaratnāvalī, title = {Jñānaratnāvalī}}
@misc{Jyotiḥsāra, title = {Jyotiḥsāra}}
@misc{Tattvaratnāvalī, title = {Tattvaratnāvalī}}
@misc{Tantrasadbhāva, title = {Tantrasadbhāva}}
@misc{Tantrāloka, title = {Tantrāloka}}
@misc{Divyāvadāna, title = {Divyāvadāna}}
@misc{Derge, title = {Derge}}
@misc{Nityādisaṅgrahābhidhānapaddhati, title = {Nityādisaṅgrahābhidhānapaddhati}}
@misc{Niśvāsatattvasaṃhitā, title = {Niśvāsatattvasaṃhitā}}
@misc{Niśvāsakārikā, title = {Niśvāsakārikā}}
@misc{Parākhyatantra, title = {Parākhyatantra}}
@misc{Pārameśvaratantra, title = {Pārameśvaratantra}}
@misc{Pūrva-Kāmika, title = {Pūrva-Kāmika}}
@misc{Pratiṣṭhālakṣaṇasārasamuccaya, title = {Pratiṣṭhālakṣaṇasārasamuccaya}}
@misc{Brahmayāmalatantra, title = {Brahmayāmalatantra}}
@misc{Bhairavapadmāvatīkalpa, title = {Bhairavapadmāvatīkalpa}}
@misc{Mañjuśriyamūlakalpa, title = {Mañjuśriyamūlakalpa}}
@misc{Mataṅgapārameśvarāgama, title = {Mataṅgapārameśvarāgama}}
@misc{Mālinīvijayottaratantra, title = {Mālinīvijayottaratantra}}
@misc{Muktāvalī, title = {Muktāvalī}}
@misc{Mṛgendratantra, title = {Mṛgendratantra}}
@misc{Bṛhatsaṃhitā, title = {Bṛhatsaṃhitā}}
@misc{Rauravasūtrasaṅgraha, title = {Rauravasūtrasaṅgraha}}
@misc{Laghutantraṭīkā, title = {Laghutantraṭīkā}}
@misc{Laghuśaṃvaratantra, title = {Laghuśaṃvaratantra}}
@misc{Vajrāvalī, title = {Vajrāvalī}}
@misc{Vimalaprabhā, title = {Vimalaprabhā}}
@misc{Vīṇāśikhatantra, title = {Vīṇāśikhatantra}}
@misc{Śāradātilaka, title = {Śāradātilaka}}
@misc{Śivatattvaratnākara, title = {Śivatattvaratnākara}}
@misc{Sampuṭatantraprakaraṇārthanirṇaya, title = {Sampuṭatantraprakaraṇārthanirṇaya}}
@misc{Sampuṭodbhavatantra, title = {Sampuṭodbhavatantra}}
@misc{Sarvajñānottaratantra, title = {Sarvajñānottaratantra}}
@misc{Sarvajñānottaravṛtti, title = {Sarvajñānottaravṛtti}}
@misc{Sarvatathāgatatattvasaṅgraha, title = {Sarvatathāgatatattvasaṅgraha}}
@misc{Sarvatathāgatādhiṣṭhānasattvāvalokanabuddhakṣetrasaṃdarśanavyūha, title = {Sarvatathāgatādhiṣṭhānasattvāvalokanabuddhakṣetrasaṃdarśanavyūha}}
@misc{Sādhanamālā, title = {Sādhanamālā}}
@misc{Sārdhatriśatikālottara, title = {Sārdhatriśatikālottara}}
@misc{Siddhayogeśvarīmata, title = {Siddhayogeśvarīmata}}
@misc{Siddhaikavīratantra, title = {Siddhaikavīratantra}}
@misc{Saurasaṃhitā, title = {Saurasaṃhitā}}
@misc{Svacchandatantra, title = {Svacchandatantra}}
@misc{Svāyambhuvapāñcarātra, title = {Svāyambhuvapāñcarātra}}
@misc{Svāyambhuvasūtrasaṅgraha, title = {Svāyambhuvasūtrasaṅgraha}}
@misc{Harṣacarita, title = {Harṣacarita}}
@misc{Hevajratantra, title = {Hevajratantra}}
\end{filecontents*}
\DeclareSortTranslit{
  \translit[title]{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
```
Yes, in the link referenced above I helped @plk create the IAST to Devanāgarī module for Lingua::Translit, and it has already been very useful to me outside of biblatex as well, in Perl scripts converting e-texts. I will check whether they still work as expected or whether some bug has crept in since I last used them.
Is there a debugging possibility to have biber write the full transliterated strings to a file for inspection? I have seen only the sortinit fields in the bbl file containing some Devanāgarī, which I'm afraid is not enough to understand what happened.
Well, I have now inserted another item into the testing bibliography, Ānanda, and as I feared it got mixed up with the "A"-entries. The sortinit field is {अ̄} which when copying it into gedit looks like a short a with bar over it, I suspect that the diacritical combination of a and ¯ was treated separately, first the a gets transliterated to the proper devanāgarī short a, and then the diacritical mark is added to that, which makes no sense in Devanāgarī. The sortinithash field is the same as for the regular short a.
I will now dig out my transliteration perl script and test it with a current version of Lingua::Translit, and let you know about the results.
I have tested my scripts using Lingua::Translit; their output seems correct so far. I have also updated the module, which does not appear to have changed anything. The Sanskrit collation with biblatex is still broken.
Thank you for checking that. If it's not a Lingua::Translit problem, we will have to wait for PLK. You can run Biber with the `--trace` option and obtain a huge `.blg` file that may or may not contain a bit more info on what happens to sorting (I'm not sure).
Hmm, I will check on this. This must be something to do with macro decoding changes. If you run `biber` with the `--trace` flag and search the `.blg` file for "Keys before sort", you will see the transliterated titles and can check whether they look right.
Yes, with `--trace` one can see what happens. The diacritical combinations are messed up: the base characters are transliterated into Devanāgarī, and the marks are then attached to their Devanāgarī equivalents. Thus, for example, Agnipurāṇa is transformed into अग्निपुर̄न्̣अ. My browser, or the font it uses, refuses to display these nonsensical combinations; here is an image of how it looks:
Even if you don't read Devanāgarī you can recognise the bar above the "ra" and the dot above the "n", to which, for some reason, the virāma is added, and then at the end the independent vowel a. It should actually be अग्निपुराण.
This test file gives me the wrong output according to the above. Can you verify?
#!/opt/local/bin/perl -CS
use v5.24;
use Lingua::Translit;
use utf8;
my $t = new Lingua::Translit('IAST Devanagari');
say $t->translit('Agnipurāṇa');
अग्निपुर̄न्̣अ
I get the same output, but that could be an input issue. According to https://w3c.github.io/xml-entities/unicode-names.html the code snippet uses the combining sequences `U+0061 LATIN SMALL LETTER A` with `U+0304 COMBINING MACRON` and `U+006e LATIN SMALL LETTER N` with `U+0323 COMBINING DOT BELOW`. If one uses the precomposed characters `U+0101 LATIN SMALL LETTER A WITH MACRON` and `U+1e47 LATIN SMALL LETTER N WITH DOT BELOW` instead, one gets the expected output (if I understand correctly):
#!/opt/local/bin/perl -CS
use v5.24;
use Lingua::Translit;
use utf8;
my $t = new Lingua::Translit('IAST Devanagari');
say $t->translit('Agnipurāṇa');
say $t->translit('Agnipurāṇa');
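The difference between the two input forms can also be checked outside Perl. The following is only a minimal Python sketch (Python is not used anywhere in this thread, and the helper names `codepoints` and `to_nfc` are made up for illustration): it lists the code points of the decomposed and precomposed spellings of "ā" and "ṇ" and shows that NFC normalisation maps the first onto the second.

```python
import unicodedata


def codepoints(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def to_nfc(text):
    """Compose base letters and combining marks where Unicode defines a composite."""
    return unicodedata.normalize("NFC", text)


# "a" + COMBINING MACRON, "n" + COMBINING DOT BELOW (the form that transliterated badly)
DECOMPOSED = "a\u0304n\u0323"
# LATIN SMALL LETTER A WITH MACRON, LATIN SMALL LETTER N WITH DOT BELOW
PRECOMPOSED = "\u0101\u1e47"

if __name__ == "__main__":
    for label, text in (("decomposed", DECOMPOSED), ("precomposed", PRECOMPOSED)):
        print(label, codepoints(text))
    # The two strings differ as code point sequences but agree after NFC.
    print(DECOMPOSED == PRECOMPOSED, to_nfc(DECOMPOSED) == PRECOMPOSED)
```

Running the same check on a title copied from a `.bib` file should tell you which form your editor actually saved.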
Ah, ok, then it's a Unicode normalisation issue, looking into it.
Please try biber 2.12 dev version from SF. For some reason calls to `Lingua::Translit` were not respected as an NFC boundary. I suspect this was due to another change to macro encoding structure a while ago.
I can confirm that I now get a different order than before, but whether or not that is right is a question for @ppasedach duvud.pdf
This pdf looks already much better on a quick look, but I am still a bit surprised by 1-26 being sorted in before everything else, and 107 at the very end. This might be according to another (Hindi-?)sorting convention, the treatment of ṃ, ḥ, and the ligature jñ considered as a letter in its own right. I still have to look at it more carefully.
According to `--trace` the transliterations used are
[1260] Biber.pm:3976> DEBUG - aṃka => mm,,अंक,अंक,,0
[1260] Biber.pm:3976> DEBUG - aṃca => mm,,अंच,अंच,,0
[1260] Biber.pm:3976> DEBUG - aṃṭa => mm,,अंट,अंट,,0
[1260] Biber.pm:3976> DEBUG - aṃta => mm,,अंत,अंत,,0
[1260] Biber.pm:3976> DEBUG - aṃpa => mm,,अंप,अंप,,0
[1260] Biber.pm:3976> DEBUG - aṃya => mm,,अंय,अंय,,0
[1261] Biber.pm:3976> DEBUG - aṃra => mm,,अंर,अंर,,0
[1261] Biber.pm:3976> DEBUG - aṃla => mm,,अंल,अंल,,0
[1261] Biber.pm:3976> DEBUG - aṃva => mm,,अंव,अंव,,0
[1261] Biber.pm:3976> DEBUG - aṃśa => mm,,अंश,अंश,,0
[1261] Biber.pm:3976> DEBUG - aṃṣa => mm,,अंष,अंष,,0
[1261] Biber.pm:3976> DEBUG - aṃsa => mm,,अंस,अंस,,0
[1262] Biber.pm:3976> DEBUG - aṃha => mm,,अंह,अंह,,0
[1262] Biber.pm:3976> DEBUG - aḥka => mm,,अःक,अःक,,0
[1262] Biber.pm:3976> DEBUG - aḥca => mm,,अःच,अःच,,0
[1263] Biber.pm:3976> DEBUG - aḥṭa => mm,,अःट,अःट,,0
[1263] Biber.pm:3976> DEBUG - aḥta => mm,,अःत,अःत,,0
[1263] Biber.pm:3976> DEBUG - aḥpa => mm,,अःप,अःप,,0
[1263] Biber.pm:3976> DEBUG - aḥya => mm,,अःय,अःय,,0
[1263] Biber.pm:3976> DEBUG - aḥra => mm,,अःर,अःर,,0
[1263] Biber.pm:3976> DEBUG - aḥla => mm,,अःल,अःल,,0
[1263] Biber.pm:3976> DEBUG - aḥva => mm,,अःव,अःव,,0
[1263] Biber.pm:3976> DEBUG - aḥśa => mm,,अःश,अःश,,0
[1264] Biber.pm:3976> DEBUG - aḥṣa => mm,,अःष,अःष,,0
[1264] Biber.pm:3976> DEBUG - aḥsa => mm,,अःस,अःस,,0
[1264] Biber.pm:3976> DEBUG - aḥha => mm,,अःह,अःह,,0
[1265] Biber.pm:3976> DEBUG - aka => mm,,अक,अक,,0
[1265] Biber.pm:3976> DEBUG - Agnipurāṇa => mm,,अग्निपुराण,अग्निपुराण,,0
[1265] Biber.pm:3976> DEBUG - Agniveśyagṛhyasūtra => mm,,अग्निवेश्यगृह्यसूत्र,अग्निवेश्यगृह्यसूत्र,,0
[1265] Biber.pm:3976> DEBUG - aca => mm,,अच,अच,,0
[1265] Biber.pm:3976> DEBUG - aṭa => mm,,अट,अट,,0
[1266] Biber.pm:3976> DEBUG - ata => mm,,अत,अत,,0
[1266] Biber.pm:3976> DEBUG - Atharvavedapariśiṣṭa => mm,,अथर्ववेदपरिशिष्ट,अथर्ववेदपरिशिष्ट,,0
[1266] Biber.pm:3976> DEBUG - apa => mm,,अप,अप,,0
[1266] Biber.pm:3976> DEBUG - Abhayapaddhati => mm,,अभयपद्धति,अभयपद्धति,,0
[1266] Biber.pm:3976> DEBUG - Amoghapāśakalparāja => mm,,अमोघपाशकल्पराज,अमोघपाशकल्पराज,,0
[1266] Biber.pm:3976> DEBUG - aya => mm,,अय,अय,,0
[1266] Biber.pm:3976> DEBUG - ara => mm,,अर,अर,,0
[1266] Biber.pm:3976> DEBUG - Arthaśāstra => mm,,अर्थशास्त्र,अर्थशास्त्र,,0
[1266] Biber.pm:3976> DEBUG - ala => mm,,अल,अल,,0
[1266] Biber.pm:3976> DEBUG - Alaṃkārakārikā => mm,,अलंकारकारिका,अलंकारकारिका,,0
[1266] Biber.pm:3976> DEBUG - ava => mm,,अव,अव,,0
[1266] Biber.pm:3976> DEBUG - aśa => mm,,अश,अश,,0
[1266] Biber.pm:3976> DEBUG - aṣa => mm,,अष,अष,,0
[1267] Biber.pm:3976> DEBUG - asa => mm,,अस,अस,,0
[1267] Biber.pm:3976> DEBUG - aha => mm,,अह,अह,,0
[1267] Biber.pm:3976> DEBUG - Īśānaśivagurudevapaddhati => mm,,ईशानशिवगुरुदेवपद्धति,ईशानशिवगुरुदेवपद्धति,,0
[1267] Biber.pm:3976> DEBUG - Ṛgvidhāna => mm,,ऋग्विधान,ऋग्विधान,,0
[1267] Biber.pm:3976> DEBUG - Kalyāṇakāmadhenu => mm,,कल्याणकामधेनु,कल्याणकामधेनु,,0
[1267] Biber.pm:3976> DEBUG - Kiraṇatantra => mm,,किरणतन्त्र,किरणतन्त्र,,0
[1267] Biber.pm:3976> DEBUG - Kuṭṭanīmata => mm,,कुट्टनीमत,कुट्टनीमत,,0
[1267] Biber.pm:3976> DEBUG - Kubjikāmatatantra => mm,,कुब्जिकामततन्त्र,कुब्जिकामततन्त्र,,0
[1267] Biber.pm:3976> DEBUG - Kṛṣṇayamāritantrapañjikā => mm,,कृष्णयमारितन्त्रपञ्जिका,कृष्णयमारितन्त्रपञ्जिका,,0
[1267] Biber.pm:3976> DEBUG - Guhyasamājatantra => mm,,गुह्यसमाजतन्त्र,गुह्यसमाजतन्त्र,,0
[1267] Biber.pm:3976> DEBUG - Guhyasamājamaṇḍalavidhi => mm,,गुह्यसमाजमण्डलविधि,गुह्यसमाजमण्डलविधि,,0
[1267] Biber.pm:3976> DEBUG - Guhyasiddhi => mm,,गुह्यसिद्धि,गुह्यसिद्धि,,0
[1267] Biber.pm:3976> DEBUG - Caṇḍamahāroṣaṇatantra => mm,,चण्डमहारोषणतन्त्र,चण्डमहारोषणतन्त्र,,0
[1267] Biber.pm:3976> DEBUG - Caṇḍamahāroṣaṇatantrapañjikā => mm,,चण्डमहारोषणतन्त्रपञ्जिका पद्मावती,चण्डमहारोषणतन्त्रपञ्जिका पद्मावती,,0
[1268] Biber.pm:3976> DEBUG - Chandaḥsaṃgraha => mm,,छन्दःसंग्रह,छन्दःसंग्रह,,0
[1268] Biber.pm:3976> DEBUG - Chandaḥsāra => mm,,छन्दःसार,छन्दःसार,,0
[1268] Biber.pm:3976> DEBUG - Jayākhyasaṃhitā => mm,,जयाख्यसंहिता,जयाख्यसंहिता,,0
[1268] Biber.pm:3976> DEBUG - Jyotiḥsāra => mm,,ज्योतिःसार,ज्योतिःसार,,0
[1268] Biber.pm:3976> DEBUG - Tattvaratnāvalī => mm,,तत्त्वरत्नावली,तत्त्वरत्नावली,,0
[1268] Biber.pm:3976> DEBUG - Tantrasadbhāva => mm,,तन्त्रसद्भाव,तन्त्रसद्भाव,,0
[1268] Biber.pm:3976> DEBUG - Tantrāloka => mm,,तन्त्रालोक,तन्त्रालोक,,0
[1268] Biber.pm:3976> DEBUG - Divyāvadāna => mm,,दिव्यावदान,दिव्यावदान,,0
[1268] Biber.pm:3976> DEBUG - Derge => mm,,देर्गे,देर्गे,,0
[1268] Biber.pm:3976> DEBUG - Nityādisaṅgrahābhidhānapaddhati => mm,,नित्यादिसङ्ग्रहाभिधानपद्धति,नित्यादिसङ्ग्रहाभिधानपद्धति,,0
[1268] Biber.pm:3976> DEBUG - Niśvāsakārikā => mm,,निश्वासकारिका,निश्वासकारिका,,0
[1268] Biber.pm:3976> DEBUG - Niśvāsatattvasaṃhitā => mm,,निश्वासतत्त्वसंहिता,निश्वासतत्त्वसंहिता,,0
[1268] Biber.pm:3976> DEBUG - Parākhyatantra => mm,,पराख्यतन्त्र,पराख्यतन्त्र,,0
[1269] Biber.pm:3976> DEBUG - Pārameśvaratantra => mm,,पारमेश्वरतन्त्र,पारमेश्वरतन्त्र,,0
[1269] Biber.pm:3976> DEBUG - Pūrva-Kāmika => mm,,पूर्व-कामिक,पूर्व-कामिक,,0
[1269] Biber.pm:3976> DEBUG - Pratiṣṭhālakṣaṇasārasamuccaya => mm,,प्रतिष्ठालक्षणसारसमुच्चय,प्रतिष्ठालक्षणसारसमुच्चय,,0
[1269] Biber.pm:3976> DEBUG - Bṛhatsaṃhitā => mm,,बृहत्संहिता,बृहत्संहिता,,0
[1269] Biber.pm:3976> DEBUG - Brahmayāmalatantra => mm,,ब्रह्मयामलतन्त्र,ब्रह्मयामलतन्त्र,,0
[1269] Biber.pm:3976> DEBUG - Bhairavapadmāvatīkalpa => mm,,भैरवपद्मावतीकल्प,भैरवपद्मावतीकल्प,,0
[1269] Biber.pm:3976> DEBUG - Mañjuśriyamūlakalpa => mm,,मञ्जुश्रियमूलकल्प,मञ्जुश्रियमूलकल्प,,0
[1269] Biber.pm:3976> DEBUG - Mataṅgapārameśvarāgama => mm,,मतङ्गपारमेश्वरागम,मतङ्गपारमेश्वरागम,,0
[1269] Biber.pm:3976> DEBUG - Mālinīvijayottaratantra => mm,,मालिनीविजयोत्तरतन्त्र,मालिनीविजयोत्तरतन्त्र,,0
[1269] Biber.pm:3976> DEBUG - Muktāvalī => mm,,मुक्तावली,मुक्तावली,,0
[1269] Biber.pm:3976> DEBUG - Mṛgendratantra => mm,,मृगेन्द्रतन्त्र,मृगेन्द्रतन्त्र,,0
[1269] Biber.pm:3976> DEBUG - Rauravasūtrasaṅgraha => mm,,रौरवसूत्रसङ्ग्रह,रौरवसूत्रसङ्ग्रह,,0
[1270] Biber.pm:3976> DEBUG - Laghutantraṭīkā => mm,,लघुतन्त्रटीका,लघुतन्त्रटीका,,0
[1270] Biber.pm:3976> DEBUG - Laghuśaṃvaratantra => mm,,लघुशंवरतन्त्र,लघुशंवरतन्त्र,,0
[1270] Biber.pm:3976> DEBUG - Vajrāvalī => mm,,वज्रावली,वज्रावली,,0
[1270] Biber.pm:3976> DEBUG - Vimalaprabhā => mm,,विमलप्रभा,विमलप्रभा,,0
[1270] Biber.pm:3976> DEBUG - Vīṇāśikhatantra => mm,,वीणाशिखतन्त्र,वीणाशिखतन्त्र,,0
[1270] Biber.pm:3976> DEBUG - Śāradātilaka => mm,,शारदातिलक,शारदातिलक,,0
[1270] Biber.pm:3976> DEBUG - Śivatattvaratnākara => mm,,शिवतत्त्वरत्नाकर,शिवतत्त्वरत्नाकर,,0
[1270] Biber.pm:3976> DEBUG - Sampuṭatantraprakaraṇārthanirṇaya => mm,,सम्पुटतन्त्रप्रकरणार्थनिर्णय,सम्पुटतन्त्रप्रकरणार्थनिर्णय,,0
[1270] Biber.pm:3976> DEBUG - Sampuṭodbhavatantra => mm,,सम्पुटोद्भवतन्त्र,सम्पुटोद्भवतन्त्र,,0
[1270] Biber.pm:3976> DEBUG - Sarvatathāgatatattvasaṅgraha => mm,,सर्वतथागततत्त्वसङ्ग्रह,सर्वतथागततत्त्वसङ्ग्रह,,0
[1270] Biber.pm:3976> DEBUG - Sarvatathāgatādhiṣṭhānasattvāvalokanabuddhakṣetrasaṃdarśanavyūha => mm,,सर्वतथागताधिष्ठानसत्त्वावलोकनबुद्धक्षेत्रसंदर्शनव्यूह,सर्वतथागताधिष्ठानसत्त्वावलोकनबुद्धक्षेत्रसंदर्शनव्यूह,,0
[1270] Biber.pm:3976> DEBUG - Sarvajñānottaratantra => mm,,सर्वज्ञानोत्तरतन्त्र,सर्वज्ञानोत्तरतन्त्र,,0
[1271] Biber.pm:3976> DEBUG - Sarvajñānottaravṛtti => mm,,सर्वज्ञानोत्तरवृत्ति,सर्वज्ञानोत्तरवृत्ति,,0
[1271] Biber.pm:3976> DEBUG - Sādhanamālā => mm,,साधनमाला,साधनमाला,,0
[1271] Biber.pm:3976> DEBUG - Sārdhatriśatikālottara => mm,,सार्धत्रिशतिकालोत्तर,सार्धत्रिशतिकालोत्तर,,0
[1271] Biber.pm:3976> DEBUG - Siddhayogeśvarīmata => mm,,सिद्धयोगेश्वरीमत,सिद्धयोगेश्वरीमत,,0
[1271] Biber.pm:3976> DEBUG - Siddhaikavīratantra => mm,,सिद्धैकवीरतन्त्र,सिद्धैकवीरतन्त्र,,0
[1271] Biber.pm:3976> DEBUG - Saurasaṃhitā => mm,,सौरसंहिता,सौरसंहिता,,0
[1271] Biber.pm:3976> DEBUG - Svacchandatantra => mm,,स्वच्छन्दतन्त्र,स्वच्छन्दतन्त्र,,0
[1271] Biber.pm:3976> DEBUG - Svāyambhuvapāñcarātra => mm,,स्वायम्भुवपाञ्चरात्र,स्वायम्भुवपाञ्चरात्र,,0
[1271] Biber.pm:3976> DEBUG - Svāyambhuvasūtrasaṅgraha => mm,,स्वायम्भुवसूत्रसङ्ग्रह,स्वायम्भुवसूत्रसङ्ग्रह,,0
[1271] Biber.pm:3976> DEBUG - Harṣacarita => mm,,हर्षचरित,हर्षचरित,,0
[1271] Biber.pm:3976> DEBUG - Hevajratantra => mm,,हेवज्रतन्त्र,हेवज्रतन्त्र,,0
[1271] Biber.pm:3976> DEBUG - Jñānaratnāvalī => mm,,ज्ञानरत्नावली,ज्ञानरत्नावली,,0
do they look OK?
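For a list this long it may be easier to check the trace mechanically than by eye. The sketch below is an assumption-laden illustration, not a documented Biber interface: the regular expression is modelled only on the DEBUG lines pasted above, it takes the third comma-separated field as the transliterated title because that is where it sits in those lines, and the names `parse_trace_line`, `is_nfc` and `report` are hypothetical.

```python
import re
import sys
import unicodedata

# Pattern guessed from lines such as
#   [1260] Biber.pm:3976> DEBUG - <entry key> => mm,,<title>,<title>,,0
TRACE_LINE = re.compile(r"^\[\d+\]\s+\S+>\s+DEBUG - (?P<key>.+?) => (?P<fields>.*)$")


def parse_trace_line(line):
    """Return (entry key, third sort field) or None if the line does not match."""
    match = TRACE_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.group("fields").split(",")
    if len(fields) < 3:
        return None
    return match.group("key"), fields[2]


def is_nfc(text):
    """True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def report(path):
    """Print every parsed entry whose key or sort string is not NFC."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parsed = parse_trace_line(line)
            if parsed and not all(is_nfc(part) for part in parsed):
                print("not NFC:", parsed)


if __name__ == "__main__":
    report(sys.argv[1])
```

Called with the path of the `.blg` file as its only argument, the script lists entries that are not NFC in the log. Such a hit can come from how the log itself was written and does not necessarily mean the sort string is wrong.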
In this debugging output the IAST looks garbled: diacritics slide off their base letters onto the following ones. Compare the strings with those in the input file and you will see it. I haven't checked every entry, but judging from a few, this seems to happen throughout. It does not seem to affect the Devanāgarī side, which on a cursory look seems o.k., apart from the sorting of ṃ (I would expect aka to be sorted before aṃka etc., although ala and alaṃkāra° seem o.k.), ḥ, and jñ. I would expect the ligature jñ to be treated as separate letters, not as a letter in its own right that probably comes in the penultimate (?) position. I suspect the ligature kṣ is likewise treated as one letter by the collation algorithm and sorted in the last position, which you would not normally want, at least for Sanskrit.
I think you can ignore the IAST; it's copied from the `.blg` file, and that seems to be an artefact of how Biber writes Unicode to the `.blg`. In the `.bbl` the IAST is fine. Especially if the Devanāgarī is right I think we can assume that the transliteration works now.
So the only issue left is sorting. I compared Biber's sorting below with various settings in http://anubhav-chattoraj.github.io/indic-tools/devanagari_sorter/
अंक
अंच
अंट
अंत
अंप
अंय
अंर
अंल
अंव
अंश
अंष
अंस
अंह
अःक
अःच
अःट
अःत
अःप
अःय
अःर
अःल
अःव
अःश
अःष
अःस
अःह
अक
अग्निपुराण
अग्निवेश्यगृह्यसूत्र
अच
अट
अत
अथर्ववेदपरिशिष्ट
अप
अभयपद्धति
अमोघपाशकल्पराज
अय
अर
अर्थशास्त्र
अल
अलंकारकारिका
अव
अश
अष
अस
अह
ईशानशिवगुरुदेवपद्धति
ऋग्विधान
कल्याणकामधेनु
किरणतन्त्र
कुट्टनीमत
कुब्जिकामततन्त्र
कृष्णयमारितन्त्रपञ्जिका
गुह्यसमाजतन्त्र
गुह्यसमाजमण्डलविधि
गुह्यसिद्धि
चण्डमहारोषणतन्त्र
चण्डमहारोषणतन्त्रपञ्जिका पद्मावती
छन्दःसंग्रह
छन्दःसार
जयाख्यसंहिता
ज्योतिःसार
तत्त्वरत्नावली
तन्त्रसद्भाव
तन्त्रालोक
दिव्यावदान
देर्गे
नित्यादिसङ्ग्रहाभिधानपद्धति
निश्वासकारिका
निश्वासतत्त्वसंहिता
पराख्यतन्त्र
पारमेश्वरतन्त्र
पूर्व-कामिक
प्रतिष्ठालक्षणसारसमुच्चय
बृहत्संहिता
ब्रह्मयामलतन्त्र
भैरवपद्मावतीकल्प
मञ्जुश्रियमूलकल्प
मतङ्गपारमेश्वरागम
मालिनीविजयोत्तरतन्त्र
मुक्तावली
मृगेन्द्रतन्त्र
रौरवसूत्रसङ्ग्रह
लघुतन्त्रटीका
लघुशंवरतन्त्र
वज्रावली
विमलप्रभा
वीणाशिखतन्त्र
शारदातिलक
शिवतत्त्वरत्नाकर
सम्पुटतन्त्रप्रकरणार्थनिर्णय
सम्पुटोद्भवतन्त्र
सर्वतथागततत्त्वसङ्ग्रह
सर्वतथागताधिष्ठानसत्त्वावलोकनबुद्धक्षेत्रसंदर्शनव्यूह
सर्वज्ञानोत्तरतन्त्र
सर्वज्ञानोत्तरवृत्ति
साधनमाला
सार्धत्रिशतिकालोत्तर
सिद्धयोगेश्वरीमत
सिद्धैकवीरतन्त्र
सौरसंहिता
स्वच्छन्दतन्त्र
स्वायम्भुवपाञ्चरात्र
स्वायम्भुवसूत्रसङ्ग्रह
हर्षचरित
हेवज्रतन्त्र
ज्ञानरत्नावली
I got consistently different results for ज्ञानरत्नावली/ Jñānaratnāvalī (Biber sorts it at the end, the quoted webpage at position 62 between जयाख्यसंहिता/ Jayākhyasaṃhitā and ज्योतिःसार/ Jyotiḥsāra) and सर्वज्ञानोत्तरतन्त्र/ Sarvajñānottaratantra सर्वज्ञानोत्तरवृत्ति/ Sarvajñānottaravṛtti (Biber sorts them after सर्वतथागततत्त्वसङ्ग्रह/ Sarvatathāgatatattvasaṅgraha and सर्वतथागताधिष्ठानसत्त्वावलोकनबुद्धक्षेत्रसंदर्शनव्यूह/ Sarvatathāgatādhiṣṭhānasattvāvalokanabuddhakṣetrasaṃdarśanavyūha the webpage before). So all of this seems to be only about Jñ
Yes, don't worry too much about what is pasted here or what your text editor/terminal displays in the .blg unless you understand how it handles UTF-8 in terms of composed/decomposed form. What matters is the PDF output.
That website you linked gives you the option to sort the jñ as a separate letter, (as well as the kṣ and tr, for which I should add something to the example), activating which didn't make any difference. But this seems to be a sorting convention used by some people.
Here is a new, shorter example which confirms that kṣ is also sorted as a separate letter at the end, which, at least for Sanskrit, it should not be. tr is sorted at the proper place, so the problem has now boiled down to jñ and kṣ.
The sorting of ṃ and ḥ is o.k. as it is.
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{kumāra,
title = {kumāra},
}
@misc{kṣetra,
title = {kṣetra},
}
@misc{kha,
title = {kha},
}
@misc{jīvita,
title = {jīvita},
}
@misc{jñāna,
title = {jñāna},
}
@misc{jvara,
title = {jvara},
}
@misc{tyāga,
title = {tyāga},
}
@misc{tridaśa,
title = {tridaśa},
}
@misc{tvid,
title = {tvid},
}
\end{filecontents*}
\DeclareSortTranslit{
\translit[title]{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
Do you have any source for the complete sorting rules that you would like to see applied?
If I understand correctly, Devanāgarī is a script, and scripts do not necessarily determine the sorting uniquely; language-specific rules have to be taken into account as well. Take for example the different sortings of Ö in Swedish and German.
See also Q16 What about collation of Indic language data? in http://unicode.org/faq/indic.html#16 and http://www.unicode.org/notes/tn1/ (https://www.unicode.org/notes/tn1/Wissink-IndicCollation.pdf), esp. p. 5
As far as I can see, there are currently no alternative tailorings for sanskrit: https://metacpan.org/pod/Unicode::Collate::Locale#A-list-of-tailorable-locales
You might look at the references here to see which UCA sanskrit collation is being used and it is then possible to submit a request to the author of Unicode::Collate::Locale for alternative collations if they are available in the UCA.
With sortlocale=hi
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage[sortlocale=hi]{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{kumāra,
title = {kumāra},
}
@misc{kṣetra,
title = {kṣetra},
}
@misc{kha,
title = {kha},
}
@misc{jīvita,
title = {jīvita},
}
@misc{jñāna,
title = {jñāna},
}
@misc{jvara,
title = {jvara},
}
@misc{tyāga,
title = {tyāga},
}
@misc{tridaśa,
title = {tridaśa},
}
@misc{tvid,
title = {tvid},
}
\end{filecontents*}
\DeclareSortTranslit{
\translit[title]{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
gives caakkc.pdf
[1] kumāra. [2] kṣetra. [3] kha. [4] jīvita. [5] jñāna. [6] jvara. [7] tyāga. [8] tridaśa. [9] tvid.
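The order listed above is what a plain letter-by-letter Sanskrit collation gives, while treating kṣ and jñ as letters of their own pushes those entries to the end. The following toy Python sketch illustrates only that difference. It is not how Biber or `Unicode::Collate::Locale` work internally: the letter table, the greedy tokenizer and all names (`SANSKRIT_LETTERS`, `CONJUNCT_LETTERS`, `make_sort_key`) are assumptions made up for the illustration.

```python
import unicodedata

# IAST letters in traditional Sanskrit order: vowels, anusvāra/visarga, consonants.
SANSKRIT_LETTERS = [
    "a", "ā", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḷ", "e", "ai", "o", "au",
    "ṃ", "ḥ",
    "k", "kh", "g", "gh", "ṅ", "c", "ch", "j", "jh", "ñ",
    "ṭ", "ṭh", "ḍ", "ḍh", "ṇ", "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m", "y", "r", "l", "v", "ś", "ṣ", "s", "h",
]
# Conjuncts that some conventions sort as letters of their own after "h".
CONJUNCT_LETTERS = ["kṣ", "jñ"]


def nfc(text):
    return unicodedata.normalize("NFC", text)


def make_sort_key(letters):
    """Build a sort key that ranks words letter by letter in the given order."""
    letters = [nfc(letter) for letter in letters]
    rank = {letter: index for index, letter in enumerate(letters)}
    longest = max(len(letter) for letter in letters)

    def key(word):
        word = nfc(word).lower()
        ranks, i = [], 0
        while i < len(word):
            # Greedy longest match, so "kh" is read as one letter, not k + h.
            for size in range(longest, 0, -1):
                chunk = word[i:i + size]
                if chunk in rank:
                    ranks.append(rank[chunk])
                    i += len(chunk)
                    break
            else:
                ranks.append(len(rank))  # unknown characters sort last
                i += 1
        return ranks

    return key


sanskrit_key = make_sort_key(SANSKRIT_LETTERS)
conjunct_key = make_sort_key(SANSKRIT_LETTERS + CONJUNCT_LETTERS)

WORDS = ["tvid", "jñāna", "kha", "kṣetra", "jvara", "kumāra", "tridaśa", "jīvita", "tyāga"]

if __name__ == "__main__":
    print(sorted(WORDS, key=sanskrit_key))
    print(sorted(WORDS, key=conjunct_key))
```

With the first key, kṣetra falls between kumāra and kha, and jñāna between jīvita and jvara, as in the list above. With the second key both move to the end, which is roughly the behaviour reported earlier in the thread.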
@plk Would it make sense and be possible to enable the `sortlocale` option and possibly other options like `sortcase` and `sortupper` on a per-refcontext basis? What about the commands of §4.5.6 Sorting?
Well, it already is because `sorting` is a refcontext argument and all of those things can be set as part of a sorting template ...
Oh yes, I hadn't seen the `locale` option to `\DeclareSortingTemplate`, sorry.
Can one do something similar for `\DeclareSortExclusion` and friends and `\DeclareSortTranslit`? The latter was actually why I'm asking. I guess it would make sense to have IAST-transliterated sources and other sources in different refcontexts, and I would only want to enable the conversion for the IAST refcontext and not the normal context.
Hmm, not trivial to do this. These options are inherently global as they are preamble only. Can you see any use-case for this? Such things seem very global ...
I can definitely see a use in restricting `\DeclareSortTranslit`. Suppose I have Indic and Latin references in the same document. I want my Indic references to follow IAST transliteration and then Sanskrit sorting, but naturally my Latin sources should follow the usual Latin sorting.
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{kumāra,
title = {kumāra},
keywords = {indic},
}
@misc{kṣetra,
title = {kṣetra},
keywords = {indic},
}
@misc{kha,
title = {kha},
keywords = {indic},
}
@misc{jīvita,
title = {jīvita},
keywords = {indic},
}
@misc{jñāna,
title = {jñāna},
keywords = {indic},
}
@misc{jvara,
title = {jvara},
keywords = {indic},
}
@misc{tyāga,
title = {tyāga},
keywords = {indic},
}
@misc{tridaśa,
title = {tridaśa},
keywords = {indic},
}
@misc{tvid,
title = {tvid},
keywords = {indic},
}
@misc{aachen,
title = {Aachen},
}
@misc{augsburg,
title = {Augsburg},
}
@misc{arnhem,
title = {Arnhem},
}
@misc{avignon,
title = {Avignon},
}
@misc{aix-en-provence,
title = {Aix-en-Provence},
}
@misc{berlin,
title = {Berlin},
}
@misc{utrecht,
title = {Utrecht},
}
@misc{zeven,
title = {Zeven},
}
\end{filecontents*}
\DeclareSortTranslit{
\translit[title]{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography[keyword=indic]
\printbibliography[notkeyword=indic]
\end{document}
sorts my Latin sources in their nonsense Devanāgarī form.
From trace
[537] Biber.pm:3976> DEBUG - zeven => mm,,Zएवेन्,Zएवेन्,,0
[537] Biber.pm:3976> DEBUG - aachen => mm,,अअछेन्,अअछेन्,,0
[537] Biber.pm:3976> DEBUG - arnhem => mm,,अर्न्हेम्,अर्न्हेम्,,0
[537] Biber.pm:3976> DEBUG - avignon => mm,,अविग्नोन्,अविग्नोन्,,0
[537] Biber.pm:3976> DEBUG - utrecht => mm,,उत्रेछ्त्,उत्रेछ्त्,,0
[537] Biber.pm:3976> DEBUG - aix-en-provence => mm,,ऐx-एन्-प्रोवेन्चे,ऐx-एन्-प्रोवेन्चे,,0
[537] Biber.pm:3976> DEBUG - augsburg => mm,,औग्स्बुर्ग्,औग्स्बुर्ग्,,0
Right, I see. I don't think a per-refcontext setting will fix this. What about an optional arg that makes transliteration apply only to entries with particular `langid`s? I think this is probably the best solution.
Mhhh, yes, the example was a bit too sparse on that front. I could have started a new refcontext for the Latin bibliography and then it would work.
I feel that `\DeclareSortTranslit` (and `\DeclareSortExclusion`, `\DeclareSortInclusion` and `\DeclarePresort`) are intimately tied to sorting, and since that is essentially per-refcontext I thought it natural to have those settings per-refcontext (or per-sorting-template, similar to `locale` ...) as well.
Per-`langid` sort translit would certainly solve this problem and I can't really think of a different setting where it would be inferior to per-refcontext translit. I still like the idea of per-refcontext, but since you will have to implement it I'll defer to your judgment.
Please try biblatex dev 3.12 and biber dev 2.12. `\translit` now has a changed parameter sequence with an optional csv of langids to apply the `\translit` to. I think this is the best solution as transliteration applies to languages. It seems to fix your example in my tests and allows for transliterated and non-transliterated sorting in the same reference list.
The example works fine with 3.12/2.12 dev. Thank you very much.
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{kumāra,
title = {kumāra},
keywords = {indic},
langid = {hi},
}
@misc{kṣetra,
title = {kṣetra},
keywords = {indic},
langid = {hi},
}
@misc{kha,
title = {kha},
keywords = {indic},
langid = {hi},
}
@misc{jīvita,
title = {jīvita},
keywords = {indic},
langid = {hi},
}
@misc{jñāna,
title = {jñāna},
keywords = {indic},
langid = {hi},
}
@misc{jvara,
title = {jvara},
keywords = {indic},
langid = {hi},
}
@misc{tyāga,
title = {tyāga},
keywords = {indic},
langid = {hi},
}
@misc{tridaśa,
title = {tridaśa},
keywords = {indic},
langid = {hi},
}
@misc{tvid,
title = {tvid},
keywords = {indic},
langid = {hi},
}
@misc{aachen,
title = {Aachen},
}
@misc{augsburg,
title = {Augsburg},
}
@misc{arnhem,
title = {Arnhem},
}
@misc{avignon,
title = {Avignon},
}
@misc{aix-en-provence,
title = {Aix-en-Provence},
}
@misc{berlin,
title = {Berlin},
}
@misc{utrecht,
title = {Utrecht},
}
@misc{zeven,
title = {Zeven},
}
\end{filecontents*}
\DeclareSortTranslit{
\translit[hindi]{*}{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography[keyword=indic]
\printbibliography[notkeyword=indic]
\end{document}
You may want to change the example for `\DeclareSortTranslit` in the manual to use the new syntax.
Technically, you are right about the other three global sorting macros. However, I would rather wait and see if anyone really needs this and can give a convincing example. I think it's not that likely that anyone needs to vary these things within a document as this would mean that some settings used in one part of the document explicitly did not work with other parts. However, these settings are fairly general and would usually apply generally.
Fair enough.
Do you want me to fix up the example for `\DeclareSortTranslit` in the docs to use the new syntax or will you do that?
It's done, just have to push it.
Some months back I installed the development version of biblatex into my ~/texmf/ tree. Now I want to make my project portable. Is it enough to keep biblatex.sty in the project directory and use biber 2.12, or do I need any other files from the development version as well?
@ppasedach The dev version is in flux and so I don't know which exact version you got. Assuming that everything works fine so far you are probably good with only your version of `biblatex.sty`. The changes in other files are negligible for most intents and purposes at the moment, so even if you pulled the versions now, chances are that you would be OK with `biblatex.sty` and Biber only. But there is no guarantee and lots of things depend on which features you use.
That all said, I can not recommend using the dev versions for production work. And I strongly recommend not disseminating the development versions to other people (I'm not sure if that is what you ultimately have in mind when you want to make your project portable).
The project is a book, being developed in a private repository, so no dissemination of biblatex's and biber's development versions to others apart from one more collaborator who hardly touches the LaTeX sources. The point of my question was just about being able to quickly move my book project to some other computer without needing to modify the TeX Live installation there. I am still using the development version from August, but could also update to a newer version, if that's advisable, but of course I also understand the point about better not using dev versions for production work. Or, has the bug fix been incorporated into the stable versions, or could that be done without much effort? Then of course I'd prefer to use the stable versions.
You should be fine with just `biblatex.sty` and Biber.
If things work for you on your current machines and on the other target machines as well, there is no need to get a newer development version. Of course that only holds if you can use your version of Biber and `biblatex.sty` on the other machines. If you are switching between Windows and Linux, for example, you need to make sure that your development versions of Biber are from the same snapshot (or at least compatible), which would probably mean that you would have to pull all involved binaries anew now; you would then also need the current dev `biblatex.sty`.
There has not been an update to the release versions of either Biber or `biblatex` since February (v3.11/v2.11). That means that this bug fix is not available in the stable versions yet. Since this fix involves a new Biber version it can't be deployed as easily since that requires that the binaries be built (and not only for the three standard systems, but also for a few others), which usually takes some time and involves more people than just PLK. Given the recent developments I'd say we should push out a new version soon, but there are still a few things that would have to be taken care of before we can release it (no ETA for any of that at the moment, though).
(Already noted here)
IAST-transliterated Sanskrit does not sort correctly any more. It appears as if diacritics are stripped off somewhere, or something else happens to them: a and ā, m and ṃ, h and ḥ, ś, ṣ and s, t and ṭ are mixed up in the example, and I assume the same happens to all diacritical combinations.
test_long.pdf test_long.tex.gz