If you use these data please cite
Grierson, George Abraham (1928): Linguistic Survey of India. Comparative Vocabulary. Calcutta: Government of India Central Publication Branch.
This dataset is licensed under a CC-BY-4.0 license
Available online at https://lsi.clld.org
Conceptlists in Concepticon:
The first pass on the digitization was done by Patrick Lundberg and Taraka Rama, who typed the text from the tables in the scanned book pages into text files in a format easy to parse computationally. From this dataset, available in raw/LSI_txt, the data was then parsed and successively converted to CLDF, adding orthography profiles, providing links of language names to Glottolog, and linking the concept list to Concepticon (see Grierson-1928-168).
The following considerations went into creating the orthography profiles:
Grapheme | IPA | Comment |
---|---|---|
V | v | typically common in Indian languages. Alternates between v and w |
^A | ɑ | a in America or u in hurry |
a | ə | a in America or u in hurry |
à͛ | à͛/a | Occurs twice in Pwo-Bassein. No explanation |
ā | aː | a in America or u in hurry |
ǟ | ǟ/æ | lengthened ɛ. ɛː |
ǎ̀ | ǎ̀/a | occurs in Katurr Palaung. A short version of a. ă |
ạ̄ | ạ̄/aː | Palaung. u in but (ʌː) |
ạ̌ | ạ̌/a | Only occurs in Syrian Gypsy) |
ḅ | ḅ/b | A peculiar labial according to Grierson, unvoiced may be) |
ḇ | ḇ/b | Another variety of sound. Occurs in Tailang) |
c̣ | c | No mention in the book. Based on context, treat it as ch ~ tɕ) |
ḥ̣ | h | A sound equivalent to visarga in Sanskrit. Essentially h) |
ī̃° | ī̃°/ĩ | actually a glottal check) |
ī̇ | ɪː | Only occurs once in Mandarin) |
ï̌ | ï̌/ɪ | Centralized vowel (may be) occurring once in Prakrit) |
ị̄ | ị̄/ɪː | Occurs in Palaung. Supposed to be a modification of ī) |
i̯ | i̯/j | Occurs only in Cham. no explanation given. Is it non-syllabic?) |
ḷ’ | ɭ̥ | supposed to be a breathy voiced ɭ) |
m̊° | m̥ | mˤ (Should be a glottal check according the book. ˤ) |
m̌ | m̌/m | Occurs in Singhalese. No description given in Grierson) |
n | n̪ | not clear if this should be a dental sound. Tamil has an alveolar stop. In general dental nasal stops are present in Indian languages) |
ṅ̇ | ṅ̇/n | Typo in the data. Should be treated as velar nasal ŋ |
r | r/ɾ | possibly a flap for Tamil/malayalam. Rest of languages, it could be r. No explanation in the book. |
ṛ’ | ɽ̊ | weak aspiration |
r̤ | r̤/r ɻ | retroflex approximant occurs in Malayalam and Tamil |
ṟˡ | ṟˡ/r | trilled r |
s̄ | s̄/s | Typo in case of Anal, Bhojpuri |
š́ | š́/ʃ | skh in Ormuri |
ṣ̌ | ṣ̌/ʂ | sch in Ormuri |
s̱ | s̱/s | part of ṯs̱ |
t̤ | t̤/t | tˤfor ط Arabic. |
ū’ | ū’/uː | Only occurs in Sakai and Semang. SHould be treated as "uː h" |
ǖ | ǖ/yː | long variant of ü (y) |
v | v | ʋ typically common in Indian languages. Alternates between v and w |
à | à/a | as in German Mann |
è | è/e | no explanation in the book. Better go with e |
é | é/e | no explanation in the book. Better go with e |
ì | ì/i | no explanation in the book. Better go with i |
í | í/i | no explanation in the book. Better go with i. Three instances |
ï | ï/ɪ | a centralized vowel |
ò | ò/o | Typo for ö in Yeinbå. |
ó | ó/o | Occurs in Rong/Lepcha. Equivalent to o in "for" or "nor" |
ô | ɔ | no sound in the original transcriptions. Occurs in the language name: Salôn |
õ | õ | nasalized. No explanation but can assume... |
ö̌ | ö̌/œ | (̈̌ü dipthong. A very short French eu followed by u. Found in Miao-Hmong |
ù | ù/u | no explanation in the book. Better go with u |
ú | ú/u | no explanation in the book. Better go with u |
ü | y | y: is for German ubel |
ė | ė/ə | No explanation. Only found once in Annamese (Vietnamese) |
ě | ě/e | equivalent to ə. occurs in Katurr Palaung |
ň | ň/n | no explanation in Grierson |
ũ | ũ | probably nasalized vowel |
ż | ż/z | no explanation in Grierson |
ǎ | æ | parsing error. It is part of the ǎ̀ symbol |
Ǐ | i | no explanation in Grierson |
ǐ | ǐ/i | no explanation in Grierson |
ǒ | ɒ | no explanation in Grierson |
ǔ | ǔ/ʊ | short version of oo in soon, boon. ŏ |
ǚ | ǚ/y̆ | extra short y |
ȧ | ȧ/a | It should be å. It is not clearly printed in the original book |
ȯ | o | Only occurs once in Shodochi for "ten". |
ȳ | yː | not there in the book |
ɯ | ɯ | Book shows this form |
δ̤ | ðʰ | ðˤ version. Ẓāʾ in Modern Standard Arabic |
δ̱ | d̪ | ˤ version of d̪ |
ḣ | ḣ/h | Typo. should be ḥ |
ḥ | ḥ/h | A sound equivalent to visarga in Sanskrit. Essentially h |
ḳ | ḳ/k | Occurs only in Salon. No explanation in the book |
ṁ | ṁ/m | as a nasal vowel. typically nasalizes previous vowel and occurs in Sanskrit and borrowings |
ṃ | ṃ/ṁ | Typo. should be rendered as ṁ |
ṙ | ṙ/r | typo. should be rendered as ṛ |
ṟ | ṟ/r | a trilled r |
ṡ | ṡ/s | better shown as ʃ |
ṫ | ṫ̪ | Could be a parsing error. Can't locate it |
ạ | ạ/a | Less rounded ö. Occurs in SHAN. A slightly long variant occurs in Siamese. ø̜ |
ạ̄ | ạ̄/aː | ø̜ː in Siamese |
ẹ | ẹ/ɚ | Gheko has this sound mainly. variant between i and e. So ɪ is a good candidate. Occurs in Avestan but no explanation. |
ị | i | Occurs in Palaung. Supposed to be a modification of i |
ụ | ụ/u | Less rounded ü. May be y̜. Occurs in Siamese and Shan |
ꭓʷ | χʷ | xʷ labialized x |
ꭓ́ | χ | kkh according to the book |
^ō̂ | ō̂/oː | One occurrence in Guzuri of Hazara |
LSI covers more than 350 language varieties from multiple language families.
See cldf/README.md for a description of the tables and columns and the entity-relationship diagram for how they relate.
Name | GitHub user | Role |
---|---|---|
Grierson, George Abraham | Author | |
Taraka Rama | @phylostar | Editor |
Patrick Lundberg | DataCurator | |
Christoph Rzymski | @chrzyki | DataCurator |
Robert Forkel | @xrotwang | Editor |
Johann-Mattis List | @lingulist | Editor |
The following CLDF datasets are available in cldf: