Doculect entries are stored as Python INI files, named with the Glottocode of the language followed by a hyphen and an index number: 1 for the first entry for that Glottocode, 2 for the second, etc.
Entry files have five headers: core, source, (optionally) notes, phonemes, and allophonic_rules. An optional todo section is also permitted.
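The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the helper names parse_entry_name and next_entry_name are hypothetical and not part of the project's scripts, and the pattern assumes that a Glottocode is four lowercase alphanumerics followed by four digits.

```python
import re

# Assumption: a Glottocode is four lowercase alphanumerics plus four digits.
ENTRY_NAME = re.compile(r"^([a-z0-9]{4}\d{4})-(\d+)(?:\.ini)?$")


def parse_entry_name(filename):
    """Split 'roto1249-1.ini' into its Glottocode and index number."""
    match = ENTRY_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a doculect entry name: {filename!r}")
    return match.group(1), int(match.group(2))


def next_entry_name(glottocode, existing):
    """Return the file name for the next entry of a Glottocode."""
    indices = [
        parse_entry_name(name)[1]
        for name in existing
        if name.startswith(glottocode + "-")
    ]
    # The first entry gets index 1, the second 2, and so on.
    return f"{glottocode}-{max(indices, default=0) + 1}.ini"
```

For instance, parse_entry_name("roto1249-1.ini") yields the Glottocode roto1249 and the index 1.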
core
core stores two required attributes:
name: the name of the doculect as given in the source
glottocode: the Glottocode of the language
And two optional attributes:
dialect: the Glottocode of the specific dialect, if one is defined
dialect_name: the name of the specific dialect as given in the source, if a specific dialect is referenced
source
source stores many attributes, of which the most common ones are:
glottolog
url
doi
author
title
publisher
volume
number
year
Enough information should be given that the paper can be found. At the minimum, a Glottolog ID should be provided if one is available; other information can then be added automatically from Glottolog when a numbered release of the Index is built.
notes
notes stores notes relevant to the doculect entry.
phonotactics
phonotactics stores information about the language's syllable structure; currently this contains only two fields, max_initial and max_final. If a doculect's source does not provide the necessary information, the special value no_info may be stored in this field to reflect this.
phonemes
phonemes stores a set of phonemes, separated by newlines.
To mark a phoneme as marginal, enclose it in parentheses.
To mark a phoneme as only occurring in non-nativized loans, enclose it in curly brackets.
To mark a phoneme as marginal outside non-nativized loans, enclose it in parentheses and curly brackets.
In some cases, phonemes may be too underspecified or under-described to be easily reducible to one IPA representation, as with the Rotokas voiced series, or coronal plosives that may be either dental or alveolar. Indicate these cases by listing the candidate representations separated by vertical bars, with the canonical representation used by the source document in the first position.
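As a rough illustration of the notation described above, the sketch below reads one line of the phonemes section. The Phoneme class and the parse_phoneme function are hypothetical names introduced here, not the project's own code, and the sketch assumes that parentheses and curly brackets may be nested in either order.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    representations: list    # candidate IPA forms, the source's own form first
    marginal: bool = False   # the line was enclosed in parentheses
    loan_only: bool = False  # the line was enclosed in curly brackets


def parse_phoneme(line):
    """Parse one phoneme line such as 'p', '(x)', '{f}' or 'e|ɛ'."""
    text = line.strip()
    marginal = loan_only = False
    # Peel off the enclosing brackets; both kinds may be present at once.
    while len(text) >= 2:
        if text[0] == "(" and text[-1] == ")":
            marginal = True
        elif text[0] == "{" and text[-1] == "}":
            loan_only = True
        else:
            break
        text = text[1:-1].strip()
    # Vertical bars separate candidate representations.
    return Phoneme(text.split("|"), marginal, loan_only)
```

A line with both flags set corresponds to a phoneme written in parentheses and curly brackets together.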
allophonic_rules
allophonic_rules stores a set of allophonic rules, written in source > realization / environment format.
phonemes
In cases where an entire cluster or sequence has a specific realization, such as English /nð/ > [n̪ː], join the source phonemes in the sequence with a plus sign: n+ð > n̪ː. If this rule has no conditioning factor outside the cluster itself, the / environment component may be omitted.
For cases of free variation, such as Nuosu m+ɨ >~ m̩, use the digraph >~. For cases of free variation among obligatory conditioned allophones, such as t > s ~ ts / _i in Rotokas, use > and separate the variants with ~.
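The rule syntax described above can be illustrated with a small parser. This is a sketch under stated assumptions rather than the project's validator: parse_rule is a hypothetical name, and the environment is assumed to be set off by a slash with a space on either side.

```python
def parse_rule(line):
    """Parse a rule such as 't > s ~ ts / _i' or 'm+ɨ >~ m̩'."""
    # The environment is optional; assume it follows ' / '.
    head, separator, tail = line.strip().partition(" / ")
    environment = tail.strip() if separator else None

    # '>~' marks free variation; a plain '>' marks an ordinary rule.
    if ">~" in head:
        source, realization = head.split(">~", 1)
        free_variation = True
    else:
        source, realization = head.split(">", 1)
        free_variation = False

    return {
        # '+' joins the phonemes of a cluster or sequence.
        "source": [part.strip() for part in source.split("+")],
        # '~' separates freely varying conditioned allophones.
        "realizations": [part.strip() for part in realization.split("~")],
        "free_variation": free_variation,
        "environment": environment,
    }
```

Applied to the Rotokas rule, this gives the source t, the two realizations s and ts, and the environment _i.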
The frication diacritic is carried over from PHOIBLE: for example, the voiced velar lateral fricative is ʟ͓.
The retroflex lateral flap is written ɺ̢.
The IPA palatal series is here interpreted as velar palatals; coronal palatals are represented by the Sinological ȶ series.
Affricates and consonants with bilabially trilled release are assumed to agree in voice unless otherwise specified.
Prenasalized consonants are written with preceding superscript n: for example, ⁿp instead of mp or m̥p. Postnasalized consonants or prestopped nasals are written as digraphs: pm̥ or pm (depending on whether the nasal element is voiced) instead of pⁿ.
Fricated or 'super-close' vowels such as Mandarin -i are written with extensions of the Sinological characters:
ɿ instead of z̩
ɿᶾ instead of ʒ̩
ʅ instead of ʐ̩
ɿᶽ instead of ʑ̩
ꭒ instead of v̩
There may eventually be a ʮ series also, but we haven't needed one yet.
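The substitutions listed above amount to a small lookup table. The sketch below is illustrative only: normalize_fricated_vowels is a hypothetical helper, and it assumes the source marks syllabicity with the combining vertical line below (U+0329).

```python
SYLLABIC = "\u0329"  # combining vertical line below

# Source notation -> Sinological character, as listed above.
FRICATED_VOWELS = {
    "z" + SYLLABIC: "ɿ",
    "ʒ" + SYLLABIC: "ɿᶾ",
    "ʐ" + SYLLABIC: "ʅ",
    "ʑ" + SYLLABIC: "ɿᶽ",
    "v" + SYLLABIC: "ꭒ",
}


def normalize_fricated_vowels(text):
    """Replace syllabic fricatives with the Sinological vowel characters."""
    for source_form, index_form in FRICATED_VOWELS.items():
        text = text.replace(source_form, index_form)
    return text
```

Text that contains none of the listed sequences is returned unchanged.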
The retraction diacritic on vowels is used in the Tibeto-Burman manner, to represent the 'tight throat' quality or 'tense voice' that appears in Liangshan Yi and Bai. Uvularization (which appears in some Qiangic languages) is transcribed with a following ʶ.
ʵ replaces ˞ as a marker of rhoticity.
Tone is written with Chao tone letters. The super-high 66 tone of Bai is written ˥́.
Inventories of 'eroded' Sino-Tibetan languages are typically given as onsets, rimes, and tones. We convert these to inventories of consonants and vowels, and err on the side of segmental simplicity, although complex rimes may be represented as unit segments in certain cases where we can identify good reason to do so.
The non-syllabicity diacritic is used on diphthongs when the source uses it, or when every diphthong in the inventory is closing or close-to-close.
If diphthongs that are not closing or close-to-close are present and the source does not use the non-syllabicity diacritic, it is not used.
For example, if a source lists a diphthong inventory of /ai au ei eu oi ou iu/, these diphthongs will be input as /ai̯ au̯ ei̯ eu̯ oi̯ ou̯ iu̯/. But if a source lists /ai au ea oa/, these will be input as /ai au ea oa/, since it isn't clear whether /ea oa/ are falling or rising in prominence.
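One possible reading of this convention is sketched below. The vowel-height table is a hypothetical simplification covering only the five vowels in the example, mark_diphthongs is an illustrative name, and each diphthong is assumed to be written as exactly two vowel characters when the source gives no diacritic.

```python
NON_SYLLABIC = "\u032f"  # combining inverted breve below

# Hypothetical height table for illustration: 0 = open, 1 = mid, 2 = close.
HEIGHT = {"a": 0, "e": 1, "o": 1, "i": 2, "u": 2}


def is_closing_or_close(diphthong):
    """True if the second vowel is higher, or if both vowels are close."""
    first, second = diphthong
    return HEIGHT[second] > HEIGHT[first] or HEIGHT[first] == HEIGHT[second] == 2


def mark_diphthongs(diphthongs):
    """Add the non-syllabicity diacritic only when prominence is clear."""
    # If the source already uses the diacritic, keep its transcription.
    if any(NON_SYLLABIC in d for d in diphthongs):
        return list(diphthongs)
    # Mark the second element only if every diphthong qualifies.
    if all(is_closing_or_close(d) for d in diphthongs):
        return [d + NON_SYLLABIC for d in diphthongs]
    return list(diphthongs)
```

With the two inventories from the example, the first is marked throughout and the second is returned as given, because /ea oa/ are neither closing nor close-to-close.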
SIL Organized Phonology Data sheets almost always list the low vowel as /ɑ/ rather than /a/. In these cases, /a/ will be input unless the low vowel is clearly described as backed.
An example file, roto1249-1.ini, is given below.
[core]
name = Rotokas
glottocode = roto1249
[source]
author = Firchow, Irwin; Firchow, Jacqueline
title = An Abbreviated Phoneme Inventory
publisher = Anthropological Linguistics
volume = 11
number = 9
year = 1969
pages = 271-276
glottolog = 110896
url = https://www.jstor.org/stable/30029468
[phonotactics]
max_initial = 1
max_final = 0
[phonemes]
p
t
k
β|b|m
ɾ|n|l|d
g|ɣ|ŋ
a
e|ɛ
o
i|ɪ
u
aː
eː|ɛː
oː
iː|ɪː
uː
[allophonic_rules]
t > s ~ ts / _i
The sections and keys that an entry file may contain are summarized in the following template.
[core]
name = REQUIRED
glottocode = REQUIRED
notes = OPTIONAL
dialect = OPTIONAL
[source]
glottolog = IDEAL
url = IDEAL
author = OPTIONAL (but REQUIRED if there's no glottolog ID)
title = OPTIONAL (but REQUIRED if there's no glottolog ID)
publisher = OPTIONAL
volume = OPTIONAL
number = OPTIONAL
year = OPTIONAL (but REQUIRED if there's no glottolog ID)
pages = OPTIONAL
[phonotactics]
max_initial = REQUIRED IF no_info IS NOT PRESENT
max_final = REQUIRED IF no_info IS NOT PRESENT
[phonemes]
REQUIRED
[allophonic_rules]
PHONEME > IPA_REALIZATION / DESCRIPTION_OF_ENVIRONMENT
PHONEME+PHONEME > REALIZATION_OF_CLUSTER / DESCRIPTION_OF_ENVIRONMENT
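The requirements in the template can be checked mechanically. The sketch below is not the logic of commit.py, which is not shown here; load_entry and check_entry are hypothetical helpers, and the parser settings (bare lines as keys without values, "=" as the only delimiter, case-preserving keys) are assumptions about how such a file could be read with the standard configparser module. The phonotactics fields are left unchecked.

```python
import configparser


def load_entry(text):
    """Read the text of an entry file into a ConfigParser."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # phoneme and rule lines have no '= value'
        delimiters=("=",),    # assumption: only '=' separates key and value
        interpolation=None,   # keep '%' and similar characters literal
    )
    parser.optionxform = str  # preserve case in phoneme symbols
    parser.read_string(text)
    return parser


def check_entry(parser):
    """Return a list of problems; an empty list means none were found."""
    def value(section, key):
        found = parser.get(section, key, fallback=None)
        return found.strip() if found else ""

    problems = []
    for key in ("name", "glottocode"):
        if not value("core", key):
            problems.append(f"core.{key} is required")
    # Without a Glottolog ID, author, title and year become required.
    if not value("source", "glottolog"):
        for key in ("author", "title", "year"):
            if not value("source", key):
                problems.append(f"source.{key} is required without a glottolog ID")
    if not parser.has_section("phonemes") or not parser.options("phonemes"):
        problems.append("phonemes must list at least one phoneme")
    return problems
```

After loading, list(parser["phonemes"]) gives the phoneme lines in file order, which can then be parsed individually.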
add.py
Creates a blank doculect file. Usage: python add.py <glottocode>. Prints the name of the created file. Notable options:
-h: Display a help message listing all script options (can be used without providing a glottocode)
-b <bibkey>: Convenience option for inputting the Glottolog bibkey of the source; this will auto-fill as many of the source fields as possible, but requires a local copy of the Glottolog database and the installation of the pyglottolog library.
-n <name>: Convenience option for inputting the name of the doculect as given in the source
--simple: Omits unfilled optional keys and default text
For example, add.py roto1249 -b sil16:10670 -n Rotokas --simple.
commit.py
Validates and git adds a provided entry in .ini format; for example, commit.py roto1249-1.
stats.py
Prints statistics about the local database. Run stats.py help to list available reports.