simsapa / simsapa-dictionary

Simsapa Dictionary Tool
MIT License

Dictionary Word Representation #1

Closed gambhiro closed 1 year ago

gambhiro commented 4 years ago

NOTE: The following is for discussion. The current implementation differs from this.

This refers to Markdown and TOML content in the dictionary-wiki, but parsing the data is implemented in this project.

TOML blocks are a plain-text data representation intended to be easy for humans to read and write, while still being parseable by programs.

Simple entry

This form allows us to add short entries that still provide minimal word-to-definition lookup.

The definition body can be long, for example when imported from another dictionary. People can read and understand it as plain text, while at least some of the information is captured in the metadata for the program to use.

A simple Markdown page may consist of a title, some basic metadata in a TOML block, and the definition text of the entry in the rest of the file, which may be a short summary or a longer description.

# upagacchati

``` toml
word = "upagacchati"
dict_label = "PTS"

[[meanings]]
summary = "to come to, go to, approach, flow to (of water)"

[[meanings]]
summary = "to undergo, go (in) to, to begin, undertake"
```

1. to come to, go to, approach, flow to (of water) DN.ii.12; Pv\-a.12 (vasanaṭṭhānaṃ), Pv\-a.29, Pv\-a.32 (vāsaṃ) Pv\-a.132; ger *\-gantvā* Pv\-a.70 (attano santikaṃ), & *\-gamma* SN.ii.17, SN.ii.20.
2. to undergo, go (in) to, to begin, undertake Snp.152 (diṭṭhiṃ anupagamma); Ja.i.106 (vassaṃ); Pv\-a.42 (id.); Ja.i.200; niddaṃ upagacchati to drop off into sleep Pv\-a.43 (aor. upagacchi MSS. ˚gañchi), Pv\-a.105, Pv\-a.128

pp. of *[upagata](upagata.md)* (q.v.).

upa + gacchati
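
As a rough illustration of how such a page could be taken apart, the Python sketch below splits a page into its title, TOML text, and definition body. The function name `split_entry_page` and the returned keys are hypothetical and are not the parser implemented in this project; the sketch only assumes the layout shown above (one heading, one block tagged `toml`, then the definition text).

```python
import re

# Three backticks, built at runtime so that this example can sit inside a fenced block.
FENCE = "`" * 3

# First fenced block tagged "toml"; a space after the backticks is tolerated,
# as in the example page above.
TOML_BLOCK = re.compile(
    r"^`{3}[ \t]*toml[ \t]*\n(.*?)^`{3}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def split_entry_page(page: str) -> dict:
    """Split a Markdown entry page into title, TOML metadata text, and body."""
    title_match = re.search(r"^#\s+(.+)$", page, re.MULTILINE)
    block_match = TOML_BLOCK.search(page)
    if block_match is None:
        raise ValueError("no toml block found in page")
    return {
        "title": title_match.group(1).strip() if title_match else "",
        "toml_text": block_match.group(1),
        # Everything after the closing fence is the human-readable definition.
        "definition_md": page[block_match.end():].strip(),
    }


page = "\n".join([
    "# upagacchati",
    "",
    FENCE + " toml",
    'word = "upagacchati"',
    'dict_label = "PTS"',
    "",
    "[[meanings]]",
    'summary = "to come to, go to, approach, flow to (of water)"',
    FENCE,
    "",
    "1. to come to, go to, approach, flow to (of water) DN.ii.12",
    "",
    "upa + gacchati",
])

parts = split_entry_page(page)
print(parts["title"])
print(parts["toml_text"])
```

The `toml_text` value can then be handed to any TOML parser, while `definition_md` stays untouched as Markdown for display.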

Complex entry

The more metadata is captured in TOML, the better the program can use it for dictionary lookup and database processing.

The definition body may still contain long descriptions, but information such as grammatical construction and example sentences may be captured in the metadata and used in other parts of the program.

The TOML block may contain the following:

``` toml
word = "upagacchati" # The dictionary word lookup entry.
word_nom_sg = "" # The nominative singular form (if applicable).
dict_label = "PTS" # A label to distinguish dictionary sources or authors.
inflections = [] # Inflected or conjugated forms such as plurals, which should return this word entry.
phonetic = "" # Phonetic spelling, such as IPA.
transliteration = "" # Transliteration to Latin from other alphabets such as Thai or Chinese.

# First meaning.
[[meanings]]
# Short translation in English.
summary = "to come to, go to, approach, flow to (of water)"

synonyms = [] # Different words with similar meaning.
antonyms = [] # Opposite meanings.
variants = [] # Similar form or construction but different meaning.
also_written_as = [] # Spelling variations.
see_also = ["upagata"] # Related terms.

example_count = 2 # A helper number to mark the number of examples collected for this meaning.

# Grammar of first meaning.
[[meanings.grammar]]
pali_roots = ["upa", "gam"]
pali_root_groups = ["upa", "gam"]
pali_root_group = "1.1"
pali_root_sign = "a"
prefix_and_root = "ā bhuj" # e.g. for ābhujati

construction = "upa + gaccha + ti"
base_construction = "gam + a = gaccha" # Root and conjugation sign

compound_type = ""
compound_construction = ""

sanskrit_word = ""
sanskrit_roots = []

comment = "pp. of upagata" # General grammar comment.
speech = "verb" # Part of speech.
case = "acc." # Specific grammar properties.
num = ""
gender = ""
person = ""
voice = ""
object = ""
transitive = "trans." # trans. / intrans. / ditrans. / empty
negative = "" # true / false / empty
verb = "" # causative / passive / denominative / intensive / empty

# First example of first meaning.
[[meanings.examples]]
sutta_ref = ""
sutta_title = "paṭhama dārukkhandhopamasuttaṃ"
text_pali = "evam'eva kho, bhikkhave, sace tumhe'pi na orimaṃ tīraṃ upagacchatha, na pārimaṃ tīraṃ upagacchatha..."
text_english = ""

# Second example of first meaning.
[[meanings.examples]]
sutta_ref = "..."

# Second meaning.
[[meanings]]
summary = "to undergo, go (in) to, to begin, undertake"

# First example of second meaning.
[[meanings.examples]]
sutta_ref = "SN 12.61"
sutta_title = "assutavāsuttaṃ"
text_pali = "varaṃ bhikkhave assutavā puthujjano imaṃ cātumahābhūtikaṃ kāyaṃ attato upagaccheyya na tv'eva cittaṃ"
text_english = ""
```

Spreadsheet columns

If you are using a spreadsheet to collect words, use the following column headers, which can be parsed back into the TOML structure above.

If a word has more than one meaning, add a new row for each meaning and number the rows with meaning_order (1, 2, etc.).

word
word_nom_sg
dict_label
inflections
phonetic
transliteration

definition_md

meaning_order

summary
synonyms
antonyms
variants
also_written_as
see_also

example_count

gr_pali_roots
gr_pali_root_groups
gr_pali_root_group
gr_pali_root_sign
gr_prefix_and_root

gr_construction
gr_base_construction
gr_compound_type
gr_compound_construction
gr_sanskrit_word
gr_sanskrit_roots
gr_comment
gr_speech
gr_case
gr_num
gr_gender
gr_person
gr_voice
gr_object
gr_transitive
gr_negative
gr_verb

ex_1_sutta_ref
ex_1_sutta_title
ex_1_text_pali
ex_1_text_english

ex_2_sutta_ref
ex_2_sutta_title
ex_2_text_pali
ex_2_text_english
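
A minimal Python sketch of how rows with these headers might be folded back into the nested structure follows. It assumes each row is a dict keyed by the headers above, that list-valued cells are comma-separated text, and that empty cells are skipped; none of this is specified in the proposal, and `rows_to_entry` is a hypothetical name.

```python
import re

# Word-level columns, read from the first row of a word.
WORD_COLUMNS = ["word", "word_nom_sg", "dict_label", "inflections",
                "phonetic", "transliteration", "definition_md"]
# Assumption: these fields hold lists, written as comma-separated cell text.
LIST_FIELDS = {"inflections", "synonyms", "antonyms", "variants",
               "also_written_as", "see_also", "pali_roots",
               "pali_root_groups", "sanskrit_roots"}
EXAMPLE_COLUMN = re.compile(r"^ex_(\d+)_(.+)$")


def cell_value(field, text):
    """Convert cell text to a list for list fields, otherwise keep the string."""
    if field in LIST_FIELDS:
        return [part.strip() for part in text.split(",") if part.strip()]
    return text


def rows_to_entry(rows):
    """Fold the rows of one word (one row per meaning) into a nested entry."""
    rows = sorted(rows, key=lambda r: int(r["meaning_order"]))
    first = rows[0]
    entry = {k: cell_value(k, first[k]) for k in WORD_COLUMNS if first.get(k)}
    entry["meanings"] = []
    for row in rows:
        meaning, grammar, examples = {}, {}, {}
        for column, text in row.items():
            if not text or column in WORD_COLUMNS or column == "meaning_order":
                continue
            example = EXAMPLE_COLUMN.match(column)
            if example:
                # ex_1_sutta_ref -> examples[1]["sutta_ref"]
                number, field = int(example.group(1)), example.group(2)
                examples.setdefault(number, {})[field] = text
            elif column.startswith("gr_"):
                field = column[3:]
                grammar[field] = cell_value(field, text)
            elif column == "example_count":
                meaning[column] = int(text)
            else:
                meaning[column] = cell_value(column, text)
        if grammar:
            meaning["grammar"] = [grammar]  # mirrors [[meanings.grammar]]
        if examples:
            meaning["examples"] = [examples[n] for n in sorted(examples)]
        entry["meanings"].append(meaning)
    return entry


rows = [
    {"word": "upagacchati", "dict_label": "PTS", "meaning_order": "1",
     "summary": "to come to, go to, approach, flow to (of water)",
     "see_also": "upagata", "gr_pali_roots": "upa, gam",
     "gr_construction": "upa + gaccha + ti",
     "ex_1_sutta_title": "paṭhama dārukkhandhopamasuttaṃ"},
    {"word": "upagacchati", "dict_label": "PTS", "meaning_order": "2",
     "summary": "to undergo, go (in) to, to begin, undertake",
     "ex_1_sutta_ref": "SN 12.61", "ex_1_sutta_title": "assutavāsuttaṃ"},
]

entry = rows_to_entry(rows)
print(entry["meanings"][1]["examples"])
```

The `gr_` and `ex_N_` prefixes carry enough information to rebuild the `grammar` and `examples` tables, so the spreadsheet can stay flat with one row per meaning.
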
buddhiko1 commented 2 years ago

Hello~

I have made three colored and styled MDict versions of Pali dictionaries. If you are interested, you can download them here

Thank you for the inspiration~