tatuylonen / wiktextract

Wiktionary dump file parser and multilingual data extractor

Parse Template:+obj and add new `coincidence` field #674

Closed kristian-clausal closed 3 weeks ago

kristian-clausal commented 3 weeks ago

Template:+obj generates the formatting for constructions such as [+accusative] or [+foo = a meaning with bar], which show that a word usually co-occurs with, or is tied to, a particular argument, a specific word or something else.

Parsing did not previously handle these, which poisoned the forms data with 'canonical' entries like pro [+accusative].

Instead of creating new categories of tags like with-accusative etc., Tatu gave the green light to make a completely new field, which is here called "coincidence" because it's about coincident... things.

`coincidence` is a dict with a `tags` field, a `words` field and a `meaning` field (each optional).

Currently only used at the main entry level for when a word head uses +obj; see pro/Czech.
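A rough sketch of the shape described above, to make the optional fields concrete. The value types (lists of strings for `tags` and `words`, a string for `meaning`), the helper name `make_coincidence` and the sample entry are assumptions for illustration, not the code in this PR:

```python
from typing import TypedDict


class Coincidence(TypedDict, total=False):
    # total=False makes every key optional, matching "each optional".
    # The value types are assumptions, not taken from the PR.
    tags: list[str]
    words: list[str]
    meaning: str


def make_coincidence(tags=None, words=None, meaning=None) -> Coincidence:
    """Hypothetical helper: build the dict, leaving out empty fields."""
    data: Coincidence = {}
    if tags:
        data["tags"] = list(tags)
    if words:
        data["words"] = list(words)
    if meaning:
        data["meaning"] = meaning
    return data


# Illustrative main-entry data for a head such as "pro [+accusative]".
entry = {
    "word": "pro",
    "lang": "Czech",
    "coincidence": make_coincidence(tags=["accusative"]),
}
```

Leaving out empty fields keeps entries that only carry a case tag small, instead of padding them with empty lists.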

kristian-clausal commented 3 weeks ago

After discussing with Tatu, instead of doing a specific kludge for Template:+obj, we're going to create a new "extras" or "info_templates" field that is similar to etymology_templates, and use it in root data and sense data.
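For illustration only, here is what such a field might look like if it reuses the `name` / `args` / `expansion` shape of etymology_templates entries. The field name `info_templates`, the helper `add_info_template` and the sample template arguments are assumptions; nothing here is settled by this thread:

```python
def add_info_template(data: dict, name: str, args: dict, expansion: str) -> None:
    """Hypothetical helper: append one template record to `info_templates`.

    `data` can be root (word-level) data or a single sense dict, since the
    field is meant to be usable in both places.
    """
    record = {"name": name, "args": dict(args), "expansion": expansion}
    data.setdefault("info_templates", []).append(record)


# Illustrative values only; the template arguments are assumed.
word_data = {"word": "pro", "lang": "Czech", "senses": [{"glosses": ["for"]}]}
add_info_template(word_data, "+obj", {"1": "cs", "2": "acc"}, "[+accusative]")

# The same helper attaches a record to a sense instead of the root data.
add_info_template(word_data["senses"][0], "+obj", {"1": "cs", "2": "acc"}, "[+accusative]")
```

Storing the raw template name, arguments and expansion leaves the interpretation to downstream consumers, rather than baking one template's semantics into a dedicated field.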