karlb / wikdict-gen

Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project
http://www.wikdict.com
MIT License

Missing translations that are not assigned to a sense #20

Closed karlb closed 8 months ago

karlb commented 8 months ago

Currently, the assumption is that for each Wiktionary, translations are either:

Unfortunately, this does not cover all translations. In the Spanish Wiktionary, translations are listed in a single section for the lexentry (so they are not directly assigned to a sense), but they often contain a numeric reference (e.g. "[2]") that identifies the sense. When such a reference is present, dbnary is smart enough to parse it and assign the translation to the corresponding sense.

However, not all translations have these numeric sense references and therefore stay linked to the lexentry. These translations are currently lost to WikDict. It would make sense to include these translations with an empty sense/gloss.

Example: https://es.wiktionary.org/wiki/monje

zcat ttl/es_dbnary_*.ttl.gz | awk 'BEGIN {RS=""} /_tr_.*monje/ {print "\n"$0}'

spa:__tr_deu_1_monje__sustantivo_masculino__1
        rdf:type                dbnary:Translation;
        dbnary:isTranslationOf  spa:monje__sustantivo_masculino__1;
        dbnary:targetLanguage   lexvo:deu;
        dbnary:writtenForm      "Mönch"@de .

spa:__tr_bre_1_monje__sustantivo_masculino__1
        rdf:type                dbnary:Translation;
        dbnary:isTranslationOf  spa:monje__sustantivo_masculino__1;
        dbnary:targetLanguage   lexvo:bre;
        dbnary:writtenForm      "manac'h"@br .

spa:__tr_eng_1_monje__sustantivo_masculino__1
        rdf:type                dbnary:Translation;
        dbnary:isTranslationOf  spa:monje__sustantivo_masculino__1;
        dbnary:targetLanguage   lexvo:eng;
        dbnary:writtenForm      "monk"@en .
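
To illustrate the proposal, here is a minimal Python sketch (not the actual wikdict-gen code) that reads Turtle records like the ones above and keeps translations that are only linked to a lexentry, giving them an empty sense. It assumes that dbnary word-sense URIs have a local name starting with `__ws_` while lexentry URIs do not; that convention should be checked against the real dump. The names `parse_translations` and `Translation` are made up for this example, and the regex-based parsing is only meant for this simple record layout, not as a general Turtle parser.

```python
import re
from typing import Iterator, NamedTuple, Optional

# Assumption: word-sense local names start with "__ws_"; lexentries do not.
SENSE_PREFIX = "__ws_"

TARGET_RE = re.compile(r"dbnary:isTranslationOf\s+(\S+)")
FORM_RE = re.compile(r'dbnary:writtenForm\s+"((?:[^"\\]|\\.)*)"@([A-Za-z-]+)')


class Translation(NamedTuple):
    translation_of: str   # URI of the lexentry or sense being translated
    sense: Optional[str]  # sense URI, or None if only linked to the lexentry
    written_form: str
    lang: str


def parse_translations(turtle: str) -> Iterator[Translation]:
    """Yield one Translation per blank-line-separated Turtle record."""
    for record in re.split(r"\n\s*\n", turtle.strip()):
        if "dbnary:Translation" not in record:
            continue
        target = TARGET_RE.search(record)
        form = FORM_RE.search(record)
        if not target or not form:
            continue
        # Drop the statement terminator that may be attached to the URI.
        uri = target.group(1).rstrip(";,.")
        local_name = uri.split(":", 1)[-1]
        # Keep the translation either way; the sense is simply empty when
        # the translation points at the lexentry.
        sense = uri if local_name.startswith(SENSE_PREFIX) else None
        yield Translation(uri, sense, form.group(1), form.group(2))


# Two of the records from the example above.
SAMPLE = '''
spa:__tr_deu_1_monje__sustantivo_masculino__1
        rdf:type                dbnary:Translation;
        dbnary:isTranslationOf  spa:monje__sustantivo_masculino__1;
        dbnary:targetLanguage   lexvo:deu;
        dbnary:writtenForm      "Mönch"@de .

spa:__tr_eng_1_monje__sustantivo_masculino__1
        rdf:type                dbnary:Translation;
        dbnary:isTranslationOf  spa:monje__sustantivo_masculino__1;
        dbnary:targetLanguage   lexvo:eng;
        dbnary:writtenForm      "monk"@en .
'''

for t in parse_translations(SAMPLE):
    print(t.written_form, t.lang, t.sense)
```

Under the stated assumption, both sample records come out with `sense` set to `None`, because they point at `spa:monje__sustantivo_masculino__1` rather than at a sense. These are the rows that would be imported with an empty sense/gloss instead of being discarded.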