Build BOW model for Semantic Domain Identification

Goal

We want to build a ~~GNN-based edge prediction~~ BOW (bag-of-words) model for SDI (semantic domain identification). We hypothesize that it outperforms the simple baseline model. Motivation: SDI with F1 > 0.30 for 1 tpi/meu

Tasks

[ ] Acquire refined mappings from verses to semantic domains #1
[ ] Use the refined mappings from words in verses to SDs to assign SDs to words in verses from the LRL (low-resource language)
- simply assign the SDs of each English word to its aligned word in the LRL
- if this yields many false-positive mappings (i.e., low precision): refine the assignments by intersecting them with the generated SD dictionaries for the LRL (set intersection)
[ ] Collect a BOW for every word with an assigned SD (context window: the 2 words before and the 2 words after the target word)
[ ] Aggregate the BOWs by SD
[ ] Perform SDI by extracting the BOW for every candidate word in the input sentence and computing its cosine distance to each aggregated BOW
[ ] Try out the baseline: look up each word in a dictionary
[ ] Consider the usefulness of WSD (word sense disambiguation) with pywsd or a different tool: Eng verse → WordNet → SD (see Jonathan’s 2nd mail)
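
The projection, BOW, and lookup tasks above could be sketched roughly as follows. This is only a minimal illustration, not the project's implementation: all function names, the data shapes (token lists, `(eng_index, lrl_index)` alignment pairs, dictionaries from words to sets of SD labels), and the `threshold` value are assumptions made for the example. It scores with cosine similarity, which is 1 minus the cosine distance named in the task.

```python
from collections import Counter, defaultdict
from math import sqrt

WINDOW = 2  # words before and after the target word, as in the task list


def project_sds(eng_sds, alignment, lrl_tokens, lrl_sd_dict=None):
    """Copy the SDs of each English word to its aligned LRL word.

    eng_sds: {eng_index: set of SDs}; alignment: (eng_index, lrl_index) pairs.
    lrl_sd_dict (optional): {lrl_word: set of SDs}, used to filter false positives.
    """
    assigned = defaultdict(set)
    for eng_i, lrl_i in alignment:
        assigned[lrl_i] |= eng_sds.get(eng_i, set())
    if lrl_sd_dict is not None:
        # Refinement: keep only SDs that the LRL dictionary also lists.
        for lrl_i in assigned:
            assigned[lrl_i] &= lrl_sd_dict.get(lrl_tokens[lrl_i], set())
    return dict(assigned)


def context_bow(tokens, i, window=WINDOW):
    """Bag of words around position i, excluding the word at i itself."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return Counter(left + right)


def aggregate_bows(tokens, assigned):
    """Sum the context BOWs of all words that share an SD."""
    sd_bows = defaultdict(Counter)
    for i, sds in assigned.items():
        bow = context_bow(tokens, i)
        for sd in sds:
            sd_bows[sd] += bow
    return dict(sd_bows)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def identify_sds(tokens, sd_bows, threshold=0.5):
    """Assign to each position every SD whose aggregated BOW is similar enough."""
    result = {}
    for i in range(len(tokens)):
        bow = context_bow(tokens, i)
        result[i] = {
            sd for sd, agg in sd_bows.items()
            if cosine_similarity(bow, agg) >= threshold
        }
    return result


def baseline_lookup(tokens, sd_dict):
    """Baseline: look up each word in an SD dictionary, ignoring context."""
    return {i: set(sd_dict.get(tok, set())) for i, tok in enumerate(tokens)}
```

In this sketch the BOWs are plain word counts, so frequent function words dominate the similarity; weighting or stop-word filtering would be a separate design decision, and the threshold would need to be tuned against the F1 target.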

janetzki / GUIDE

Build BOW model for Semantic Domain Identification #30
