sillsdev / silnlp

A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.

Research methods of adding tags during training and using them during translation. #423

Open davidbaines opened 5 months ago

davidbaines commented 5 months ago

We would like to tag different genres of Scripture texts during training, and include tags with the source text during inferencing. The hope is that this will improve the drafts produced by the model.

We would need a flexible method of tagging verses and including the tags as tokens. Here are a few ideas we could test. They are listed here to give an idea of the kinds of tagging support that might be useful.

- Tag each verse with the book it is from.
- Tag each verse with the name of the author.
- Tag each verse with a genre.
- Tag each verse with the book, author, and genre.
- Tag verses with a language family or dialect.

Tags should be optional: training and inferencing should continue to work for untagged verses.
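To make the idea concrete, here is a minimal sketch of how optional tags could be prepended to a source verse as extra tokens. It is not existing silnlp code: the function name `tag_verse`, the `<kind:value>` tag format, and the metadata values are all assumptions made for illustration. Untagged verses pass through unchanged, so the same path works with or without tags.

```python
from typing import Optional


def tag_verse(
    text: str,
    book: Optional[str] = None,
    author: Optional[str] = None,
    genre: Optional[str] = None,
) -> str:
    """Prepend one tag token per piece of metadata that is provided.

    The "<kind:value>" format is hypothetical; any format works as long
    as training and inferencing use the same one.
    """
    tags = []
    for kind, value in (("book", book), ("author", author), ("genre", genre)):
        if value:  # skip missing metadata so tags stay optional
            tags.append(f"<{kind}:{value}>")
    # With no tags this returns the verse text unchanged.
    return " ".join(tags + [text])


# Illustrative calls with placeholder text and made-up tag values.
tagged = tag_verse("source verse text", book="GEN", genre="narrative")
untagged = tag_verse("source verse text")
```

For this to help, each tag would probably need to be registered with the tokenizer as a single symbol so that it is not split into subword pieces. How that is done depends on the tokenizer in use and would be part of what this issue researches.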

davidbaines commented 1 week ago

When we tag verses as verses, names as names, and key terms as key terms, we can then request translation of verses as verses. This should help to avoid the reduction in scores that we've seen after adding names to the training data.
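A rough sketch of that approach, under the assumption that each training pair carries a label for the kind of data it holds. The type names, the `<type>` tag format, and the helper functions are hypothetical and not part of silnlp; the example strings are placeholders.

```python
from typing import List, Tuple

# Hypothetical set of data types; the real list would be decided in this issue.
DATA_TYPES = ("verse", "name", "term")


def tag_by_type(source: str, data_type: str) -> str:
    """Prefix a source segment with a token naming the kind of data it is."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    return f"<{data_type}> {source}"


def build_training_pairs(
    items: List[Tuple[str, str, str]],
) -> List[Tuple[str, str]]:
    """Turn (data_type, source, target) triples into tagged (source, target) pairs."""
    return [(tag_by_type(src, kind), trg) for kind, src, trg in items]


# Placeholder data: one verse, one name, one key term.
pairs = build_training_pairs(
    [
        ("verse", "source verse text", "target verse text"),
        ("name", "source name", "target name"),
        ("term", "source key term", "target key term"),
    ]
)

# At inference time, a verse draft is requested with the same verse tag.
request = tag_by_type("new source verse", "verse")
```

The intent is that the model can learn to treat short name and key-term entries differently from full verses, since every request for a draft carries the verse tag.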