As a developer, I want to use books from the Bloom Library as supplementary training data to improve the quality of the word alignment. This would in turn increase the dictionary creator's precision.
Motivation: more data beats cleverer algorithms. More parallel data -> better alignment -> fewer false positives (FPs) -> higher dictionary creator (DC) precision.
Example
The Story of Jonah
eng: In those days there was a very large town where many people lived. The town's name was Nineveh.
tpi: Long dispela taim i gat wanpela bikpela taun i gat planti manmeri. Nem bilong dispela taun em Nineveh.
(https://huggingface.co/datasets/sil-ai/bloom-lm/viewer/tpi/train)
Tasks
align these texts
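
A minimal sketch of what a first alignment step could look like for the Jonah example: split both passages into sentences and pair them one-to-one. The helper names `split_sentences` and `pair_sentences` are hypothetical, the punctuation-based splitter is a naive assumption, and the `|||` output separator follows the input convention of tools such as fast_align; whether the aligner used here expects that format is also an assumption.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pair_sentences(src: str, tgt: str) -> list[tuple[str, str]]:
    # Pair sentences one-to-one when both sides have the same count.
    src_sents = split_sentences(src)
    tgt_sents = split_sentences(tgt)
    if len(src_sents) != len(tgt_sents):
        # Counts differ: fall back to the whole passage as a single pair.
        return [(src.strip(), tgt.strip())]
    return list(zip(src_sents, tgt_sents))


eng = ("In those days there was a very large town where many people lived. "
       "The town's name was Nineveh.")
tpi = ("Long dispela taim i gat wanpela bikpela taun i gat planti manmeri. "
       "Nem bilong dispela taun em Nineveh.")

pairs = pair_sentences(eng, tpi)
for src_sent, tgt_sent in pairs:
    # One parallel sentence pair per line.
    print(f"{src_sent} ||| {tgt_sent}")
```

Books whose translations merge or split sentences will not pair one-to-one, so the fallback keeps the whole passage as one pair rather than guessing; a proper sentence aligner would likely be needed for those cases.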