Closed by johnml1135 2 years ago
First steps:
Audience 1:
Another update from talking with @jonathanrobie:
Use this function as a basis: https://github.com/sillsdev/silnlp/blob/dfdb45fe44a0ff625cc153077291671a8c8c8445/silnlp/alignment/utils.py#L73
scores.alignment.txt (new file: scores)
sym-align.txt (alignment)
The data is also here: S:\MT\experiments\de-to-en-WMT2020+Bibles_AE\abp-en
As a point of reference for how long it should take to align a single translation, I was able to align a translation with ~13000 verses in ~35s on my machine. My machine has an Intel i7-9700KF with 8 cores.
I have a fairly old machine (Intel(R) Core(TM) i7-4800MQ CPU), and it takes 15 minutes from start to alignments complete if I don't multithread. I wonder whether there is a switch for multithreading. Together, those two factors should explain the whole difference (about 4x for the newer processor, 8x for multicore).
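As a quick sanity check on that estimate, the arithmetic can be written out as below. The two timings are the ones quoted in the thread; the 4x and 8x factors are the rough guesses from the comment above, not measurements, and the variable names are only illustrative.

```python
# Timings quoted in the thread, in seconds.
old_machine_s = 15 * 60  # ~15 minutes, single-threaded, i7-4800MQ
new_machine_s = 35       # ~35 s, i7-9700KF with 8 cores

observed_speedup = old_machine_s / new_machine_s

# Rough factors guessed in the thread (assumptions, not measurements).
per_core_factor = 4    # newer processor
core_count_factor = 8  # multithreading across 8 cores
estimated_speedup = per_core_factor * core_count_factor

print(f"observed: {observed_speedup:.1f}x, estimated: {estimated_speedup}x")
```

The observed ratio (about 26x) is in the same range as the 32x estimate, so processor generation plus multithreading plausibly accounts for the gap.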
Need to do:
Fast align documentation: http://mt-class.org/jhu/slides/lecture-ibm-model1.pdf
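For orientation, here is a minimal sketch of the textbook IBM Model 1 EM procedure that the linked slides cover. It is not fast_align itself (which reparameterizes IBM Model 2), nor the HMM model or the silnlp code discussed in this issue; it also omits the NULL source word for brevity. The function names `train_ibm_model1` and `align`, and the toy corpus, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM.

    `pairs` is a list of (source_tokens, target_tokens) tuples.
    Simplification: no NULL source word is modeled.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (target, source)
        total = defaultdict(float)  # expected counts per source word
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize the expected counts per source word.
        t = defaultdict(
            float, {(e, f): c / total[f] for (e, f), c in count.items()}
        )
    return t


def align(src, tgt, t):
    """Link each target token to its most probable source token.

    Returns (source_index, target_index) pairs.
    """
    return [
        (max(range(len(src)), key=lambda i: t[(e, src[i])]), j)
        for j, e in enumerate(tgt)
    ]


# Toy corpus, for illustration only.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
table = train_ibm_model1(corpus)
print(align(["das", "Haus"], ["the", "house"], table))
```

Each verse pair would play the role of one sentence pair here. A production aligner adds the NULL word, a distortion or HMM transition model, and symmetrization of the two alignment directions.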
Bibles:
Use HMM because it should be better with different typologies.
Do Hebrew and Greek
Add to Google Drive Partnership .../Data/Alignments
Follow up:
Extract key terms from Paratext projects and compare them to the translation alignment model
Greek and Hebrew lemma surface forms
If wanting max quality, how about using a different pivot: Septuagint? NASB?
Versification sniffing:
This work is closing for the time being. Priorities are shifting and there is no present use for it. Flagging potential versification errors ended up being much easier than using this model.
This issue tracks progress toward making hundreds, if not thousands, of Bible translations generally available with machine alignments. This has a few audiences:
Brainstorming implementation: