This may require running many, many alignments, so two things should probably be done to improve SMT performance:
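Whichever optimizations are chosen, the dominant cost is the alignment fan-out itself. Below is a minimal sketch of parallelizing and caching those runs; `align_and_score` is a hypothetical stand-in for a real word aligner (e.g. fast_align or eflomal invoked as a subprocess), not an existing API in this project.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def align_and_score(candidate: Path, target: Path) -> float:
    """Hypothetical: align the (candidate, target) verse pairs and return
    an aggregate alignment-quality score (higher is better)."""
    raise NotImplementedError("swap in fast_align / eflomal / etc. here")


def score_candidates(candidates: list[Path], target: Path, cache: Path) -> dict[str, float]:
    # Reuse previously computed scores so reruns only align new candidates.
    scores: dict[str, float] = json.loads(cache.read_text()) if cache.exists() else {}
    todo = [c for c in candidates if c.name not in scores]
    # Fan the remaining alignment jobs out across CPU cores.
    with ProcessPoolExecutor() as pool:
        for cand, score in zip(todo, pool.map(align_and_score, todo, [target] * len(todo))):
            scores[cand.name] = score
    cache.write_text(json.dumps(scores, indent=2))
    return scores
```

With the scores cached, picking the 3-5 best-aligned translations becomes a cheap sort rather than a repeated alignment job.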
This issue covers only the research into whether this is valuable. If it is, we can walk through the implementation steps.
As this paper proposes, we may be able to set up multiple languages in one model. Instead of going through a lot of work to find the best single source and training 5-10 models, we may be able to choose the 3-5 best-aligned translations and train on all of them together. Then we can try each one as the source when checking which source works best. That should dramatically reduce GPU training time.
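A minimal sketch of how the pooled training corpus might be built, assuming the texts are verse-aligned with one verse per line; the `<2tag>` source-token scheme and the file names are illustrative assumptions, not the project's actual format:

```python
from pathlib import Path


def build_multisource_corpus(
    sources: dict[str, Path], target: Path, out_src: Path, out_trg: Path
) -> None:
    """Pool several source translations against one target into a single
    tagged bitext, so one model learns from all of them."""
    target_verses = target.read_text(encoding="utf-8").splitlines()
    with out_src.open("w", encoding="utf-8") as fs, out_trg.open("w", encoding="utf-8") as ft:
        for tag, path in sources.items():
            for src_verse, trg_verse in zip(
                path.read_text(encoding="utf-8").splitlines(), target_verses
            ):
                if src_verse and trg_verse:  # skip verses missing in either text
                    fs.write(f"<2{tag}> {src_verse}\n")  # e.g. "<2en_ULB> In the beginning ..."
                    ft.write(trg_verse + "\n")


# Example (file names illustrative):
# build_multisource_corpus(
#     {"en_ULB": Path("en_ulb.txt"), "es_RVR": Path("es_rvr.txt"), "fr_LSG": Path("fr_lsg.txt")},
#     Path("target.txt"), Path("train.src"), Path("train.trg"))
```

At inference time the same tags let us try each text as the source and keep whichever draft reads best, without training a separate model per source.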
These ideas have been moved here: https://docs.google.com/document/d/1SXWLj6FY89cowQJVO-XY6q5BpDNaHmiesYPQ_Wxo5q4/edit. They will be considered as the onboarding flow is worked out.
A key decision for translating is the selection of the source Bible text. The best source scripture may not be the one primarily being used as the basis for the manual translation. The "best" may be related to:
To choose the best source text, to optimize the translator's experience, and to minimize the computational and decision-making burden on the translator, I propose the following:
Now, let the user determine the best of the 3-5 by:
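However that user-facing comparison is surfaced, the backend step is the same: draft the same sample passage from each of the 3-5 candidates with the one pooled model and present the drafts side by side. A rough sketch, where `translate` is a hypothetical stand-in for the project's actual inference call:

```python
from pathlib import Path


def translate(model, tag: str, verses: list[str]) -> list[str]:
    """Hypothetical: run the pooled model on source verses carrying `tag`."""
    raise NotImplementedError


def draft_for_comparison(model, candidates: dict[str, list[str]], out_dir: Path) -> None:
    # One draft file per candidate source, over the same sample passage,
    # so the user can compare them verse by verse.
    out_dir.mkdir(parents=True, exist_ok=True)
    for tag, verses in candidates.items():
        draft = translate(model, tag, verses)
        (out_dir / f"draft_{tag}.txt").write_text("\n".join(draft), encoding="utf-8")
```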