sillsdev / serval

A REST API for natural language processing services
MIT License

Research: Auto-selection of best Source Bible #65

Closed johnml1135 closed 12 months ago

johnml1135 commented 1 year ago

A key decision when translating is the choice of the source Bible text. The best source scripture may not be the one primarily being used as the basis for manual translation. The "best" may be related to:

To choose the best source text, optimize the translator's experience, and minimize both the computational burden and the translator's burden of decision, I propose the following:

Now, let the user determine the best of the 3-5 by:

johnml1135 commented 1 year ago

This may require running a large number of alignments, so two things should probably be done to improve SMT performance:

johnml1135 commented 1 year ago

This issue is just for the research to see if it is valuable. If it is, we can walk through the steps of implementation.

johnml1135 commented 1 year ago

As this paper proposes, we may be able to set up multiple languages for one model. Instead of going through a lot of work to find the single best source and training 5-10 models, we may be able to choose the 3-5 best-aligned translations and train on all of them. Then we can try each one as a source to see which works best. That should dramatically reduce GPU training time.
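A rough sketch of what that could look like, assuming we already have one alignment score per candidate source (higher is better) and verse-aligned text for each. The function names, the score values, and the `<name>` prefix tag are all hypothetical; the prefix tag is one common way to feed several sources into a single model and is not necessarily what the paper does, nor anything Serval implements today.

```python
from typing import Dict, List, Tuple


def select_top_sources(alignment_scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the names of the k candidates with the highest alignment score."""
    ranked = sorted(alignment_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def build_multi_source_corpus(
    sources: Dict[str, List[str]],
    target: List[str],
    selected: List[str],
) -> List[Tuple[str, str]]:
    """Pool the selected sources into one training set.

    Each source segment is prefixed with a tag naming its source text
    (hypothetical convention), so one model can train on all of them and
    later be queried with each source in turn.
    """
    pairs: List[Tuple[str, str]] = []
    for name in selected:
        for src, tgt in zip(sources[name], target):
            # Skip verses that are missing on either side.
            if src and tgt:
                pairs.append((f"<{name}> {src}", tgt))
    return pairs


if __name__ == "__main__":
    # Placeholder scores and text, for illustration only.
    scores = {"src_a": 0.9, "src_b": 0.5, "src_c": 0.7, "src_d": 0.2}
    texts = {
        "src_a": ["verse one a", "verse two a"],
        "src_c": ["verse one c", ""],
    }
    target_text = ["target one", "target two"]

    chosen = [name for name in select_top_sources(scores, k=2) if name in texts]
    for pair in build_multi_source_corpus(texts, target_text, chosen):
        print(pair)
```

With a single pooled model, comparing sources becomes an inference-time loop over the selected tags rather than a separate training run per candidate.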

johnml1135 commented 12 months ago

These ideas have been moved here: https://docs.google.com/document/d/1SXWLj6FY89cowQJVO-XY6q5BpDNaHmiesYPQ_Wxo5q4/edit. They will be considered as the onboarding flow is worked out.