sillsdev / silnlp

A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.

#513 multiple diverse drafts #533

Closed benjaminking closed 1 month ago

benjaminking commented 1 month ago

This adds the ability for SILNLP's translate functionality to produce multiple diverse drafts. This ability is controlled by the --multiple-translations parameter on the command line (for both translate.py and experiment.py). The other parameters, such as number of drafts and the method of creating multiple translations, are specified in the config file.

When multiple translations are requested, the output file name is given a different extension for each draft. For example, if you specify "output.txt" as the output file and request 3 drafts, then three files will be created called "output.1.txt", "output.2.txt", and "output.3.txt".
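The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the pattern described here, not the code from this PR; the function name `draft_paths` is hypothetical.

```python
from pathlib import Path
from typing import List


def draft_paths(output_file: str, num_drafts: int) -> List[str]:
    """Return one output path per draft, inserting the draft number before the extension.

    Hypothetical helper: it mirrors the naming described in the PR text,
    not the actual SILNLP implementation.
    """
    path = Path(output_file)
    # Drafts are numbered from 1, so "output.txt" becomes "output.1.txt", "output.2.txt", ...
    return [
        str(path.with_name(f"{path.stem}.{i}{path.suffix}"))
        for i in range(1, num_drafts + 1)
    ]


print(draft_paths("output.txt", 3))
# ['output.1.txt', 'output.2.txt', 'output.3.txt']
```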

The default method for producing multiple translations is "hybrid", which combines beam search and sampling. Other supported values are "sampling", which creates each draft with random sampling; "beam_search", which uses vanilla beam search (the top-n hypotheses populate the n drafts); and "diverse_beam_search", which uses the method of Vijayakumar et al. (2016). Other new parameters controlling the creation of multiple drafts are "temperature" and "diversity_penalty". I plan to document all of this in the wiki.
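To make the difference between the methods concrete, here is a rough sketch of how three of them could map onto generation keyword arguments in the style of Hugging Face `generate()`. Whether SILNLP passes them this way is an assumption, and the function name `generation_kwargs` is hypothetical. The "hybrid" method is left out on purpose, because this description does not say how it combines beam search and sampling.

```python
from typing import Any, Dict


def generation_kwargs(
    method: str,
    num_drafts: int,
    num_beams: int,
    temperature: float,
    diversity_penalty: float,
) -> Dict[str, Any]:
    """Build decoding options for one of the multiple-translation methods.

    Hypothetical helper for illustration; the key names follow Hugging Face
    generate() conventions, which is an assumption about the implementation.
    """
    if method == "sampling":
        # Each draft is an independent random sample; temperature controls randomness.
        return {
            "do_sample": True,
            "num_beams": 1,
            "temperature": temperature,
            "num_return_sequences": num_drafts,
        }

    # Beam-based methods need at least as many beams as drafts to return n hypotheses.
    beams = max(num_beams, num_drafts)

    if method == "beam_search":
        # Vanilla beam search: the top-n hypotheses become the n drafts.
        return {
            "do_sample": False,
            "num_beams": beams,
            "num_return_sequences": num_drafts,
        }

    if method == "diverse_beam_search":
        # Diverse beam search (Vijayakumar et al., 2016): beams are split into
        # groups and a penalty discourages groups from repeating each other.
        return {
            "do_sample": False,
            "num_beams": beams,
            "num_beam_groups": beams,  # one beam per group, so beams divide evenly
            "diversity_penalty": diversity_penalty,
            "num_return_sequences": num_drafts,
        }

    # "hybrid" is intentionally not sketched here (see the note above).
    raise ValueError(f"Method not covered by this sketch: {method}")
```

For example, `generation_kwargs("beam_search", 3, 2, 0.75, 1.0)` raises the beam count to 3 so that three hypotheses can be returned; the numeric values in that call are placeholders, not defaults from the PR.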



benjaminking commented 1 month ago

I have addressed these issues in the latest commit. I reverted "num_beams" to 2. My evaluations found that a value of 5 led to higher BLEU scores, but I will leave it up to the user to make that change.