mozilla / firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models
https://mozilla.github.io/firefox-translations-training/
Mozilla Public License 2.0

Experiment with the decoder sizes #894

Open · gregtatum opened this issue 1 day ago

gregtatum commented 1 day ago

In Ludicrously Fast Neural Machine Translation, they test a variety of decoder configurations for faster models.

[Screenshot of Table 3: Configuration of student models and submissions]

In #174 @eu9ene showed that a larger decoder helps improve the COMET score for en-ru by +2.9, which is pretty significant.

(Edit: I changed from en-ru to en-lt)

I'd like to test the parameters a bit more, since these changes affect both model quality and inference performance. The paper tested its parameters on en-de, but our en-ru training has struggled to gain as much COMET with the same architecture. Rather than testing en-ru, I'll do a clean run on en-lt, as it had a pretty low COMET score and features much richer morphology due to its declension system. The idea is that the results will scale to other Balto-Slavic languages.

I'm shortening the labels in the table a bit:

- `dec-depth` → depth
- `dim-emb` → emb
- `transformer-dim-ffn` → ffn

| COMET | speed | depth | emb | ffn  | Name               |
|-------|-------|-------|-----|------|--------------------|
| 85.11 |       | 2     | 256 | 1536 | decoder-tiny       |
|       |       | 2     | 512 | 2048 | decoder-base       |
|       |       | 3     | 256 | 1536 | decoder-depth-3    |
|       |       | 6     | 256 | 1536 | decoder-depth-6    |
|       |       | 2     | 256 | 2048 | decoder-ffn-bigger |
|       |       | 2     | 512 | 1536 | decoder-emb-bigger |
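
For reference, here is a rough sketch in Python of the variants above expressed as the Marian flags the labels shorten (`dec-depth`, `dim-emb`, `transformer-dim-ffn`, per the legend). How these are actually wired into this repo's training configs may differ; this is just to make the table concrete:

```python
# Sketch: the decoder variants from the table above, keyed by their Marian
# flag names. COMET and speed columns are omitted since they are still being
# measured.

DECODER_VARIANTS = {
    "decoder-tiny":       {"dec-depth": 2, "dim-emb": 256, "transformer-dim-ffn": 1536},
    "decoder-base":       {"dec-depth": 2, "dim-emb": 512, "transformer-dim-ffn": 2048},
    "decoder-depth-3":    {"dec-depth": 3, "dim-emb": 256, "transformer-dim-ffn": 1536},
    "decoder-depth-6":    {"dec-depth": 6, "dim-emb": 256, "transformer-dim-ffn": 1536},
    "decoder-ffn-bigger": {"dec-depth": 2, "dim-emb": 256, "transformer-dim-ffn": 2048},
    "decoder-emb-bigger": {"dec-depth": 2, "dim-emb": 512, "transformer-dim-ffn": 1536},
}


def marian_flags(name: str) -> str:
    """Render one variant as command-line style flags."""
    params = DECODER_VARIANTS[name]
    return " ".join(f"--{key} {value}" for key, value in params.items())


print(marian_flags("decoder-depth-3"))
# --dec-depth 3 --dim-emb 256 --transformer-dim-ffn 1536
```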

Links

eu9ene commented 1 day ago

I suggest using a different language pair for this experiment. en-ru was trained from a super convoluted branch, "release_no_priors", where I had to change the graph by adding an extra alignments step so I could apply some bug fixes without retraining everything from scratch. It's far behind main and doesn't have the latest W&B fixes, so I don't want to run any more experiments from the "release"-based branches. If we switch to main, the graph will not be compatible, so we'd have to at least rerun the alignments step and reuse some other tasks via "existing_tasks". With all that, it's a lot easier to run a different language pair that we struggle with from main, where we can reuse the tasks from release, for example en-lt.

On another note, this looks like a hyperparameter search that we can do manually, but there are tools to automate it that we might explore in the future.
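
Even the manual version amounts to iterating over a small grid. A hypothetical sketch of what such a sweep would enumerate (value ranges taken from the table in the description; launching the actual training runs is out of scope here):

```python
# Hypothetical sketch of the hyperparameter grid a sweep would enumerate;
# the value ranges come from the table in the issue description.
from itertools import product

DEC_DEPTHS = (2, 3, 6)
EMB_DIMS = (256, 512)
FFN_DIMS = (1536, 2048)

for depth, emb, ffn in product(DEC_DEPTHS, EMB_DIMS, FFN_DIMS):
    run_name = f"dec{depth}-emb{emb}-ffn{ffn}"
    config = {"dec-depth": depth, "dim-emb": emb, "transformer-dim-ffn": ffn}
    # A real sweep would launch one training run per config; here we just list them.
    print(run_name, config)
```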

gregtatum commented 1 day ago

Ok, en-lt sounds like a great choice. I read a bit more on it and it's got a lot of qualitative feedback in #756.

gregtatum commented 1 day ago

Lithuanian has a similar use of declensions: https://en.wikipedia.org/wiki/Lithuanian_declension