Processing in 100M-sentence chunks helps prevent OOM errors and allows using a smaller machine. It also makes the step scale to higher-resource languages.
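For illustration, here is a minimal sketch of the chunking idea (not the actual pipeline code); the 100M chunk size is from this PR, while `read_chunks`, the placeholder processing step, and the output naming are hypothetical stand-ins:

```python
# Sketch only: stream a corpus in fixed-size chunks so that at most
# one chunk is held in memory at a time.
import itertools

CHUNK_SIZE = 100_000_000  # sentences per chunk, as in this PR


def read_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield lists of at most chunk_size lines from the corpus file."""
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = list(itertools.islice(f, chunk_size))
            if not chunk:
                break
            yield chunk


def process_corpus(path):
    """Process each chunk independently and write a separate output part."""
    for i, chunk in enumerate(read_chunks(path)):
        processed = [line.strip() for line in chunk]  # placeholder for the real step
        with open(f"{path}.part{i}", "w", encoding="utf-8") as out:
            out.writelines(line + "\n" for line in processed)
```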
The run completed successfully for the en-uk language pair that previously failed: log.
If we don't want to split the work into multiple Taskcluster tasks, which would complicate the task graph, we could also implement a mechanism that resumes from the last unprocessed part after pre-emption; see the sketch below.
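A hedged sketch of that resume-on-pre-emption idea: before processing each part, check for a completion marker and skip parts that already finished. The marker files and `process_part` helper are illustrative, not the pipeline's actual API:

```python
# Sketch only: a re-scheduled task skips parts that completed before
# pre-emption and resumes at the first unprocessed one.
import os


def process_part(i: int, out_dir: str) -> None:
    """Hypothetical stand-in for the real per-chunk processing step."""
    with open(os.path.join(out_dir, f"part{i}.out"), "w") as f:
        f.write(f"processed part {i}\n")


def run_with_resume(num_parts: int, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    for i in range(num_parts):
        marker = os.path.join(out_dir, f"part{i}.done")
        if os.path.exists(marker):
            continue  # finished before pre-emption; skip on restart
        process_part(i, out_dir)
        open(marker, "w").close()  # write marker only after the part succeeds
```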
Currently, I have to use standard (non-preemptible) instances because processing can take several days on a larger student corpus (400M sentences):
Closes #721