Config files contain a lot of repetition - can this be avoided?

Related to issue #22

Basically, with dataclasses and their inheritance, we can set up the pipeline config and the pipeline steps in the following way. Config example:

- pipeline:
  - src_lang: en
  - tgt_lang: de
  - steps:
    - step: raw
      step_label: gather.${global.src_lang}-${global.tgt_lang}
      raw_data_dir: ${global.raw_data_dir}
    - step: raw
      step_label: valid.${global.src_lang}-${global.tgt_lang}
      raw_data_dir: ${global.valid_data_dir}

tl;dr: We can get a reasonable simplification with dataclasses, and later we can consider some "syntactic sugar" for the most common step configurations that the refactor does not simplify.

The dataclass implementation would then have a general "pipeline" dataclass (containing values like src_lang and tgt_lang), and the "raw" step dataclass (as well as the other step dataclasses) could, by default, inherit the "pipeline" values (src_lang, tgt_lang) unless the user overrides them. This would simplify config files when defining models/corpora in one direction. For the opposite direction, we would have to either add additional optional arguments to pipeline steps (e.g., "reverse") or add some "fake" steps, such as BackwardTrainSteps, which would in practice create a regular TrainSteps with "rewired" arguments.
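A minimal sketch of the idea above, not the actual OpusPocus code. All names here (`PipelineConfig`, `StepConfig`, `RawStepConfig`, `resolve`, `backward`) are hypothetical and only illustrate how a step could fall back to the pipeline-level languages, and how a "rewired" step for the opposite direction could be derived from a regular one:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class PipelineConfig:
    """Pipeline-wide values that steps may fall back to (hypothetical name)."""
    src_lang: str
    tgt_lang: str


@dataclass
class StepConfig:
    """Base step config; a language left as None means 'use the pipeline value'."""
    step: str
    step_label: str
    src_lang: Optional[str] = None
    tgt_lang: Optional[str] = None

    def resolve(self, pipeline: PipelineConfig) -> "StepConfig":
        """Return a copy with unset languages filled in from the pipeline."""
        return replace(
            self,
            src_lang=self.src_lang if self.src_lang is not None else pipeline.src_lang,
            tgt_lang=self.tgt_lang if self.tgt_lang is not None else pipeline.tgt_lang,
        )


@dataclass
class RawStepConfig(StepConfig):
    """The "raw" step adds its own field on top of the shared ones."""
    raw_data_dir: str = ""


def backward(step: StepConfig) -> StepConfig:
    """Return a copy of a resolved step with source and target swapped."""
    return replace(step, src_lang=step.tgt_lang, tgt_lang=step.src_lang)


pipeline = PipelineConfig(src_lang="en", tgt_lang="de")

# The languages are not repeated in the step; they come from the pipeline.
gather = RawStepConfig(
    step="raw", step_label="gather", raw_data_dir="data/raw"
).resolve(pipeline)

# An explicit value in the step wins over the pipeline default.
override = RawStepConfig(
    step="raw", step_label="valid", src_lang="fr"
).resolve(pipeline)

# The opposite direction reuses the same step class with swapped languages.
reverse = backward(gather)

print(gather.src_lang, gather.tgt_lang)      # en de
print(override.src_lang, override.tgt_lang)  # fr de
print(reverse.src_lang, reverse.tgt_lang)    # de en
```

In this sketch the fallback is resolved explicitly against a pipeline object rather than through class inheritance from the pipeline dataclass; either variant would remove the repeated language fields from the config, and the helper function plays the role that a "reverse" argument or a "fake" backward step would play.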

