The current implementation of distributed lightgbm produces multiple metrics per node: the main metric curve, plus training time, data loading time, etc.
When running on N nodes, this produces N*6 metrics, quickly hitting the mlflow/azureml metric limits and becoming a UI nightmare. We're also hitting 439 exceptions on the mlflow calls.
The proposal is to group some of these metrics together (e.g., the per-node training times), as sketched below.
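As a rough sketch of what the grouping could look like (assuming the per-node timings can be gathered on one node; the metric name `time_training` and the `per_node_training_time` dict are illustrative, not the actual implementation), one option is to log a single metric and use mlflow's `step` argument to index the node rank:

```python
import mlflow

# Hypothetical per-node timings gathered on the head node (e.g. via an MPI gather).
# node_rank -> seconds; values here are purely illustrative.
per_node_training_time = {0: 132.4, 1: 130.9, 2: 135.1}

with mlflow.start_run():
    # One metric for all nodes: the node rank becomes the metric's step,
    # so N nodes contribute N points to a single curve instead of N metrics.
    for node_rank, seconds in per_node_training_time.items():
        mlflow.log_metric("time_training", seconds, step=node_rank)

    # Alternatively, collapse to a couple of aggregate scalars.
    values = list(per_node_training_time.values())
    mlflow.log_metric("time_training_max", max(values))
    mlflow.log_metric("time_training_mean", sum(values) / len(values))
```

Either variant shrinks each timing from N metrics to one (or two), which also reduces the number of mlflow calls that can trip the rate limit.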