quantalea / AleaTK

Library for general purpose numerical computing and Machine Learning based on tensors and tensor expressions.
http://www.aleatk.com
Apache License 2.0

Add CPU batcher to reduce memory footprint on the GPU #7

Open HunorSzabo opened 7 years ago

HunorSzabo commented 7 years ago

During training, and especially training with heavier data augmentation, I noticed that a huge allocation is made at the beginning of training because the batcher allocates the full training dataset in GPU memory. Most of this memory is rarely touched, since the batcher only ever forwards specific slices of it at any point during training.

Keeping the dataset on the CPU heap and allocating GPU memory only for the current batch slows down the computation insignificantly, but significantly reduces the GPU memory footprint, which is the real bottleneck on modern GPUs.
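For illustration, here is a minimal sketch of the pattern in Python with NumPy/CuPy rather than AleaTK's own C# API (the `CpuBatcher` name and its methods are hypothetical): the full dataset stays in host memory, and only the minibatch that is actually requested gets copied to the device.

```python
import numpy as np
import cupy as cp  # stand-in for any host-to-device copy mechanism


class CpuBatcher:
    """Keeps the full dataset in host (CPU) memory and uploads
    only the requested minibatch to GPU memory."""

    def __init__(self, data, labels, batch_size):
        self.data = data          # host-side np.ndarray, shape (N, ...)
        self.labels = labels      # host-side np.ndarray, shape (N,)
        self.batch_size = batch_size

    def batches(self, shuffle=True):
        n = len(self.data)
        order = np.random.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, self.batch_size):
            idx = order[start:start + self.batch_size]
            # Only this slice is transferred to the GPU; the rest of the
            # dataset never leaves host memory.
            yield cp.asarray(self.data[idx]), cp.asarray(self.labels[idx])
```

With this layout the GPU footprint is bounded by one minibatch (plus model state) instead of the entire training set, at the cost of one host-to-device copy per step.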

In this class I also introduced horizontal mirroring, a widely used data augmentation method.
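Horizontal mirroring just flips each image along its width axis; applied randomly per sample it roughly doubles the effective dataset. A sketch of what such a step could look like on the host-side copy (hypothetical helper, not the actual AleaTK code):

```python
import numpy as np


def random_horizontal_flip(batch, p=0.5):
    """Mirror each image in the batch along the width axis with probability p.

    batch: np.ndarray of shape (N, H, W, C), still in host memory, so the
    augmentation costs no extra GPU memory.
    """
    batch = batch.copy()
    flip_mask = np.random.rand(len(batch)) < p
    batch[flip_mask] = batch[flip_mask, :, ::-1, :]
    return batch
```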

A nice-to-have would be to add data preprocessing methods at the Batcher level, such as zero-centering and other preprocessing of the input data.
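Zero-centering, for example, could be done once on the host copy by subtracting a mean computed over the training set; again a hypothetical Python sketch of the idea, not AleaTK's API:

```python
import numpy as np


def zero_center(train_data, test_data=None):
    """Subtract the per-feature mean of the training set.

    The statistic is computed on, and applied to, the host-side copy,
    so it adds no GPU memory; the same mean must be reused for test data.
    """
    mean = train_data.mean(axis=0, keepdims=True)
    centered_train = train_data - mean
    centered_test = test_data - mean if test_data is not None else None
    return centered_train, centered_test, mean
```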