quantalea / AleaTK

Library for general purpose numerical computing and Machine Learning based on tensors and tensor expressions.
http://www.aleatk.com
Apache License 2.0

Add CPU batcher to reduce memory footprint on the GPU #7

Open HunorSzabo opened 7 years ago

HunorSzabo commented 7 years ago

During training, and especially training with heavier data augmentation, I noticed that a huge allocation is made at the beginning of training because the batcher allocates the full training dataset in GPU memory. Most of this memory is rarely touched, since the batcher only ever forwards specific slices of it at any point during training.

Keeping the dataset on the CPU heap and allocating GPU memory only for the current batch slows down the computation insignificantly, but significantly reduces the GPU memory footprint, which is the real bottleneck on modern GPUs.
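For illustration, here is a minimal sketch of the pattern in Python with NumPy/CuPy rather than AleaTK's own C# API (the `CpuBatcher` name and its methods are hypothetical): the full dataset stays in host memory, and only the minibatch that is actually requested gets copied to the device.

```python
import numpy as np
import cupy as cp  # stand-in for any host-to-device copy mechanism


class CpuBatcher:
    """Keeps the full dataset in host (CPU) memory and uploads
    only the requested minibatch to GPU memory."""

    def __init__(self, data, labels, batch_size):
        self.data = data          # host-side np.ndarray, shape (N, ...)
        self.labels = labels      # host-side np.ndarray, shape (N,)
        self.batch_size = batch_size

    def batches(self, shuffle=True):
        n = len(self.data)
        order = np.random.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, self.batch_size):
            idx = order[start:start + self.batch_size]
            # Only this slice is transferred to the GPU; the rest of the
            # dataset never leaves host memory.
            yield cp.asarray(self.data[idx]), cp.asarray(self.labels[idx])
```

With this layout the GPU footprint is bounded by one minibatch (plus model state) instead of the entire training set, at the cost of one host-to-device copy per step.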

In this class I also introduced horizontal mirroring, a widely used data augmentation method.
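Horizontal mirroring just flips each image along its width axis; applied randomly per sample it roughly doubles the effective dataset. A sketch of what such a step could look like on the host-side copy (hypothetical helper, not the actual AleaTK code):

```python
import numpy as np


def random_horizontal_flip(batch, p=0.5):
    """Mirror each image in the batch along the width axis with probability p.

    batch: np.ndarray of shape (N, H, W, C), still in host memory, so the
    augmentation costs no extra GPU memory.
    """
    batch = batch.copy()
    flip_mask = np.random.rand(len(batch)) < p
    batch[flip_mask] = batch[flip_mask, :, ::-1, :]
    return batch
```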

A nice-to-have would be to add data preprocessing methods at the Batcher level, such as zero-centering and other preprocessing of the input data.
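Zero-centering, for example, could be done once on the host copy by subtracting a mean computed over the training set; again a hypothetical Python sketch of the idea, not AleaTK's API:

```python
import numpy as np


def zero_center(train_data, test_data=None):
    """Subtract the per-feature mean of the training set.

    The statistic is computed on, and applied to, the host-side copy,
    so it adds no GPU memory; the same mean must be reused for test data.
    """
    mean = train_data.mean(axis=0, keepdims=True)
    centered_train = train_data - mean
    centered_test = test_data - mean if test_data is not None else None
    return centered_train, centered_test, mean
```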