tdeboissiere / DeepLearningImplementations

Implementation of recent Deep Learning papers

WassersteinGAN: MemoryError with 16 GB memory #13

Closed zdx3578 closed 7 years ago

zdx3578 commented 7 years ago

Traceback (most recent call last):
  File "main.py", line 81, in <module>
    launch_training(**d_params)
  File "main.py", line 11, in launch_training
    train_WGAN.train(**kwargs)
  File "/home/ubuntu/work/DeepLearningImplementations/WassersteinGAN/src/model/train_WGAN.py", line 47, in train
    X_real_train = data_utils.load_image_dataset(dset, img_dim, image_dim_ordering)
  File "../utils/data_utils.py", line 94, in load_image_dataset
    X_real_train = load_celebA(img_dim, image_dim_ordering)
  File "../utils/data_utils.py", line 83, in load_celebA
    X_real_train = normalization(X_real_train, image_dim_ordering)
  File "../utils/data_utils.py", line 14, in normalization
    X = X / 255.
MemoryError
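The error is raised in the normalization step, `X = X / 255.`, over the whole CelebA array at once. A minimal lower-memory sketch of that step (assuming `X` arrives as a uint8 array from the HDF5 loader; `normalization_low_mem` is a hypothetical helper, not the repo's function) casts to float32 and divides in place instead of letting NumPy allocate a float64 copy:

```python
import numpy as np

def normalization_low_mem(X):
    # Assumption: X is the uint8 CelebA image array from the HDF5 loader.
    # On uint8 input, X / 255. allocates a brand-new float64 array (8 bytes
    # per pixel); casting to float32 first roughly halves that peak.
    X = X.astype(np.float32)  # single float32 copy (4 bytes per pixel)
    X /= 255.0                # in-place division, no further allocation
    return X
```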

zdx3578 commented 7 years ago
parser.add_argument('--nb_epoch', default=400, type=int, help="Number of batches per epoch")

parser.add_argument('--n_batch_per_epoch', default=200, type=int, help="Number of training epochs")

The help strings for these two flags look swapped (see the corrected sketch below).
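A corrected sketch of those two lines, keeping the same flags and defaults as quoted above, would read:

```python
import argparse

parser = argparse.ArgumentParser()
# Help strings swapped back so each flag describes what it actually controls
parser.add_argument('--nb_epoch', default=400, type=int,
                    help="Number of training epochs")
parser.add_argument('--n_batch_per_epoch', default=200, type=int,
                    help="Number of batches per epoch")
```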

zdx3578 commented 7 years ago

The GPU is fine. A TensorFlow update had switched the install to the CPU-only build; after going back to the GPU build it works.

However, the AWS p2 GPU instance is slower than an r3.2xlarge CPU instance:

gpu: 64/6400 [..............................] - ETA: 28866s - Loss_D: 0.0817 - Loss_D_real: 0.0421 - Loss_D_gen: -0.0396 - Loss_G: -0.0533
cpu: ETA: 16767s

Attached GPUs                       : 1
GPU 0000:00:1E.0
    Utilization
        Gpu                         : 99 %
        Memory                      : 4 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    GPU Utilization Samples
        Duration                    : 8.86 sec
        Number of Samples           : 54
        Max                         : 100 %
        Min                         : 0 %
        Avg                         : 22 %
    Memory Utilization Samples
        Duration                    : 8.86 sec
        Number of Samples           : 54
        Max                         : 4 %
        Min                         : 0 %
        Avg                         : 1 %
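As a sanity check after reinstalling, a minimal sketch (assuming the TensorFlow 1.x API in use at the time) to confirm the installed build actually sees the GPU:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# A GPU-enabled build should list a GPU device alongside the CPU
print(device_lib.list_local_devices())

# True only if a CUDA-capable GPU is visible to this TensorFlow build
print(tf.test.is_gpu_available())
```

If both checks pass, the low average utilization sampled above (22 %) would suggest the GPU spends much of its time waiting on the input pipeline, though that is only a guess from these numbers.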

anxingle commented 7 years ago

@zdx3578 Can you give the configuration of your hardware (GPU, memory) in detail? The MemoryError happens because the batch size is too big.

zdx3578 commented 7 years ago

Needs more testing!

zdx3578 commented 7 years ago

An AWS instance with 64 GB of memory works fine.
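A rough back-of-envelope estimate makes both observations plausible (assumptions: roughly 200k CelebA images at 64x64x3 stored as uint8, and `X / 255.` promoting to float64; the real count and resolution depend on how the HDF5 file was built):

```python
# Hypothetical back-of-envelope estimate of the normalization step's memory.
n_images = 202599         # full CelebA (assumption)
h, w, c = 64, 64, 3       # assumed img_dim used when building the dataset

uint8_bytes = n_images * h * w * c     # array as loaded from HDF5
float64_bytes = uint8_bytes * 8        # extra copy created by X / 255. on uint8

print("uint8 array : %.1f GB" % (uint8_bytes / 1e9))    # ~2.5 GB
print("float64 copy: %.1f GB" % (float64_bytes / 1e9))  # ~19.9 GB
```

Under those assumptions the temporary float64 copy alone is around 20 GB, which would not fit on a 16 GB instance but fits comfortably in 64 GB.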