parser.add_argument('--nb_epoch', default=400, type=int, help="Number of batches per epoch")
parser.add_argument('--n_batch_per_epoch', default=200, type=int, help="Number of training epochs")
The help strings for --nb_epoch and --n_batch_per_epoch appear to be swapped. Is this intentional?
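For comparison, this is presumably what was intended (the same two lines with the help strings swapped back, not a confirmed patch):

parser.add_argument('--nb_epoch', default=400, type=int, help="Number of training epochs")
parser.add_argument('--n_batch_per_epoch', default=200, type=int, help="Number of batches per epoch")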
GPU is working now! A TensorFlow update had replaced my install with the CPU-only build; reinstalling the GPU build fixed it.
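A quick way to verify which build is active (a minimal sketch assuming the TF 1.x API contemporary with this repo; device names may print as '/gpu:0' or '/device:GPU:0' depending on the version):

from tensorflow.python.client import device_lib

# List the devices TensorFlow can see; a CPU-only build reports no GPU entry.
print([d.name for d in device_lib.list_local_devices()])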
However, the AWS p2 GPU instance is slower than the r3.2xlarge CPU instance:
gpu: 64/6400 [..............................] - ETA: 28866s - Loss_D: 0.0817 - Loss_D_real: 0.0421 - Loss_D_gen: -0.0396 - Loss_G: -0.0533
cpu: ETA: 16767s
Attached GPUs                       : 1
GPU 0000:00:1E.0
    Utilization
        Gpu                         : 99 %
        Memory                      : 4 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    GPU Utilization Samples
        Duration                    : 8.86 sec
        Number of Samples           : 54
        Max                         : 100 %
        Min                         : 0 %
        Avg                         : 22 %
    Memory Utilization Samples
        Duration                    : 8.86 sec
        Number of Samples           : 54
        Max                         : 4 %
        Min                         : 0 %
        Avg                         : 1 %
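For what it's worth, stats in that format come from querying the driver directly:

nvidia-smi -q -d UTILIZATION

An average GPU utilization of 22 % with brief spikes to 100 % suggests the GPU is idling between batches, i.e. the input pipeline rather than compute may be the bottleneck here.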
@zdx3578 Can you share your hardware configuration (GPU, memory) in detail? The MemoryError happens when the batch size is too big.
This needs more testing! An AWS instance with 64 GB of memory works fine.
Traceback (most recent call last):
  File "main.py", line 81, in <module>
    launch_training(**d_params)
  File "main.py", line 11, in launch_training
    train_WGAN.train(**kwargs)
  File "/home/ubuntu/work/DeepLearningImplementations/WassersteinGAN/src/model/train_WGAN.py", line 47, in train
    X_real_train = data_utils.load_image_dataset(dset, img_dim, image_dim_ordering)
  File "../utils/data_utils.py", line 94, in load_image_dataset
    X_real_train = load_celebA(img_dim, image_dim_ordering)
  File "../utils/data_utils.py", line 83, in load_celebA
    X_real_train = normalization(X_real_train, image_dim_ordering)
  File "../utils/data_utils.py", line 14, in normalization
    X = X / 255.
MemoryError
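The failing line X = X / 255. divides a uint8 array by a Python float, so numpy allocates a brand-new float64 array on top of the existing data, roughly 8x the original footprint at peak. A hedged workaround sketch (plain numpy, a hypothetical alternative rather than the repo's actual fix): convert to float32 and divide in place, which roughly halves the peak allocation.

import numpy as np

def normalize_float32(X):
    # Hypothetical alternative to X = X / 255.: float32 uses half the memory
    # of the float64 array numpy would otherwise create, and the in-place
    # divide avoids a second temporary.
    X = X.astype(np.float32)
    X /= 255.
    return X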