Teragron opened 5 years ago
@lunayang712 thanks a lot, so I need 5 folders in DIV2K, right?
yeah. good luck! @Teragron
Check the LR image path? Or maybe the epoch and batch size settings are wrong.
The number of steps per epoch will change depending on the size of the dataset. This run is with batch_size=16 and the output image array set to 4x4; images are 1024x768, count: 1658.
Epoch: [0/100] step: [100/6] time: 31.102s, mse: 0.066
Epoch: [0/100] step: [101/6] time: 31.113s, mse: 0.051
Epoch: [0/100] step: [102/6] time: 31.199s, mse: 0.095
Epoch: [1/100] step: [0/6] time: 33.887s, mse: 0.060
Epoch: [1/100] step: [1/6] time: 31.681s, mse: 0.065
Epoch: [1/100] step: [2/6] time: 34.661s, mse: 0.064
Epoch: [1/100] step: [3/6] time: 32.734s, mse: 0.062
Epoch: [1/100] step: [4/6] time: 33.832s, mse: 0.075
Epoch: [1/100] step: [5/6] time: 31.289s, mse: 0.046
Epoch: [1/100] step: [6/6] time: 31.343s, mse: 0.038
Epoch: [1/100] step: [7/6] time: 33.424s, mse: 0.045
Epoch: [1/100] step: [8/6] time: 31.506s, mse: 0.047
Epoch: [1/100] step: [9/6] time: 34.266s, mse: 0.059
Epoch: [1/100] step: [10/6] time: 35.366s, mse: 0.036
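The step counters above look inconsistent because the printed total (6) doesn't match what the dataset size and batch size imply. A minimal sketch of the expected count (`steps_per_epoch` is a hypothetical helper, not from the repo):

```python
import math

def steps_per_epoch(n_images: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every image once."""
    return math.ceil(n_images / batch_size)

# With the dataset above: 1658 images at batch_size=16.
print(steps_per_epoch(1658, 16))  # 104, not the 6 printed in the log
```

So a counter like [100/6] suggests the logged denominator is computed from something other than the actual image count.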
this is strange too
Epoch: [2/1] step: [70/125] time: 198.491s, g_loss(mse:0.032, vgg:0.051, adv:0.007) d_loss: 1.294
Epoch: [2/1] step: [71/125] time: 219.722s, g_loss(mse:0.033, vgg:0.054, adv:0.004) d_loss: 0.420
Epoch: [2/1] step: [72/125] time: 221.830s, g_loss(mse:0.024, vgg:0.042, adv:0.005) d_loss: 0.463
Epoch: [3/1] step: [0/125] time: 213.389s, g_loss(mse:0.032, vgg:0.036, adv:0.004) d_loss: 0.400
Epoch: [3/1] step: [1/125] time: 234.970s, g_loss(mse:0.033, vgg:0.031, adv:0.010) d_loss: 0.902
Why is the training so slow? Did you use a GPU?
Dataset info: 1176 images at 1920x1080
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
GPU: NVIDIA Quadro P1000
I have installed tensorflow-gpu, but it does not appear to be used, at least in the first phase:
Epoch: [0/1] step: [3/0] time: 42.531s, mse: 0.325
Sometimes there are sparks of activity on the CUDA cores lasting about 1 min, but they do not appear to affect step time. Memory is currently at 35 GB / 16 GB.
Epoch: [0/1] step: [71/0] time: 38.153s, mse: 0.037
Epoch: [0/1] step: [72/0] time: 38.961s, mse: 0.050
Epoch: [0/1] step: [0/125] time: 226.178s, g_loss(mse:0.042, vgg:0.044, adv:0.000) d_loss: 3.124
Epoch: [0/1] step: [1/125] time: 230.655s, g_loss(mse:0.054, vgg:0.061, adv:0.004) d_loss: 4.261
Memory usage is more than 200% of on-board memory. The sparks of activity on the GPU are continuing.
How do I train on the GPU? I restarted training because I changed the epoch count to 10 so the model saves. The GPU is doing basically nothing.
Batch size was 36 (the max I can go).
Epoch: [9/10] step: [31/0] time: 148.904s, mse: 0.033
2019-10-29 18:59:18.131505: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at conv_ops.cc:501 : Resource exhausted: OOM when allocating tensor with shape[36,48,48,256] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Traceback (most recent call last):
File "train.py", line 202, in
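Note the OOM message says the allocation failed on device:CPU:0, which itself shows the conv op is running on the CPU, not the GPU. The size of the single tensor it failed to allocate can be checked directly from the reported shape (plain arithmetic, no TensorFlow needed):

```python
# Tensor from the OOM message: shape [36, 48, 48, 256], dtype float32.
shape = (36, 48, 48, 256)
bytes_per_float = 4  # float32

n_elems = 1
for d in shape:
    n_elems *= d

size_mib = n_elems * bytes_per_float / 2**20
print(f"{size_mib:.0f} MiB")  # prints "81 MiB"
```

That is 81 MiB for one activation tensor at batch size 36, and a deep network keeps many such activations alive at once, so activation memory scales roughly linearly with batch size; lowering it is the usual first fix.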
@mcDandy you need to set up CUDA to use the GPU.
Search: "tensorflow check use gpu"
( tf.test.is_gpu_available | TensorFlow Core v2.3.0 )
If you use AWS, you should use a Deep Learning AMI.
@Teragron for training you need both the train_LR and train_HR folders.
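For reference, the standard DIV2K release unpacks into folders like the following (names assumed from the official DIV2K distribution; adjust the dataset paths in the repo's config to match wherever you put them):

```
DIV2K/
├── DIV2K_train_HR/          # high-resolution training images
├── DIV2K_train_LR_bicubic/  # low-resolution training images (X2/X3/X4 subfolders)
├── DIV2K_valid_HR/          # high-resolution validation images
└── DIV2K_valid_LR_bicubic/  # low-resolution validation images
```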