Stuck, but GPU still running

andyyuan78 commented 9 years ago

While running the example, I have been stuck at this point for almost 24 hours, and I checked that the GPU is still working:

ubgpu@ubgpu:~/github/DeepCL_Kgsgo/DeepCL/build$ ./deepclrun dataset=kgsgoall netdef=12*(32c5z-relu)-500n-tanh-361n numepochs=15 learningrate=0.0001
Using dataset kgsgoall: datadir: ../data/kgsgo: trainfile: kgsgo-trainall-v2.dat: validatefile: kgsgo-test-v2.dat:
Ntrain 33630595 numPlanes 7 imageSize 19
Ntest 18860 Ntest
after load images 759 ms
image stats mean 12.3638 stdDev 54.7709
image norm translate -12.3638 scale 0.00912893
after getting stats 96 ms
Using NVIDIA Corporation platform: NVIDIA CUDA
Using device: GeForce GTX 970
netDefLower [12*(32c5z-relu)-500n-tanh-361n]
nnString: [12]
repeatNum 12
remainderString [(32c5z-relu)-500n-tanh-361n]
inner [32c5z-relu]
newRemainder [-500n-tanh-361n]
postfix [500n-tanh-361n]
multiplied string: 32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-500n-tanh-361n
GpuAdd: building kernel
CopyBuffer: building kernel
Using trainer SGD{ learningRate=0.0001, momentum=0 }
layer 0:InputLayer{ outputPlanes=7 outputImageSize=19 }
layer 1:NormalizationLayer{ outputPlanes=7 outputImageSize=19 translate=-12.3638 scale=0.00912893 }
layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=7 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 3:ActivationLayer{ RELU }
layer 4:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 5:ActivationLayer{ RELU }
layer 6:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 7:ActivationLayer{ RELU }
layer 8:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 9:ActivationLayer{ RELU }
layer 10:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 11:ActivationLayer{ RELU }
layer 12:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 13:ActivationLayer{ RELU }
layer 14:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 15:ActivationLayer{ RELU }
layer 16:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 17:ActivationLayer{ RELU }
layer 18:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 19:ActivationLayer{ RELU }
layer 20:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 21:ActivationLayer{ RELU }
layer 22:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 23:ActivationLayer{ RELU }
layer 24:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 25:ActivationLayer{ RELU }
layer 26:FullyConnectedLayer{ numPlanes=500 imageSize=1 }
layer 27:ActivationLayer{ TANH }
layer 28:FullyConnectedLayer{ numPlanes=361 imageSize=1 }
layer 29:SoftMaxLayer{ perPlane=0 numPlanes=361 imageSize=1 }
Parameters overview: (skipping 16 layers with 0 params)
layer 2: params=5632 0.1%
layer 4: params=25632 0.4%
layer 6: params=25632 0.4%
layer 8: params=25632 0.4%
layer 10: params=25632 0.4%
layer 12: params=25632 0.4%
layer 14: params=25632 0.4%
layer 16: params=25632 0.4%
layer 18: params=25632 0.4%
layer 20: params=25632 0.4%
layer 22: params=25632 0.4%
layer 24: params=25632 0.4%
layer 26: params=5776500 92.5%
layer 28: params=180861 2.9%
TOTAL : params=6244945
before learning start 46587 ms
MultiplyInPlace: building kernel
sqrt: building kernel
squared: building kernel
PerElementMultInPlace: building kernel
kernelAddScalar: building kernel
kernelInv: building kernel
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=32 -D gPixelsPerThread=1
options -D gWorkgroupSize=32 -D gPixelsPerThread=1
layer2 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer4 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer6 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer8 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer10 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer12 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer14 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer16 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer18 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer20 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer22 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer24 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer2 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 6ms instance 2: 3ms instance 3: 2ms instance 4: 2ms instance 5: cannot be used instance 6: 4ms selected: instance 3
layer4 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 29ms selected: instance 4
layer6 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 29ms selected: instance 4
layer8 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 29ms selected: instance 4
layer10 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 29ms selected: instance 4
layer12 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 23ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 29ms selected: instance 4
layer14 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 29ms selected: instance 4
layer16 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 32ms selected: instance 4
layer18 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 31ms selected: instance 4
layer20 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 30ms selected: instance 4
layer22 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 23ms instance 2: 13ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 30ms selected: instance 4
layer24 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 22ms instance 2: 14ms instance 3: 10ms instance 4: 6ms instance 5: cannot be used instance 6: 31ms selected: instance 4
layer26 ForwardAuto: instance 6 this instance cant be used: Out of resources, code -5
layer26 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 153ms instance 2: 378ms instance 3: 767ms instance 4: 93ms instance 5: 27ms instance 6: cannot be used selected: instance 5
layer28 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 2ms instance 2: 16ms instance 3: 15ms instance 4: 15ms instance 5: 13ms instance 6: 11ms selected: instance 1
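As a quick sanity check on the "Parameters overview" section of the log above, the per-layer counts can be recomputed by hand. The sketch below assumes the usual convention of one weight per (input plane, output, kernel position) plus one bias per output, which is consistent with `biased=1` in the log; the helper names are made up for illustration and are not DeepCL functions.

```python
# Sketch: recompute the parameter counts printed in the log.
# Assumption: params = weights + one bias per output (the log shows biased=1).
# conv_params and fc_params are illustrative helpers, not part of DeepCL.


def conv_params(in_planes, num_filters, filter_size):
    """Weights of a square-kernel convolution plus one bias per filter."""
    return in_planes * num_filters * filter_size * filter_size + num_filters


def fc_params(in_planes, in_image_size, num_neurons):
    """Weights of a fully connected layer plus one bias per neuron."""
    return in_planes * in_image_size * in_image_size * num_neurons + num_neurons


counts = {2: conv_params(7, 32, 5)}    # first conv layer sees the 7 input planes
for layer in range(4, 26, 2):          # conv layers 4, 6, ..., 24
    counts[layer] = conv_params(32, 32, 5)
counts[26] = fc_params(32, 19, 500)    # 32 planes of 19x19 feeding 500 neurons
counts[28] = fc_params(500, 1, 361)    # 500 neurons feeding 361 outputs

total = sum(counts.values())
for layer, n in counts.items():
    print(f"layer {layer}: params={n} {100 * n / total:.1f}%")
print(f"TOTAL : params={total}")
```

Under that assumption the totals agree with the log (6244945 parameters), and the first fully connected layer accounts for most of them.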

My GPU info:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0            C+G   Not Supported                                         |
+-----------------------------------------------------------------------------+
ubgpu@ubgpu:~/big_data$

hughperkins commented 9 years ago

Yeah, that's a 32 million example dataset. On half a K520, it takes about 2-3 days per epoch. You could use numtrain=1000000 to convince yourself it's working on a smaller dataset first.
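To put those figures in perspective, here is a back-of-the-envelope sketch. It assumes that time per epoch scales linearly with the number of training examples and takes the "2-3 days per epoch" figure quoted for half a K520 as the baseline; a GTX 970 will behave differently, so treat the result as an order of magnitude only.

```python
# Rough estimate, not a benchmark.
# Assumptions: epoch time scales linearly with the number of examples, and
# the 2-3 days/epoch figure (quoted for half a K520) is the baseline.

N_TRAIN_FULL = 33_630_595   # Ntrain reported in the log
N_TRAIN_SUBSET = 1_000_000  # numtrain=1000000
NUM_EPOCHS = 15             # numepochs=15


def subset_epoch_hours(days_per_full_epoch):
    """Scale a full-epoch duration (in days) down to the subset, in hours."""
    return days_per_full_epoch * 24 * N_TRAIN_SUBSET / N_TRAIN_FULL


low_hours = subset_epoch_hours(2)
high_hours = subset_epoch_hours(3)
full_run_days = (2 * NUM_EPOCHS, 3 * NUM_EPOCHS)

print(f"one epoch on the subset: {low_hours:.1f} to {high_hours:.1f} hours")
print(f"all {NUM_EPOCHS} epochs on the full set: "
      f"{full_run_days[0]} to {full_run_days[1]} days")
```

Under these assumptions one epoch over 1,000,000 examples takes roughly 1.4 to 2.1 hours, while the full 15-epoch run would take 30 to 45 days, so seeing no new output after 24 hours on the full dataset is expected.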

hughperkins commented 9 years ago

Also, be sure to add loadondemand=1, so that it doesn't try to load the entire dataset into memory at once.
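A rough size estimate shows why loading everything at once is a problem. The dimensions below come from the log (Ntrain, numPlanes, imageSize); the number of bytes per stored value is an assumption, so two cases are shown rather than one claimed figure.

```python
# Sketch: approximate in-memory size of the full training set.
# Dimensions are taken from the log; bytes per value is an assumption,
# since the thread does not say how the data is held in memory.

N_TRAIN = 33_630_595
NUM_PLANES = 7
IMAGE_SIZE = 19

values = N_TRAIN * NUM_PLANES * IMAGE_SIZE * IMAGE_SIZE

sizes_gib = {}
for label, bytes_per_value in (("1 byte per value", 1), ("4-byte floats", 4)):
    sizes_gib[label] = values * bytes_per_value / 2**30
    print(f"{label}: {sizes_gib[label]:.0f} GiB")
```

Even at one byte per value this comes to about 79 GiB, far more than typical workstation RAM, which is why streaming the data with loadondemand=1 matters here.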

andyyuan78 commented 9 years ago

While I am running

ubgpu@ubgpu:~/github/DeepCL_Kgsgo/DeepCL/build$ ./deepclrun dataset=kgsgoall netdef=12*(32c5z-relu)-500n-tanh-361n numepochs=15 learningrate=0.0001 loadondemand=1 numtrain=1000000

it seems that the program continues from the previous session, with output like:

layer28 ForwardAuto::forward choosing best instance: instance 0: cannot be used instance 1: 2ms instance 2: 16ms instance 3: 15ms instance 4: 15ms instance 5: 13ms instance 6: 11ms selected: instance 1

Is this OK? If not, how do I clear the previous session?

hughperkins commented 9 years ago

When you start deepcl, it tries different forward propagation kernels and chooses the fastest one. In this case, kernel 1 runs the fastest, at 2ms per batch, so it chooses that one. It's quite OK :-)
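The selection step described above can be sketched as "time every candidate, skip the ones that cannot be used, keep the fastest". The timings below are copied from the layer2, layer26 and layer28 lines of the log; `select_fastest` is a hypothetical helper written for illustration, not DeepCL's actual code, and breaking ties in favour of the lowest instance number is an assumption that happens to match the layer2 line.

```python
# Sketch of the "try each kernel, keep the fastest" idea.
# Timings (ms) are copied from the log; None marks "cannot be used".
# select_fastest is an illustrative helper, not DeepCL's implementation.


def select_fastest(timings_ms):
    """Return the instance index with the smallest timing, ignoring None."""
    usable = {idx: t for idx, t in timings_ms.items() if t is not None}
    if not usable:
        raise ValueError("no usable instance")
    # min() returns the first minimal key, so ties go to the lowest index.
    return min(usable, key=usable.get)


layer2 = {0: None, 1: 6, 2: 3, 3: 2, 4: 2, 5: None, 6: 4}
layer26 = {0: None, 1: 153, 2: 378, 3: 767, 4: 93, 5: 27, 6: None}
layer28 = {0: None, 1: 2, 2: 16, 3: 15, 4: 15, 5: 13, 6: 11}

for name, timings in (("layer2", layer2), ("layer26", layer26), ("layer28", layer28)):
    print(f"{name} selected: instance {select_fastest(timings)}")
```

With these inputs the sketch picks instances 3, 5 and 1, the same choices the log reports, which is why that output appears on every start and is not left over from an earlier session.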


hughperkins commented 9 years ago

I guess I will close this now :-)

hughperkins / DeepCL

Stuck, but GPU still running #24