peiyunh / tiny

Tiny Face Detector, CVPR 2017
https://cs.cmu.edu/~peiyunh/tiny

hr_res101('train') error : "Error using gpuDevice (line 26) Invalid CUDA device id" #48

niamul070 closed 7 years ago

niamul070 commented 7 years ago

When I run hr_res101('train'), I get the error mentioned above. Can you tell me how to fix it? Below is the detailed output and error message:

hr_res101('train');

ans =

models/widerface-resnet-101-simple-sample256-posfrac0.5-N25-bboxreg-cluster-scaled

Trying to initialize the structure of resnet-101-simple
Unknown model: cannot initialize.
Loading pretrained weights from ./trained_models/imagenet-resnet-101-dag.mat
Loaded imdb from data/widerface/imdb.mat
cluster path: data/widerface/RefBox_N25_scaled.mat

opts =

struct with fields:

  keepDilatedZeros: 0
         inputSize: [500 500]
      learningRate: [1×30 double]
           trainFn: '@cnn_train_dag_hardmine'
     batchGetterFn: '@cnn_get_batch_hardmine'
      freezeResNet: 0
               tag: ''
        clusterNum: 25
       clusterName: 'scaled'
           bboxReg: 1
        skipLRMult: [0 1 0.1000]
        sampleSize: 256
       posFraction: 0.5000
         posThresh: 0.7000
         negThresh: 0.3000
            border: [0 0]
 pretrainModelPath: './trained_models/imagenet-resnet-101-dag.mat'
           dataDir: 'data/widerface'
         modelType: 'resnet-101-simple'
       networkType: 'dagnn'
batchNormalization: 1
  weightInitMethod: 'gaussian'
    minClusterSize: [10 10]
    maxClusterSize: [Inf Inf]
            expDir: 'models/widerface-resnet-101-simple-sample256-posf...'
         batchSize: 48
     numSubBatches: 1
         numEpochs: 50
              gpus: [1 2 3 4]
   numFetchThreads: 8
              lite: 0
          imdbPath: 'data/widerface/imdb.mat'
             train: [1×1 struct]

ans =

struct with fields:

            gpus: [1 2 3 4]
       batchSize: 48
   numSubBatches: 1
       numEpochs: 50
    learningRate: [1×30 double]
keepDilatedZeros: 0

Start using dagnn.DetLoss for loss
Starting parallel pool (parpool) using the 'local' profile ...
Warning: The system time zone setting, 'US/Eastern', does not specify a single time zone unambiguously. It will be treated as 'America/New_York'. See the datetime.TimeZone property for details about specifying time zones.

In verifyTimeZone (line 23)
In datetime (line 503)
In parallel.internal.cluster.FileSerializer>iLoadDate (line 345)
In parallel.internal.cluster.FileSerializer/getFields (line 100)
In parallel.internal.cluster.CJSSupport/getProperties (line 252)
In parallel.internal.cluster.CJSSupport/getJobProperties (line 463)
In parallel.internal.cluster.CJSJobMixin/hGetProperty (line 70)
In parallel.internal.cluster.CJSJobMixin/hSetTerminalStateFromCluster (line 98)
In parallel.cluster.CJSCluster/hGetJobState (line 361)
In parallel.internal.cluster.CJSJobMixin/getStateEnum (line 136)
In parallel.Job/get.StateEnum (line 214)
In parallel.Job/get.State (line 206)
In parallel.internal.customattr.CustomGetSet>iVectorisedGetHelper (line 107)
In parallel.internal.customattr.CustomGetSet>@(a,b,c)iVectorisedGetHelper(obj,a,b,c) (line 89)
In parallel.internal.customattr.CustomGetSet/doVectorisedGet (line 90)
In parallel.internal.customattr.CustomGetSet/hVectorisedGet (line 64)
In parallel.internal.customattr.GetSetImpl>iAccessProperties (line 289)
In parallel.internal.customattr.GetSetImpl>iGetAllProperties (line 250)
In parallel.internal.customattr.GetSetImpl.getImpl (line 124)
In parallel.internal.customattr.CustomGetSet/get (line 30)
In parallel.internal.pool.InteractiveClient/pRemoveOldJobs (line 464)
In parallel.internal.pool.InteractiveClient/start (line 311)
In parallel.Pool>iStartClient (line 567)
In parallel.Pool.hBuildPool (line 446)
In parallel.internal.pool.doParpool (line 15)
In parpool (line 89)
In cnn_train_dag_hardmine>prepareGPUs (line 604)
In cnn_train_dag_hardmine (line 132)
In cnn_widerface (line 212)
In hr_res101 (line 41)
connected to 4 workers.
cnn_train_dag_hardmine: resetting GPU
Error using cnn_train_dag_hardmine>prepareGPUs (line 616)
Error detected on worker 3.

Error in cnn_train_dag_hardmine (line 132)
prepareGPUs(opts, epoch == start+1) ;

Error in cnn_widerface (line 212)
[net, info] = trainFn(net, imdb, getBatchFn(batchGetter, opts, net.meta), ...

Error in hr_res101 (line 41)
cnn_widerface('inputSize', inputSize, ...

Caused by:
Error using gpuDevice (line 26)
Invalid CUDA device id: 3. Select a device id from the range 1:1.

When I run gpuDevice from the MATLAB prompt, this is what I get:

gpuDevice

ans =

CUDADevice with properties:

                  Name: 'Quadro M4000'
                 Index: 1
     ComputeCapability: '5.2'
        SupportsDouble: 1
         DriverVersion: 8
        ToolkitVersion: 7.5000
    MaxThreadsPerBlock: 1024
      MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
           MaxGridSize: [2.1475e+09 65535 65535]
             SIMDWidth: 32
           TotalMemory: 8.4922e+09
       AvailableMemory: 7.5519e+09
   MultiprocessorCount: 13
          ClockRateKHz: 772500
           ComputeMode: 'Default'
  GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
      CanMapHostMemory: 1
       DeviceSupported: 1
        DeviceSelected: 1

niamul070 commented 7 years ago

Never mind, I solved it. Thanks.

vivekkumar1712 commented 7 years ago

I'm facing the same problem. Can you please tell me how you resolved this issue?
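
The thread does not record how the issue was resolved. Going only by the output above, the options request gpus: [1 2 3 4] while gpuDevice reports a valid range of 1:1, so one plausible cause is that the script asks for more GPUs than the machine has. A minimal sketch of restricting the list to devices MATLAB can see, assuming the Parallel Computing Toolbox function gpuDeviceCount is available. Where the gpus value is actually set in hr_res101.m or cnn_widerface.m is not shown in this thread, so how the filtered list is passed in is left as an assumption:

```matlab
% Sketch only: filter a requested GPU list down to devices that exist.
requested = [1 2 3 4];        % the gpus value shown in the opts dump above
available = gpuDeviceCount;   % number of CUDA devices visible to MATLAB
gpus = requested(requested <= available);   % on a single-GPU machine this leaves 1
if isempty(gpus)
    warning('No usable GPU found; gpus is empty.');
end
disp(gpus);
```

The resulting gpus value would then stand in for [1 2 3 4] wherever the training options are set.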