yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

After I modify the source code, there is an error: "F0824 11:37:22.753271 21946 math_functions.cu:79] Check failed: error == cudaSuccess (11 vs. 0) invalid argument" #276

Closed Zzmc closed 7 years ago

Zzmc commented 7 years ago

Hi, I have modified the source file DataFrameSource.scala because I want it to support batch_size=2 (batch_size=1 already runs successfully). But when I run CaffeOnSpark, I get the error "Check failed: error == cudaSuccess (11 vs. 0) invalid argument". The parts I modified are setFloatBlob and nextBatch.

setFloatBlob:

def setFloatBlob(clr: Int, offset: Int, stride: Int, data: Array[Float],
                   blob: FloatBlob): Unit = {
    //log.info("DEBUG: Start DF->setFloatBlob")
    val dataLen: Int = (data.length - 1) * stride + offset + 1
    if (dataLen > blob.count()) {
      throw new IllegalArgumentException("blob size is "
        + blob.count() + ", but total data length is "
        + dataLen + ".")
    }
    val blobCPU = blob.cpu_data()
    if(clr == 0) {
      for (i <- 0 until blob.count()) {
        blobCPU.set(i, 0)
      }
    }
    for (i <- 0 until data.length) {
      val index = offset + i * stride
      blobCPU.set(index, data(i))
    }
  }

nextBatch:

val data: Array[Float] = String2DetectData(sample._2(i).asInstanceOf[String], count).toArray
//log.info("DEBUG: data.length:"+data.length)
val blob: FloatBlob = batchData(i).asInstanceOf[FloatBlob]
val offset: Int = label_offset
setFloatBlob(count, offset, 1, data, blob)
label_offset += data.length

My label data changes with each picture, so I advance the offset by data.length after each sample. For example, with batch_size=2 I reshape the blob with blob.reshape(1,1,10*batchsize,1); my labelData_1 is [0,1,2,3,4] and labelData_2 is [1,2,3,4,5]. So I think that after I put the data into the blob, it should be [0,1,2,3,4,1,2,3,4,5,0,0,0,0,0,0,0,0,0,0]. Is my idea correct? Could you give some suggestions? Thanks!
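
To make the layout I expect concrete, here is a minimal sketch of the same offset logic. It writes into a plain Array[Float] instead of a FloatBlob, so it runs without CaffeOnSpark or a GPU. The names LabelLayoutSketch, setFloatArray and packLabels are hypothetical and exist only for this illustration. The sketch also assumes that count is the index of the sample within the batch and that the offset starts at 0 for every batch.

```scala
object LabelLayoutSketch {
  // Hypothetical stand-in for setFloatBlob: same bounds check and indexing,
  // but the target is a plain array rather than a Caffe FloatBlob.
  def setFloatArray(clr: Int, offset: Int, stride: Int,
                    data: Array[Float], blob: Array[Float]): Unit = {
    val dataLen = (data.length - 1) * stride + offset + 1
    require(dataLen <= blob.length,
      s"blob size is ${blob.length}, but total data length is $dataLen.")
    // Zero the whole buffer only when clr == 0 (assumed: first sample of a batch).
    if (clr == 0) java.util.Arrays.fill(blob, 0f)
    for (i <- data.indices) blob(offset + i * stride) = data(i)
  }

  // Packs variable-length label arrays back to back into one buffer.
  def packLabels(labels: Seq[Array[Float]], blobSize: Int): Array[Float] = {
    val blob = new Array[Float](blobSize)
    var labelOffset = 0 // assumption: reset to 0 at the start of each batch
    for ((data, count) <- labels.zipWithIndex) {
      setFloatArray(count, labelOffset, 1, data, blob)
      labelOffset += data.length
    }
    blob
  }

  def main(args: Array[String]): Unit = {
    val batchSize = 2
    val packed = packLabels(
      Seq(Array(0f, 1f, 2f, 3f, 4f), Array(1f, 2f, 3f, 4f, 5f)),
      10 * batchSize)
    println(packed.map(_.toInt).mkString("[", ",", "]"))
  }
}
```

This only checks the indexing on the CPU side; it does not reproduce the cudaSuccess failure, which happens later in the native code.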