Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0
After modifying the source code, I get the error "F0824 11:37:22.753271 21946 math_functions.cu:79] Check failed: error == cudaSuccess (11 vs. 0) invalid argument" #276
Hi,
I have modified the source file DataFrameSource.scala so that it supports batch_size=2 (batch_size=1 already runs successfully). However, when I run CaffeOnSpark, I get the error "Check failed: error == cudaSuccess (11 vs. 0) invalid argument". I changed two things: setFloatBlob and nextBatch.
setFloatBlob:
def setFloatBlob(clr: Int, offset: Int, stride: Int, data: Array[Float],
                 blob: FloatBlob): Unit = {
  //log.info("DEBUG: Start DF->setFloatBlob")
  val dataLen: Int = (data.length - 1) * stride + offset + 1
  if (dataLen > blob.count()) {
    throw new IllegalArgumentException("blob size is "
      + blob.count() + ", but total data length is "
      + dataLen + ".")
  }
  val blobCPU = blob.cpu_data()
  if (clr == 0) {
    for (i <- 0 until blob.count()) {
      blobCPU.set(i, 0)
    }
  }
  for (i <- 0 until data.length) {
    val index = offset + i * stride
    blobCPU.set(index, data(i))
  }
}
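To make the bounds check above easier to reason about, here is a minimal stand-alone sketch of the dataLen arithmetic. The object BoundsCheckSketch and the helper requiredLength are hypothetical names used only for illustration; they are not part of CaffeOnSpark. The blob size of 20 is assumed from the reshape(1,1,10*batchsize,1) call with batch_size=2.

```scala
object BoundsCheckSketch {
  // Hypothetical helper: one past the highest index that setFloatBlob writes.
  // Mirrors the dataLen formula from the snippet; assumes n >= 1.
  def requiredLength(n: Int, offset: Int, stride: Int): Int =
    (n - 1) * stride + offset + 1

  def main(args: Array[String]): Unit = {
    val blobCount = 20 // assumed: 10 * batch_size with batch_size = 2

    // First sample: 5 labels written starting at offset 0.
    println(requiredLength(5, 0, 1)) // 5
    // Second sample: 5 labels written starting at offset 5.
    println(requiredLength(5, 5, 1)) // 10
    // Both writes stay inside the blob.
    println(requiredLength(5, 5, 1) <= blobCount) // true
  }
}
```

With these numbers the check passes for both samples, so the exception in setFloatBlob would not be triggered by the example data.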
nextBatch:
val data: Array[Float] = String2DetectData(sample._2(i).asInstanceOf[String],count).toArray
//log.info("DEBUG: data.length:"+data.length)
val blob: FloatBlob = batchData(i).asInstanceOf[FloatBlob]
val offset: Int = label_offset
setFloatBlob(count, offset, 1, data, blob)
label_offset += data.length
My label data changes with each picture, so I advance the offset by data.length after each sample. For example:
With batch_size=2, I call blob.reshape(1,1,10*batchsize,1). labelData_1 is [0,1,2,3,4] and labelData_2 is [1,2,3,4,5], so after I put the data into the blob I expect it to contain [0,1,2,3,4,1,2,3,4,5,0,0,0,0,0,0,0,0,0,0].
Is my approach correct? Could you give me some suggestions? Thanks!
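The intended layout can be sketched without Caffe or a GPU. In the sketch below, a plain Array[Float] stands in for the FloatBlob CPU buffer, and LabelPackingSketch and packLabels are hypothetical names, not CaffeOnSpark APIs. The buffer is zeroed once per batch and each sample's labels are copied in at a running offset.

```scala
object LabelPackingSketch {
  // Copies each sample's labels back to back into one zero-filled buffer.
  // Assumption: a plain Array[Float] stands in for the blob's CPU data.
  def packLabels(samples: Seq[Array[Float]], blobCount: Int): Array[Float] = {
    // The JVM zero-initializes the array, so the buffer is cleared once per batch.
    val buffer = new Array[Float](blobCount)
    var offset = 0
    for (data <- samples) {
      require(offset + data.length <= blobCount,
        s"blob size is $blobCount, but total data length is ${offset + data.length}.")
      Array.copy(data, 0, buffer, offset, data.length)
      offset += data.length // next sample starts right after this one
    }
    buffer
  }

  def main(args: Array[String]): Unit = {
    val label1 = Array(0f, 1f, 2f, 3f, 4f)
    val label2 = Array(1f, 2f, 3f, 4f, 5f)
    val packed = packLabels(Seq(label1, label2), blobCount = 20)
    println(packed.mkString("[", ",", "]"))
  }
}
```

In this sketch the clear happens exactly once, before any sample is written. In setFloatBlob the clear runs whenever clr == 0, so if the value passed as clr in nextBatch were 0 for a later sample, labels written earlier in the batch would be erased. Whether that happens cannot be told from the snippet alone.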