I ran into an error on Windows.
During execution we call
cudaMemcpyAsync(buffers[0], curInput.data(), bufferSize[0], cudaMemcpyHostToDevice, stream);
and the data from curInput.data() does not have enough time to be copied into GPU memory. As a result,
context->execute(BATCH_SIZE, buffers);
runs on the previous image.
Adding
cudaStreamSynchronize(stream);
between the two calls solved this problem.
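
For reference, here is a minimal sketch of the call order that worked for me. The wrapper function inferOne and its parameters are hypothetical and only there to make the snippet self-contained. I am also assuming that curInput is a std::vector<float> and that the TensorRT version in use still provides the synchronous execute(batchSize, bindings) call. It needs the CUDA runtime and TensorRT headers to build.

```cpp
#include <cuda_runtime_api.h>
#include <NvInfer.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: the variable names mirror the calls above,
// but the function itself is not part of the original code.
bool inferOne(nvinfer1::IExecutionContext* context,
              void** buffers,
              const std::vector<float>& curInput,  // assumed element type
              std::size_t inputBytes,
              int batchSize,
              cudaStream_t stream)
{
    // Queue the host-to-device copy of the current image on `stream`.
    if (cudaMemcpyAsync(buffers[0], curInput.data(), inputBytes,
                        cudaMemcpyHostToDevice, stream) != cudaSuccess)
        return false;

    // Block the host until the copy has finished, so buffers[0]
    // holds the current image before inference starts.
    if (cudaStreamSynchronize(stream) != cudaSuccess)
        return false;

    // Synchronous inference; this call does not take `stream`.
    return context->execute(batchSize, buffers);
}
```

Another option that might work, though I have not tried it, is to queue the inference on the same stream with context->enqueue(batchSize, buffers, stream, nullptr) and synchronize once before reading the output.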