Open pskrunner14 opened 4 years ago
Looks OK, but I don't have access to my implementation of the NVIDIA code. I'll try to look next week.
@pskrunner14 Looking at your code, I see that in other places you call AdvanceDecoding directly. I'd be careful there, as NVIDIA changes their code regularly. Also, in your code the lambda callbacks can run in parallel, so it looks like you might be missing a lock in the callback.
The API I've used seems pretty simple:
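To illustrate the kind of lock meant here, a minimal sketch follows. It does not use Kaldi at all: `ResultStore` and `RunCallbacksInParallel` are hypothetical names, and plain `std::thread` workers stand in for whatever threads the pipeline uses to fire its callbacks. The only point is that shared state touched inside the lambda is guarded by a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical container shared by every callback; not part of Kaldi.
struct ResultStore {
  std::mutex mu;
  std::vector<std::string> results;

  void Add(const std::string &text) {
    // Serialize writers: callbacks may arrive from several threads at once.
    std::lock_guard<std::mutex> lock(mu);
    results.push_back(text);
  }
};

// Simulates a decoder that invokes one callback per utterance, each on its
// own worker thread. Returns how many results were stored.
inline std::size_t RunCallbacksInParallel(int num_utterances) {
  assert(num_utterances >= 0);
  ResultStore store;
  std::vector<std::thread> workers;
  for (int i = 0; i < num_utterances; ++i) {
    // The lambda plays the role of the decode callback.
    workers.emplace_back([&store, i]() {
      store.Add("utt-" + std::to_string(i));
    });
  }
  for (auto &w : workers) w.join();
  return store.results.size();
}
```

Without the `std::lock_guard`, the concurrent `push_back` calls would be a data race, which can show up as corruption or crashes far from the callback itself.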
Init
CreateTaskGroup
Call DecodeWithCallback
WaitForGroup
DestroyTaskGroup
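The call order above could be sketched roughly as follows. This is an assumption-heavy illustration, not the Kaldi API: `MockPipeline` is a hypothetical stand-in, its method signatures (for example passing the group name to `DecodeWithCallback`) are invented for the sketch, and "decoding" is just a string prefix. It only shows the lifecycle: create a group, submit work with callbacks, wait for the group, then destroy it.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the decoding pipeline; NOT the Kaldi class.
class MockPipeline {
 public:
  void Init() { initialized_ = true; }

  void CreateTaskGroup(const std::string &group) {
    std::lock_guard<std::mutex> lock(mu_);
    pending_[group] = 0;
  }

  // Runs the callback on a worker thread and tracks it under `group`.
  void DecodeWithCallback(const std::string &group, const std::string &utt,
                          std::function<void(const std::string &)> callback) {
    assert(initialized_);
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++pending_.at(group);
    }
    workers_.emplace_back([this, group, utt, callback]() {
      callback("decoded:" + utt);  // fake decoding result
      std::lock_guard<std::mutex> lock(mu_);
      --pending_.at(group);
      cv_.notify_all();
    });
  }

  // Blocks until every task submitted under `group` has finished.
  void WaitForGroup(const std::string &group) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return pending_.at(group) == 0; });
  }

  void DestroyTaskGroup(const std::string &group) {
    std::lock_guard<std::mutex> lock(mu_);
    pending_.erase(group);
  }

  ~MockPipeline() {
    for (auto &w : workers_) {
      if (w.joinable()) w.join();
    }
  }

 private:
  bool initialized_ = false;
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::string, int> pending_;
  std::vector<std::thread> workers_;
};

// Follows the listed steps for one batch of (fake) utterances.
inline std::vector<std::string> DecodeBatch(
    const std::vector<std::string> &utts) {
  std::mutex results_mu;
  std::vector<std::string> results;
  MockPipeline pipeline;

  pipeline.Init();
  pipeline.CreateTaskGroup("batch-0");
  for (const auto &utt : utts) {
    pipeline.DecodeWithCallback("batch-0", utt, [&](const std::string &text) {
      std::lock_guard<std::mutex> lock(results_mu);  // callbacks run in parallel
      results.push_back(text);
    });
  }
  pipeline.WaitForGroup("batch-0");  // all callbacks have returned after this
  pipeline.DestroyTaskGroup("batch-0");
  return results;
}
```

Results arrive in whatever order the workers finish, so callers should not rely on them matching the submission order.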
I know you are not using the task group feature. That was added to allow for a continuous stream of data where you want to know a batch of processing is now complete through the library.
Your error is within the GPU code, but I'd want to make sure you don't have a threading issue first. The copy is a DMA call that should only fail if the DMA parameters are wrong, if the GPU is not active, or if the memory types are wrong. I believe Kaldi should ensure the GPU is always active, so that should not be possible without some other complicating factor.
@pskrunner14 have you looked at this more?
@btiplitz I'll take a look this week or next.
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.
I am using the BatchedThreadedNnet3CudaPipeline2 pipeline in a custom application, similar to how it's used in cudadecoderbin/batched-wav-nnet3-cuda2.cc. On running the modified code, I got the following error:
From what I can gather, it has something to do with CUDA not being able to copy an NNet3 component to the GPU, as called at cudamatrix/cu-vector.cc#L1086 from the top-level call at cudadecoder/batched-threaded-nnet3-cuda-online-pipeline.cc#L407.
I also tried the batched-wav-nnet3-cuda2 binary to check whether there was an issue with the model etc., but it ran fine:
I would appreciate some help on this issue. Link to the code for reference: https://github.com/Vernacular-ai/kaldi-serve/blob/gpu-decoder/src/decoder/decoder-batch.cpp