grimoire / amirstan_plugin

Useful TensorRT plugins for PyTorch and mmdetection model conversion.
MIT License

Add deepstream parser #2

Closed daavoo closed 4 years ago

daavoo commented 4 years ago

This PR allows models exported with mmdet2trt to be used inside DeepStream.

It adds a CMake option, WITH_DEEPSTREAM.

Enabling this option includes a custom output parser for DeepStream in the shared object library.

The parser can then be referenced in the DeepStream configuration file, as in the following example:

parse-bbox-func-name=NvDsInferParseMmdet
output-blob-names=num_detections;boxes;scores;classes
custom-lib-path=/home/nvidia/amirstan_plugin/build/lib/libamirstan_plugin.so
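
For reference, the function named in parse-bbox-func-name has to match DeepStream's standard NvDsInferParseCustomFunc prototype from nvdsinfer_custom_impl.h. The snippet below is only an illustrative sketch of such an entry point for the four outputs listed above, not the exact code added in this PR; the box-coordinate convention and the omitted per-class thresholding are assumptions.

#include <string>
#include <vector>

#include "nvdsinfer_custom_impl.h"

/* Illustrative sketch of a bbox parser for the outputs
 * num_detections / boxes / scores / classes. Box coordinates are assumed to be
 * absolute x1, y1, x2, y2 in network-input pixels; per-class thresholding is
 * omitted for brevity. */
extern "C" bool NvDsInferParseMmdet(
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
    (void) networkInfo;
    (void) detectionParams;  /* unused in this simplified sketch */

    /* Locate each output tensor by the names configured in output-blob-names. */
    const NvDsInferLayerInfo *numDet = nullptr, *boxes = nullptr,
                             *scores = nullptr, *classes = nullptr;
    for (const auto &layer : outputLayersInfo) {
        const std::string name = layer.layerName ? layer.layerName : "";
        if (name == "num_detections") numDet = &layer;
        else if (name == "boxes") boxes = &layer;
        else if (name == "scores") scores = &layer;
        else if (name == "classes") classes = &layer;
    }
    if (!numDet || !boxes || !scores || !classes)
        return false;

    const int count = *static_cast<const int *>(numDet->buffer);     /* kINT32 */
    const float *box = static_cast<const float *>(boxes->buffer);    /* keepTopK x 4 */
    const float *score = static_cast<const float *>(scores->buffer); /* keepTopK */
    const float *cls = static_cast<const float *>(classes->buffer);  /* keepTopK */

    for (int i = 0; i < count; ++i) {
        NvDsInferObjectDetectionInfo obj{};
        obj.classId = static_cast<unsigned int>(cls[i]);
        obj.detectionConfidence = score[i];
        obj.left = box[i * 4 + 0];
        obj.top = box[i * 4 + 1];
        obj.width = box[i * 4 + 2] - box[i * 4 + 0];
        obj.height = box[i * 4 + 3] - box[i * 4 + 1];
        objectList.push_back(obj);
    }
    return true;
}

/* Compile-time check that the signature matches what DeepStream expects. */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseMmdet);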
grimoire commented 4 years ago

Cool! I will check it later.

daavoo commented 4 years ago

> Cool! I will check it later.

Hi @grimoire. Thanks for making and sharing this code.

I encountered a problem getting this to work on a Jetson Xavier NX. I have found a temporary workaround, but I think you might have a better solution.

It is related to the batchedNMSPlugin.

It appears that the shape of the num_detections output is not being set properly.

DeepStream is able to correctly parse the exported TensorRT engine:

INFO nvinfer gstnvinfer.cpp:602:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1577> [UID = 1]: deserialized trt engine from :{model}.engine
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT input_0 3x300x300
1 OUTPUT kINT32 num_detections 0
2 OUTPUT kFLOAT boxes 200x4
3 OUTPUT kFLOAT scores 200
4 OUTPUT kFLOAT classes 200

But it then fails when allocating the output buffers:

ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::allocateBuffers() <nvdsinfer_context_impl.cpp:1195> [UID = 1]: Failed to allocate cuda output buffer during context initialization

The problem is this snippet (from deepstream-5.0/sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp:1174):

        for (unsigned int jL = 0; jL < m_AllLayerInfo.size(); jL++)
        {
            const NvDsInferBatchDimsLayerInfo& layerInfo = m_AllLayerInfo[jL];
            const NvDsInferDims& bindingDims = layerInfo.inferDims;
            assert(bindingDims.numElements > 0);
            size_t size = m_MaxBatchSize *
                          bindingDims.numElements*
                          getElementSize(layerInfo.dataType);
            if (layerInfo.isInput)
            {
                /* Reuse input binding buffer pointers. */
                batch.m_DeviceBuffers[jL] = m_BindingBuffers[jL];
            }
            else
            {
                /* Allocate device memory for output layers here. */
                auto outputBuf = std::make_unique<CudaDeviceBuffer>(size);
                if (!outputBuf || !outputBuf->ptr())
                {
                    printError(
                        "Failed to allocate cuda output buffer during context "
                        "initialization");
                    return NVDSINFER_CUDA_ERROR;
                }
                batch.m_DeviceBuffers[jL] = outputBuf->ptr();
                batch.m_OutputDeviceBuffers.emplace_back(std::move(outputBuf));
            }

bindingDims.numElements is returning 0 for the num_detections layer (and the assert is not firing), so the computed buffer size ends up being 0 and the CUDA allocation fails.

So the temporary fix I made is:

            const NvDsInferBatchDimsLayerInfo& layerInfo = m_AllLayerInfo[jL];
            const NvDsInferDims& bindingDims = layerInfo.inferDims;
            assert(bindingDims.numElements > 0);
            int numElements = bindingDims.numElements;
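            // Hard-coded workaround: binding 1 is num_detections in this engine,
            // whose reported dims are empty, so force a single element for it.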
            if (jL == 1) {
                numElements = 1;
            }
            size_t size = m_MaxBatchSize *
                          numElements *
                          getElementSize(layerInfo.dataType);

I was planning on reviewing the batchedNMSPlugin and sending a PR, but you might already know where the problem is.

grimoire commented 4 years ago

Hi, thanks for the bug report.

I guess I should add an extra dimension to num_detections in batchedNMSPlugin.cpp:

nvinfer1::DimsExprs BatchedNMSPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs *inputs, int nbInputs, nvinfer1::IExprBuilder &exprBuilder)
{
    ASSERT(nbInputs == 2);
    ASSERT(outputIndex >= 0 && outputIndex < this->getNbOutputs());
    ASSERT(inputs[0].nbDims == 4);
    ASSERT(inputs[1].nbDims == 3);

    nvinfer1::DimsExprs ret; 
    switch(outputIndex){
    case 0:
        ret.nbDims=1;
        break;
    case 1:
        ret.nbDims=3;
        break;
    case 2:
    case 3:
        ret.nbDims=2;
        break;
    default:
        break;
    }

    ret.d[0] = inputs[0].d[0];

    if(outputIndex>0){
        ret.d[1] = exprBuilder.constant(param.keepTopK);
    }
    if(outputIndex==1){
        ret.d[2] = exprBuilder.constant(4);
    }

    return ret;

}

Fixing this method should give you the right shape (set ret.nbDims to 2 when outputIndex == 0, add a constant to ret.d[1], etc.). I will fix it when I have time, or you can send the PR if you want to. Any PRs or bug reports are welcome!
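
A rough sketch of that change could look like the following (the exact layout for num_detections, the batch dimension plus a trailing constant 1, is an assumption to be verified against DeepStream):

// Sketch only: report num_detections as (batch, 1) instead of (batch,)
// so consumers such as DeepStream see a non-zero element count.
nvinfer1::DimsExprs BatchedNMSPlugin::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs *inputs, int nbInputs, nvinfer1::IExprBuilder &exprBuilder)
{
    ASSERT(nbInputs == 2);
    ASSERT(outputIndex >= 0 && outputIndex < this->getNbOutputs());
    ASSERT(inputs[0].nbDims == 4);
    ASSERT(inputs[1].nbDims == 3);

    nvinfer1::DimsExprs ret;
    switch (outputIndex) {
    case 0:
        ret.nbDims = 2;   // was 1: batch x 1 for num_detections
        break;
    case 1:
        ret.nbDims = 3;   // batch x keepTopK x 4 for boxes
        break;
    case 2:
    case 3:
        ret.nbDims = 2;   // batch x keepTopK for scores / classes
        break;
    default:
        break;
    }

    ret.d[0] = inputs[0].d[0];  // batch dimension

    if (outputIndex == 0) {
        ret.d[1] = exprBuilder.constant(1);  // single detection count per image
    } else {
        ret.d[1] = exprBuilder.constant(param.keepTopK);
    }
    if (outputIndex == 1) {
        ret.d[2] = exprBuilder.constant(4);  // box coordinates
    }

    return ret;
}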

grimoire commented 4 years ago

Thanks for the contribution! It is really cool. Would you please help me update the README.md of amirstan_plugin and mmdetection-to-tensorrt regarding DeepStream support? I don't know much about DeepStream.

daavoo commented 4 years ago

> Thanks for the contribution! It is really cool. Would you please help me update the README.md of amirstan_plugin and mmdetection-to-tensorrt regarding DeepStream support? I don't know much about DeepStream.

Sure thing. I will update the READMEs.