grimoire / mmdetection-to-tensorrt

Convert MMDetection models to TensorRT, with support for FP16, INT8, batched input, dynamic shapes, etc.
Apache License 2.0

MemoryError on jetson TX2 #38

Closed prakashjayy closed 3 years ago

prakashjayy commented 3 years ago

I am trying to convert a model with mmdetection-to-tensorrt using the provided Dockerfile on a TX2 machine, but I am getting a memory error:

mmdet2trt configs/retinanet_r50_fpn_2x_coco.py weights/retinanet_r50_fpn_2x_coco_20200131-fdb43119.pth weights/model.trt --min-scale 1 3 800 600 --max-scale 1 3 800 600 --opt-scale 1 3 800 600
INFO:mmdet2trt:Model warmup
INFO:mmdet2trt:Converting model
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
INFO:mmdet2trt:Conversion took 80.97697949409485 s
INFO:mmdet2trt:Saving TRT model to: weights/model.trt
Killed
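The bare `Killed` line after a successful conversion usually means the Linux OOM killer terminated the process while serializing the engine (the TX2 shares its 8 GB of RAM between CPU and GPU). A minimal, hedged sanity check before launching the conversion is to read `MemAvailable` from `/proc/meminfo`; the helper names below are illustrative, not part of mmdet2trt:

```python
# Hypothetical pre-flight check: report available system memory before
# running mmdet2trt, since engine serialization can exhaust RAM on a TX2.

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])  # first field is the kB value
    return info

def mem_available_gib(path="/proc/meminfo"):
    """Return MemAvailable in GiB (0.0 if the field is missing)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("MemAvailable", 0) / (1024 ** 2)

if __name__ == "__main__":
    print(f"MemAvailable: {mem_available_gib():.2f} GiB")
```

If the reported figure is low, adding swap (or closing other processes) before conversion may avoid the kill; `dmesg | grep -i oom` after the fact confirms whether the OOM killer was responsible.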

Environment:

We made several changes to the Dockerfile to get it to run on a Jetson TX2 device.

FROM nvcr.io/nvidia/l4t-base:r32.4.4

### update apt and install libs
RUN apt-get update &&\
    apt-get install -y vim cmake libsm6 libxext6 libxrender-dev libgl1-mesa-glx git

### torch install 
RUN wget https://nvidia.box.com/shared/static/9eptse6jyly1ggt9axbja2yrmj6pbarc.whl -O torch-1.6.0-cp36-cp36m-linux_aarch64.whl &&\
    apt-get install -y python3-pip libopenblas-base libopenmpi-dev &&\
    pip3 install Cython &&\
    pip3 install numpy torch-1.6.0-cp36-cp36m-linux_aarch64.whl
### python
RUN pip3 install --upgrade pip

### install opencv (runtime dependency of mmcv)

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python3-opencv

### scikit image
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
  && apt-get install -y --no-install-recommends apt-utils \
  && apt-get install -y \
    python3-dev libpython3-dev python-pil python3-tk python-imaging-tk \
    build-essential wget locales liblapack-dev

RUN sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
    dpkg-reconfigure --frontend=noninteractive locales && \
    update-locale LANG=en_US.UTF-8
ENV LANG en_US.UTF-8

RUN wget -q -O /tmp/get-pip.py --no-check-certificate https://bootstrap.pypa.io/get-pip.py \
  && python3 /tmp/get-pip.py \
  && pip3 install -U pip
RUN pip3 install -U testresources setuptools

RUN pip3 install -U numpy
#####

### install mmcv
RUN git clone https://github.com/open-mmlab/mmcv.git /root/space/mmcv &&\
    cd /root/space/mmcv &&\
    MMCV_WITH_OPS=1 pip3 install -e .

### git mmdetection
RUN git clone --depth=1 https://github.com/open-mmlab/mmdetection.git /root/space/mmdetection

### install mmdetection
RUN cd /root/space/mmdetection &&\ 
    pip3 install -r requirements.txt &&\
    python3 setup.py develop

## install cmake - amirstan plugin below requires cmake version > 3.13
RUN cd /root/space/ &&\
    wget https://github.com/Kitware/CMake/releases/download/v3.19.1/cmake-3.19.1.tar.gz &&\
    tar -xf cmake-3.19.1.tar.gz &&\
    cd cmake-3.19.1 &&\
    apt-get install -y libssl-dev &&\
    ./configure &&\
    make &&\
    make install
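As an aside, building CMake from source is slow on the TX2. A hedged alternative is Kitware's prebuilt binary tarball (the Linux aarch64 binaries were introduced with the 3.19 series; the exact URL and archive name below are assumptions based on Kitware's release naming, so verify against the release page):

```dockerfile
## Hypothetical alternative: drop in a prebuilt aarch64 tarball instead of
## compiling CMake from source (archive name assumed; check the release page).
RUN cd /root/space/ &&\
    wget https://github.com/Kitware/CMake/releases/download/v3.19.1/cmake-3.19.1-Linux-aarch64.tar.gz &&\
    tar -xf cmake-3.19.1-Linux-aarch64.tar.gz &&\
    cp -r cmake-3.19.1-Linux-aarch64/* /usr/local/
```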

### git amirstan plugin
RUN git clone --depth=1 https://github.com/grimoire/amirstan_plugin.git /root/space/amirstan_plugin &&\ 
    cd /root/space/amirstan_plugin &&\ 
    git submodule update --init --progress --depth=1

### install amirstan plugin
RUN cd /root/space/amirstan_plugin &&\ 
    mkdir build &&\
    cd build &&\
    cmake .. &&\
    make -j10 &&\
    echo "export AMIRSTAN_LIBRARY_PATH=/root/space/amirstan_plugin/build/lib" >> /root/.bashrc
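Note that `.bashrc` is only sourced by interactive shells, so a variable exported there is invisible to non-interactive `docker run` commands and entrypoints. A sketch of a more reliable alternative (same path as above) is a Dockerfile `ENV` instruction:

```dockerfile
## Make the plugin path visible to every process in the image,
## not only interactive bash sessions that source .bashrc.
ENV AMIRSTAN_LIBRARY_PATH=/root/space/amirstan_plugin/build/lib
```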

### git torch2trt_dynamic
RUN git clone --depth=1 https://github.com/grimoire/torch2trt_dynamic.git /root/space/torch2trt_dynamic

### install torch2trt_dynamic
RUN cd /root/space/torch2trt_dynamic &&\
    python3 setup.py develop

### git mmdetection-to-tensorrt
RUN git clone --depth=1 https://github.com/grimoire/mmdetection-to-tensorrt.git /root/space/mmdetection-to-tensorrt

### install mmdetection-to-tensorrt
RUN cd /root/space/mmdetection-to-tensorrt &&\
    python3 setup.py develop
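At this point a small smoke test can fail the image build early if any of the packages installed above cannot be imported. The module names are assumed from the package names in this Dockerfile, so adjust if they differ:

```dockerfile
## Hypothetical smoke test: abort the build if the Python packages
## installed above fail to import.
RUN python3 -c "import mmcv; import mmdet; import mmdet2trt; import torch2trt_dynamic"
```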

## setuptools for python3
RUN apt-get install -y python3-setuptools

### install torchvision
RUN  apt-get install -y libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev &&\
     git clone --branch v0.7.0 https://github.com/pytorch/vision torchvision &&\
     cd torchvision &&\
     export BUILD_VERSION=0.7.0 &&\  
     python3 setup.py install

WORKDIR /root/space
grimoire commented 3 years ago

Thanks for the report. I am trying to fix it. Might take some days. Please be patient.

prakashjayy commented 3 years ago

Thanks @grimoire. I was able to run the conversion successfully after commenting out this line. Not sure if the .engine file will suffice to deploy the model on DeepStream; still testing it.

prakashjayy commented 3 years ago

@grimoire any update?

grimoire commented 3 years ago

I found that models with an hourglass backbone (2 stacks, such as CornerNet) also have this problem, but I still don't know the reason. Sorry. There is a new PR with a C++ example; I plan to test the engine with it.

grimoire commented 3 years ago

Model saving failed on a 2070 SUPER but succeeded on a 2080 Ti, so it might be related to GPU memory size, but I still don't know why. Have you tried converting without Docker?

prakashjayy commented 3 years ago

closing this issue for now.

xarauzo commented 2 years ago

I have a similar issue. @prakashjayy, where did you set the `--save-engine true` setting?
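For reference, a hedged sketch of how that flag would be passed, reusing the invocation from the original report (the flag name is taken from the question above; its exact spelling and value syntax are assumptions, so check `mmdet2trt --help`):

```shell
# Hypothetical invocation; verify flag names with `mmdet2trt --help`.
mmdet2trt configs/retinanet_r50_fpn_2x_coco.py \
    weights/retinanet_r50_fpn_2x_coco_20200131-fdb43119.pth \
    weights/model.trt \
    --save-engine true
```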