gpu_mask_voting not working, while cpu_mask_voting works

BrianOn99 commented 7 years ago

I have built mxnet with gpu support to run FCIS. Running python ./fcis/demo.py, the network runs successfully on the gpu within a second, but the resulting images are just the originals, without masks.

By looking at the content and shape of various numpy arrays inside ./fcis/demo.py, I found that everything looks sane (I mean the arrays contain numbers like 9.96572733e-01) up to the call to gpu_mask_voting, whose result is:

(Pdb) result_masks
[[], array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21
, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([
], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype
=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0,
 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), a
rray([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21),
 dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], sha
pe=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float
32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21
, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([
], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype
=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0, 1, 21, 21), dtype=float32), array([], shape=(0,
 1, 21, 21), dtype=float32)]

All the arrays are empty. Then I found there is a variant called cpu_mask_voting, so I plugged it in; now the masks are shown on the images, and result_masks is no longer empty.

             boxes = clip_boxes(boxes[0], (im_height, im_width))
-            result_masks, result_dets = gpu_mask_voting(masks, boxes, scores[0], num_classes,
+
+            result_masks, result_dets = cpu_mask_voting(masks, boxes, scores[0], num_classes,
                                                         100, im_width, im_height,
                                                         config.TEST.NMS, config.TEST.MASK_MERGE_THRESH,
-                                                        config.BINARY_THRESH, ctx_id[0])
+                                                        config.BINARY_THRESH)

             dets = [result_dets[j] for j in range(1, num_classes)]
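
The empty-result check and the CPU fallback described above can be sketched as a small helper. This is only an illustration, not part of FCIS: the names count_voted_masks and vote_with_fallback are hypothetical, and the two voting functions are passed in as zero-argument callables so the sketch does not depend on their exact signatures. Note that an image with genuinely no detections would also trigger the fallback.

```python
def count_voted_masks(result_masks):
    """Return the number of masks in each class slot.

    len() works for plain lists and for NumPy arrays, where it gives the size
    of the first axis, so an array of shape (0, 1, 21, 21) counts as 0.
    """
    return [len(m) for m in result_masks]


def vote_with_fallback(primary_vote, fallback_vote):
    """Call primary_vote(); if it yields no masks at all, call fallback_vote().

    Both arguments are hypothetical zero-argument callables returning
    (result_masks, result_dets), for example lambdas that wrap
    gpu_mask_voting and cpu_mask_voting with their own argument lists.
    """
    result_masks, result_dets = primary_vote()
    if sum(count_voted_masks(result_masks)) == 0:
        print("primary mask voting returned no masks; using fallback")
        result_masks, result_dets = fallback_vote()
    return result_masks, result_dets
```

Inside demo.py, the two callables would wrap the gpu_mask_voting and cpu_mask_voting calls exactly as they appear in the diff, with ctx_id[0] passed only to the GPU variant.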

Does gpu_mask_voting need any extra config to run, and may I know how big the performance gain from using it is?

I am running the FCIS master branch with the suggested mxnet commit, on an Amazon AWS gpu instance running Ubuntu Linux 16.04 and cudnn 5.

Thanks in advance.

liyi14 commented 7 years ago

Hi @BrianOn99, could you please give more details about your environment? I have no idea what happened yet.

BrianOn99 commented 7 years ago

@liyi14 The environment is: amazon aws g2.2xlarge, ubuntu 16.04, cuda8.0.61-1_amd64, within nvidia-docker image "8.0-cudnn5-devel-ubuntu16.04", Python 2.7.12, MXNet@(commit 62ecb60)
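
Details like these can also be collected from inside Python. The following is a minimal sketch, assuming mxnet exposes `__version__`, `mx.gpu()` and `mx.nd.zeros()` as it does in the releases I know of; the function name describe_environment is hypothetical, and the GPU probe is only attempted when mxnet imports successfully.

```python
import importlib
import platform
import sys


def describe_environment(module_name="mxnet"):
    """Collect basic facts about the platform and one Python module."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "module": module_name,
        "module_version": None,  # stays None if the module cannot be imported
        "gpu_usable": None,      # only filled in for mxnet
    }
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Covers a missing package as well as a broken native library.
        return info
    info["module_version"] = getattr(module, "__version__", "unknown")
    if module_name == "mxnet":
        try:
            # Allocate a tiny array on GPU 0 and copy it back to force execution.
            module.nd.zeros((1,), ctx=module.gpu(0)).asnumpy()
            info["gpu_usable"] = True
        except Exception:
            info["gpu_usable"] = False
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

This does not report the CUDA or cudnn versions; those still have to come from the Docker image tag or the package manager.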

HaozhiQi commented 7 years ago

We haven't been able to reproduce this problem on our machines (listed in the README). @BrianOn99 GPU mask voting doesn't need extra config, and the performance of GPU and CPU mask voting is nearly the same.

We will keep an eye on this issue. If anyone encounters the same problem, we can then discuss here.
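
For anyone who wants to check the "nearly the same" performance on their own machine, a rough timing harness could look like the sketch below. It is an assumption-laden illustration rather than an FCIS benchmark: time_call and compare_timings are hypothetical helpers, and each callable is expected to wrap one complete voting call that returns its results as host arrays.

```python
import time


def time_call(fn, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_timings(named_calls, repeats=5):
    """Time each zero-argument callable and return {name: best_seconds}."""
    return {name: time_call(fn, repeats) for name, fn in named_calls.items()}
```

Passing a dict such as {"gpu": ..., "cpu": ...}, with lambdas wrapping gpu_mask_voting and cpu_mask_voting, gives the best time for each variant on the same inputs.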

BrianOn99 commented 7 years ago

Thanks for looking into this issue, anyway.

lnuchiyo commented 7 years ago

@BrianOn99 have you met this error? When I compile mxnet, it fails with: no matching function for call to ‘std::vector<unsigned int*>::push_back

like this:

g++ -std=c++11 -c -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -I/home/cs/mxnet/mshadow/ -I/home/cs/mxnet/dmlc-core/include -fPIC -I/home/cs/mxnet/nnvm/include -Iinclude -funroll-loops -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_OPENCV=1 -I/usr/include/opencv -fopenmp -DMSHADOW_USE_CUDNN=1 -I/home/cs/mxnet/cub -DMXNET_USE_NVRTC=0 -MMD -c src/operator/custom/ndarray_op.cc -o build/src/operator/custom/ndarray_op.o
In file included from src/operator/custom/native_op.cc:7:0:
src/operator/custom/./native_op-inl.h: In member function ‘virtual bool mxnet::op::NativeOpProp::InferShape(std::vector<nnvm::TShape>, std::vector<nnvm::TShape>, std::vector<nnvm::TShape>) const’:
src/operator/custom/./native_op-inl.h:204:36: error: no matching function for call to ‘std::vector<unsigned int*>::push_back(nnvm::dim_t)’
shapes.push_back(iter->data());

Can you help me? I run FCIS the same way as you. Thanks a lot! Could we discuss some questions about FCIS's demo.py by email?

BrianOn99 commented 7 years ago

Hi @lnuchiyo, I guess one of your libraries is incompatible, but I am not sure which one. If it helps, use the following Dockerfile that I was using:

# This Dockerfile differs from the official one in that it (at the time of writing)
# 1. Uses cudnn6, which provides support for dilated convolution.  Otherwise mxnet
# emits warnings.
# 2. Uses the 0.10.0 branch
# 3. Removes unnecessary dependencies

FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04

ARG nproc=7
WORKDIR /

RUN apt-get update && apt-get install --no-install-recommends -y \
        build-essential libopenblas-dev libopencv-dev git \
        python-dev python3-dev python-pip python3-pip python-setuptools python3-setuptools \
        && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

RUN pip --no-cache-dir install numpy && pip3 --no-cache-dir install numpy

RUN git clone https://github.com/dmlc/mxnet && cd mxnet && \
        git checkout v0.10.0 && \
        git submodule update --init --recursive && \
        make -j ${nproc} USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 && \
        cd python && python setup.py install && python3 setup.py install && \
        cd / && rm -r mxnet

I used cudnn6; cudnn5 was also OK as far as I remember, but I am not sure, since this was written 2 months ago.

I am not very familiar with mxnet or FCIS (I usually work with torch), so contacting the mxnet / FCIS community will definitely be a better solution for you, and you will get a more professional response. Anyway, feel free to contact me by email: chiu6700@gmail.com.

msracver / FCIS

gpu_mask_voting not working, while cpu_mask_voting works #21