roi_align_api.forward outputs are all zeros

yanivge1 commented 4 years ago

I've built and installed the source code of roi_align_api and rod_align_api, but when I run roi_align.py -> roi_align_api.forward(), the outputs are all zeros. All parameters are at their defaults; I didn't change anything.

aligned_height: 10
aligned_width: 10
spatial_scale: 0.0625
features.shape: torch.Size([1, 8, 16, 24])
features.dtype: torch.float32
rois.shape: torch.Size([83, 5])
rois.dtype: torch.float32
rois: tensor([[ 0.0000, 16.0000, 10.6667, 304.0000, 181.3333],
        [ 0.0000, 16.0000, 10.6667, 336.0000, 181.3333],
        [ 0.0000, 16.0000, 10.6667, 272.0000, 202.6667], ...]

features: ..., [ 0.7069, 0.1061, 0.2352, ..., 0.2554, 0.2490, -0.6246], [-0.3563, 0.0466, -0.4040, ..., 0.3803, -0.1317, -1.2327], [-1.2773, -0.5382, -0.9917, ..., -0.6372, -0.9664, -2.5898]]]], grad_fn=MkldnnConvolutionBackward)
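As a quick sanity check on the inputs reported above, the sketch below scales the first RoI by spatial_scale and compares it with the feature map size. It assumes the RoI layout is (batch_index, x1, y1, x2, y2), which the report does not state explicitly, and both helper names are hypothetical.

```python
def scale_roi(roi, spatial_scale):
    """Map an RoI from image coordinates to feature-map coordinates.

    Assumes the layout (batch_index, x1, y1, x2, y2); this layout is an
    assumption, not something confirmed in the thread.
    """
    batch_index, x1, y1, x2, y2 = roi
    return (int(batch_index),
            x1 * spatial_scale, y1 * spatial_scale,
            x2 * spatial_scale, y2 * spatial_scale)


def inside_feature_map(scaled_roi, height, width):
    """Return True if the scaled box lies within a height x width map."""
    _, x1, y1, x2, y2 = scaled_roi
    return 0 <= x1 <= x2 <= width and 0 <= y1 <= y2 <= height


# Values taken from the report: features.shape == (1, 8, 16, 24).
first_roi = (0.0, 16.0, 10.6667, 304.0, 181.3333)
scaled = scale_roi(first_roi, spatial_scale=0.0625)
print(scaled)
print(inside_feature_map(scaled, height=16, width=24))
```

If the layout assumption holds, the first box falls inside the 16 x 24 feature map, so out-of-range RoIs would not by themselves explain the all-zero output.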

HuiZeng commented 4 years ago

It turns out this issue appears in CPU mode, for which we have not implemented the C++ code.
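Because the CPU path is not implemented, a caller may prefer a loud failure over silently receiving zeros. The following is a minimal sketch of such a guard, not part of the repository: `require_cuda` is a hypothetical helper that relies only on the `is_cuda` attribute of `torch.Tensor`, and the `SimpleNamespace` objects are stand-ins so the example runs without torch installed.

```python
from types import SimpleNamespace


def require_cuda(**tensors):
    """Raise if any named tensor is not on a CUDA device.

    Duck-typed: it only reads the `is_cuda` attribute that torch.Tensor
    provides, so it does not import torch itself.
    """
    on_cpu = [name for name, t in tensors.items() if not t.is_cuda]
    if on_cpu:
        raise RuntimeError(
            "CUDA-only op received CPU tensors: " + ", ".join(on_cpu))


# Stand-ins for real tensors, used only so this sketch runs without torch.
features = SimpleNamespace(is_cuda=False)
rois = SimpleNamespace(is_cuda=True)

try:
    require_cuda(features=features, rois=rois)
except RuntimeError as err:
    message = str(err)
    print(message)
```

With real tensors, the call `require_cuda(features=features, rois=rois)` would go immediately before the call to `roi_align_api.forward()`.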

lih627 commented 4 years ago

@HuiZeng

When running demo_eval.py, is nn.DataParallel necessary? I commented out some code as follows:

    if args.cuda:
        # net = torch.nn.DataParallel(net, device_ids=[0])
        cudnn.benchmark = True
        net = net.cuda()

Then the following error is raised (at net(img, rois)):

RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /tmp/pip-req-build-58y_cjjl/aten/src/THC/THCReduceAll.cuh:327

Do you know where the problem is?

I am currently modifying your method to generate RoIs of any specified aspect ratio, and I eventually want to add a CPU version of roi/rod.

Thanks a lot

HuiZeng commented 4 years ago

You can try this version if you want to use the CPU mode. https://github.com/HuiZeng/Grid-Anchor-based-Image-Cropping-Pytorch

lld533 commented 4 years ago

Hello, how many GPU cards are available? More than one? If so, could you please try using only one GPU card first? I'm not sure whether the problem is caused by accessing memory allocated on a different device in our .cu code. Try "export CUDA_VISIBLE_DEVICES=0" before you run the code. I'm sure that neither nn.DataParallel nor nn.DistributedDataParallel is required in our current implementation.
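The same restriction can also be applied from inside the script rather than the shell. This is a small sketch that assumes the variable is set before torch initializes CUDA; placing it before `import torch` is the safest position.

```python
import os

# Must run before torch initializes CUDA (safest: before `import torch`).
# Once the CUDA context exists, changing this variable has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With this setting, the process sees only the first physical GPU, which it addresses as device 0.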

For a CPU implementation, it should be possible to change the setup code to compile the C++ implementation. The C++ code is only for reference: we have not tested it thoroughly, as training with it would take a lot of time :(

lih627 commented 4 years ago

@HuiZeng @lld533 Hello, I only use the CPU for inference :), because the pre-trained model works very well. I am building an intelligent cropping project based on your model and method. The goal is to generate cropping results with a specified aspect ratio according to the editor's needs, so I changed the RoI generation method and added face detection.

Currently, I want to encapsulate it as a module, for both CPU and GPU users.

I'm confused about nn.DataParallel. In fact, it works well with nn.DataParallel on my PC (1 GPU card) in both CPU and GPU modes. But when I comment out that line, the error occurs. This error is very confusing. I am a beginner with CUDA and cannot locate the cause of the error.


def test():
    for epoch in range(0, 1):

        net = build_crop_model(scale='multi',  # scale='single',
                               alignsize=9, reddim=8, loadweight=False, model='mobilenetv2', downsample=4)
        net.load_state_dict(torch.load(args.net_path))
        net.eval()

        if args.cuda:
            # I commented out the following line
            # net = torch.nn.DataParallel(net, device_ids=[0])
            cudnn.benchmark = True
            net = net.cuda()

        data_loader = data.DataLoader(dataset, args.batch_size,
                                      num_workers=args.num_workers,
                                      collate_fn=naive_collate,
                                      shuffle=False)

        for id, sample in enumerate(data_loader):
            imgpath = sample['imgpath']
            image = sample['image']
            bboxes = sample['sourceboxes']
            resized_image = sample['resized_image']
            tbboxes = sample['tbboxes']

            if len(tbboxes['xmin']) == 0:
                continue

            roi = []

Can I contact you by mail? My E-mail: lih627@outlook.com

Thanks.
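One possible explanation, offered as an assumption rather than something confirmed in this thread: nn.DataParallel scatters its inputs to the GPUs in device_ids, so once it is removed, img and rois may stay on the CPU while the model is on the GPU. The sketch below moves inputs to the model's device explicitly. `move_to_model_device` is a hypothetical helper, and it assumes the inputs are tensors and that the model has at least one parameter.

```python
def move_to_model_device(model, *tensors):
    """Return the tensors moved to the device of the model's parameters.

    Uses only `model.parameters()` and `Tensor.to(device)`, so it works
    with any torch.nn.Module that has at least one parameter.
    """
    device = next(model.parameters()).device
    return tuple(t.to(device) for t in tensors)


# Intended use (assumes `img` and `rois` are tensors at this point):
#   img, rois = move_to_model_device(net, img, rois)
#   out = net(img, rois)
```

If the error disappears after this change, a CPU/GPU mismatch in the inputs was the likely cause; if it persists, the problem lies elsewhere.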

lld533 / Grid-Anchor-based-Image-Cropping-Pytorch

roi_align_api.forward outputs are all zeros #11