yanivge1 opened this issue 4 years ago
It turns out this issue appears in CPU mode, because we did not implement the C++ code.
@HuiZeng
When running demo_eval.py, is nn.DataParallel necessary?
I commented out some code as follows:
if args.cuda:
    # net = torch.nn.DataParallel(net, device_ids=[0])
    cudnn.benchmark = True
    net = net.cuda()
Then the following error was raised (at net(img, rois)):
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /tmp/pip-req-build-58y_cjjl/aten/src/THC/THCReduceAll.cuh:327
Do you know where the problem is?
I am currently modifying your method to generate RoIs of any specified aspect ratio, and eventually I want to add a CPU version of RoI/RoD align.
Thanks a lot
You can try this version if you want to use CPU mode: https://github.com/HuiZeng/Grid-Anchor-based-Image-Cropping-Pytorch
Hello, how many GPU cards are available? More than one? If so, could you first try using only one GPU card? I'm not quite sure whether the problem is caused by accessing memory allocated on a different device in our .cu code. Try "export CUDA_VISIBLE_DEVICES=0" before you run the code. I'm sure that nn.DataParallel or nn.DistributedDataParallel is NOT required in our current implementation.
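If exporting the variable in the shell is inconvenient, the same single-GPU restriction can be applied from inside the script. This is a minimal sketch; the only assumption is that it runs at the very top of demo_eval.py, before torch is imported, because CUDA reads the variable when it initialises.

```python
import os

# CUDA reads CUDA_VISIBLE_DEVICES at initialisation, so set it before importing
# torch (or anything else that may touch the GPU). "0" exposes only the first
# physical GPU, which the process then sees as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import torch  # import torch only after the line above has run
```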
For a CPU implementation, it may be possible to change the setup code to compile the C++ implementation. The C++ code is only for reference; we haven't tested it thoroughly, since training on the CPU would definitely take a very long time :(
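Since the thread asks about a CPU version of RoI align, here is a slow pure-Python reference sketch of generic RoI Align (one bilinear sample at the centre of each bin) that a compiled CPU extension could be checked against. It is not the repository's C++ or CUDA kernel: the sampling rule, the out-of-range convention, and the (batch_idx, x1, y1, x2, y2) RoI layout are assumptions (the layout matches the rois tensor printed later in this thread). The function names are hypothetical, and RoD align is not covered.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one channel: list of rows


def bilinear_sample(feature: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate a 2-D map at real-valued (y, x).
    Points more than one pixel outside the map return 0 (assumed convention)."""
    h, w = len(feature), len(feature[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    # Clamp onto the map, then blend the four surrounding pixels.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feature[y0][x0]
            + (1 - ly) * lx * feature[y0][x1]
            + ly * (1 - lx) * feature[y1][x0]
            + ly * lx * feature[y1][x1])


def roi_align_reference(features: Sequence[Grid], roi: Sequence[float],
                        aligned_h: int, aligned_w: int,
                        spatial_scale: float) -> List[Grid]:
    """Pool one RoI from a single image's [C][H][W] features into
    [C][aligned_h][aligned_w], sampling each bin once at its centre."""
    _, x1, y1, x2, y2 = roi  # batch index ignored: one image only
    # Map image-pixel coordinates onto the feature map.
    x1, y1, x2, y2 = (v * spatial_scale for v in (x1, y1, x2, y2))
    bin_h = max(y2 - y1, 0.0) / aligned_h
    bin_w = max(x2 - x1, 0.0) / aligned_w
    return [[[bilinear_sample(channel,
                              y1 + (ph + 0.5) * bin_h,
                              x1 + (pw + 0.5) * bin_w)
              for pw in range(aligned_w)]
             for ph in range(aligned_h)]
            for channel in features]
```

On a map whose value equals the column index, a 3 x 3 pooling of the box (0, 0) to (3, 3) at scale 1.0 yields 0.5, 1.5, 2.5 along each row, which is a quick sanity check that the interpolation is linear.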
@HuiZeng @lld533 Hello, I only use the CPU for inference :), because the pre-trained model works perfectly. I am designing an intelligent cropping project based on your model and method. The goal is to generate crops with a specified aspect ratio according to the editor's needs, so I changed the RoI generation method and added face detection.
Now I want to package it as a module for both CPU and GPU users.
I'm confused about nn.DataParallel. In fact, everything works well with nn.DataParallel on my PC (1 GPU card), in both CPU and GPU mode. But when I comment out that line, the error occurs. This error is very confusing... I am a CUDA beginner and cannot locate its cause.
def test():
    for epoch in range(0, 1):
        net = build_crop_model(scale='multi',  # scale='single',
                               alignsize=9, reddim=8, loadweight=False, model='mobilenetv2', downsample=4)
        net.load_state_dict(torch.load(args.net_path))
        net.eval()
        if args.cuda:
            # I commented out this line
            # net = torch.nn.DataParallel(net, device_ids=[0])
            cudnn.benchmark = True
            net = net.cuda()
        data_loader = data.DataLoader(dataset, args.batch_size,
                                      num_workers=args.num_workers,
                                      collate_fn=naive_collate,
                                      shuffle=False)
        for id, sample in enumerate(data_loader):
            imgpath = sample['imgpath']
            image = sample['image']
            bboxes = sample['sourceboxes']
            resized_image = sample['resized_image']
            tbboxes = sample['tbboxes']
            if len(tbboxes['xmin']) == 0:
                continue
            roi = []
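One hypothesis that fits these symptoms, offered as an assumption rather than a confirmed diagnosis: when the model is wrapped in nn.DataParallel, its forward pass scatters every tensor argument onto device_ids[0], so a RoI tensor built on the CPU is silently copied to the GPU. Without the wrapper nothing moves it, and a custom CUDA kernel could then read host memory, which would show up as an illegal memory access. The sketch below (pick_device and prepare_inputs are hypothetical helper names) moves all inputs to the model's device explicitly; the .contiguous() call is an extra precaution, assuming the kernels index raw memory. Choosing the device in one place may also help when packaging the model for both CPU and GPU users.

```python
import torch
import torch.nn as nn


def pick_device(use_cuda: bool) -> torch.device:
    """Use the GPU only when it is requested and actually available."""
    if use_cuda and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


def prepare_inputs(model: nn.Module, *tensors: torch.Tensor):
    """Move every input onto the device of the model's parameters and make it
    contiguous. nn.DataParallel's scatter step does the device move for you;
    without the wrapper it has to be done explicitly."""
    device = next(model.parameters()).device
    return tuple(t.to(device).contiguous() for t in tensors)
```

In the loop above, this would mean something like net = net.to(pick_device(args.cuda)) once, and img, rois = prepare_inputs(net, img, rois) just before net(img, rois); because the posted loop is truncated, those variable names are a guess.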
Can I contact you by email? My email: lih627@outlook.com
Thanks.
I've built and installed roi_align_api and rod_align_api from source, but when running roi_align.py -> roi_align_api.forward(), the outputs are all zeros. All parameters are the defaults; I didn't change anything.
aligned_height: 10
aligned_width: 10
spatial_scale: 0.0625
features.shape: torch.Size([1, 8, 16, 24])
features.dtype: torch.float32
rois.shape: torch.Size([83, 5])
rois.dtype: torch.float32
rois: tensor([[ 0.0000, 16.0000, 10.6667, 304.0000, 181.3333], [ 0.0000, 16.0000, 10.6667, 336.0000, 181.3333], [ 0.0000, 16.0000, 10.6667, 272.0000, 202.6667], ...]
features: ..., [ 0.7069, 0.1061, 0.2352, ..., 0.2554, 0.2490, -0.6246], [-0.3563, 0.0466, -0.4040, ..., 0.3803, -0.1317, -1.2327], [-1.2773, -0.5382, -0.9917, ..., -0.6372, -0.9664, -2.5898]]]], grad_fn=MkldnnConvolutionBackward)
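A quick arithmetic check on the values posted above: with spatial_scale = 0.0625 (1/16) and a 16 x 24 feature map, the first RoI maps to x from 16/16 = 1.0 to 304/16 = 19.0 and y from 10.6667/16 ≈ 0.667 to 181.3333/16 ≈ 11.33, well inside the map. So the all-zero output is probably not caused by boxes falling outside the features. Note also that grad_fn=MkldnnConvolutionBackward suggests the features were computed on the CPU, which fits the earlier remark that the C++ path was not implemented. The hypothetical helpers below sketch that check, assuming each RoI is (batch_idx, x1, y1, x2, y2) in input-image pixels.

```python
from typing import Sequence, Tuple


def scale_roi(roi: Sequence[float], spatial_scale: float) -> Tuple[float, ...]:
    """Map an RoI's image-pixel corners onto feature-map coordinates."""
    _, x1, y1, x2, y2 = roi
    return tuple(v * spatial_scale for v in (x1, y1, x2, y2))


def roi_in_feature_map(roi: Sequence[float], spatial_scale: float,
                       feat_h: int, feat_w: int) -> bool:
    """True if the scaled box is non-empty and lies inside the feature map."""
    sx1, sy1, sx2, sy2 = scale_roi(roi, spatial_scale)
    return 0 <= sx1 < sx2 <= feat_w and 0 <= sy1 < sy2 <= feat_h


# The first three RoIs from the post; features.shape is [1, 8, 16, 24].
posted = [(0.0, 16.0, 10.6667, 304.0, 181.3333),
          (0.0, 16.0, 10.6667, 336.0, 181.3333),
          (0.0, 16.0, 10.6667, 272.0, 202.6667)]
print(all(roi_in_feature_map(r, 0.0625, 16, 24) for r in posted))  # True
```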