j96w / DenseFusion

"DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion" code repository
https://sites.google.com/view/densefusion
MIT License

Segmentation on LineMOD dataset #72

Closed sanjaysswami closed 5 years ago

sanjaysswami commented 5 years ago

I am trying to train the segmentation network on the LineMOD dataset and got the following error. I have changed the paths (in train.py and data_controller.py) needed to import the LineMOD RGB and mask files, and set the number of classes to 14 instead of 22 wherever needed in loss.py and segnet.py.

Here is the error:

```
5000 1000
/usr/local/lib/python3.5/dist-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
2019-08-08 14:20:55,080 : Train time 00h 00m 00s, Training started
Traceback (most recent call last):
  File "train.py", line 74, in <module>
    semantic_loss = criterion(semantic, target)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ros/Object_Pose_Estimation/DenseFusion/vanilla_segmentation/loss.py", line 35, in forward
    return loss_calculation(semantic, target)
  File "/home/ros/Object_Pose_Estimation/DenseFusion/vanilla_segmentation/loss.py", line 24, in loss_calculation
    semantic_loss = CEloss(semantic, target)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 1788, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (921600) to match target batch_size (2764800).
```
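Note the ratio: 2764800 is exactly 3 × 921600, which suggests the target mask still has three channels, while CrossEntropyLoss expects a single class index per pixel. A minimal sketch of the expected shapes (sizes illustrative, assuming 14 LineMOD classes):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(3, 14, 480, 640)                # (N, C, H, W) predictions
target = torch.zeros(3, 480, 640, dtype=torch.long)  # (N, H, W) class indices
loss = criterion(logits, target)                     # works
# A 3-channel mask target, e.g. torch.zeros(3, 3, 480, 640).long(), fails
# with the same batch_size mismatch: 3x as many target elements as expected.
```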

hygxy commented 5 years ago

@sanjaysswami Same issue here. I made my own dataset, which consists of only 4 classes, and I got a similar error except for the last line:

sanjaysswami commented 5 years ago

@hygxy if you find any solution please let me know. Thank you in advance

hygxy commented 5 years ago

@sanjaysswami Adding convert("L") to the label might help, i.e., change the line that loads the label to the following:
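A sketch of that change (`label_path` is a placeholder for the actual mask path used in data_controller.py):

```python
from PIL import Image
import numpy as np

# convert("L") collapses the RGB(A) mask to one 8-bit channel, so the
# target holds a single value per pixel instead of three.
label = np.array(Image.open(label_path).convert("L"))
```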

But after that i got another problem:

```
/opt/conda/conda-bld/pytorch_1550796191843/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
RuntimeError: CUDA error: device-side assert triggered
```
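The assertion `t >= 0 && t < n_classes` means some target pixel carries a value outside the valid class range: after convert("L") the foreground pixels are still 255, far above 4 (or 14) classes. A small illustration (running on CPU surfaces a readable error instead of the device-side assert; sizes illustrative):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(1, 14, 4, 4)                      # only 14 valid classes
target = torch.full((1, 4, 4), 255, dtype=torch.long)  # raw mask foreground
# criterion(logits, target)  # CPU: "Target 255 is out of bounds";
#                            # GPU: the device-side assert shown above
```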

sanjaysswami commented 5 years ago

@hygxy I tried what you did and got the same error. Trying to resolve it now.

sanjaysswami commented 5 years ago

@hygxy After that step you need to normalize the label to the 0-1 pixel range. It's working for me now.

hygxy commented 5 years ago

@sanjaysswami Could you please also show me the normalization code?

TrinhNC commented 5 years ago

Use this formula, where x is the label: (x - x.min()) / (x.max() - x.min()) # values from 0 to 1
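In NumPy, that is (x here is a toy label array for illustration):

```python
import numpy as np

# Toy grayscale label; real labels come from the mask images.
x = np.array([[0, 0, 255], [0, 255, 255]], dtype=np.float32)
x_norm = (x - x.min()) / (x.max() - x.min())  # rescaled into [0, 1]
```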

hygxy commented 5 years ago

@TrinhTUHH, thanks for your advice, it's working on my own dataset now. @sanjaysswami I am wondering why the author didn't do the normalization step, yet the code still works.

TrinhNC commented 5 years ago

When you convert the label image to grayscale, the maximum value is 255 (for YCB it is 21), so my guess is that through the convolution layers the values grow too large for CUDA to handle.

hygxy commented 5 years ago

@TrinhTUHH, if that's the reason, why do we need to normalize all pixels to (0,1) instead of to (0,13) in, for example, @sanjaysswami's case?

TrinhNC commented 5 years ago

Hi, sorry, I think my suggestion is wrong because I didn't understand the label image correctly. After reading this issue, I think normalization should not be the fix here. In a mask (or label) image, background pixels are marked with one value (in YCB they are 0), and the pixels belonging to an object are marked with that object's index (for example: 8 is the gelatin box, 21 is foam_brick, 14 is mug, etc.). In LineMOD, all mask images except those in folder 02 contain only 2 pixel values, 0 and 255 (0 is black for the background, 255 is white for the object), meaning only one object is segmented per frame. In the mask images in folder 02, however, all objects are segmented and labeled, but there are 22 label values, which exceeds the number of objects in LineMOD (more details in this issue). That might be the real problem causing this error.
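A quick way to verify this is to print the distinct values in a mask (the path is illustrative):

```python
import numpy as np
from PIL import Image

# List the distinct pixel values in a LineMOD mask image.
mask = np.array(Image.open("LINEMOD/data/01/mask/0000.png").convert("L"))
print(np.unique(mask))  # single-object folders: [0 255]; folder 02 has more
```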

Anyway, I can only check about it tomorrow.

hygxy commented 5 years ago

@TrinhTUHH, we have two problems here:

That's basically my understanding. I just can't figure out why the creator of this issue did a (0,1) normalization instead of a (0, number of classes) normalization, since the latter works as well. @sanjaysswami Any explanation?

sanjaysswami commented 5 years ago

@hygxy @TrinhTUHH and I work together. We will check today and get back to you.

TrinhNC commented 5 years ago

@hygxy Normalization should not be used here. You can simply replace the pixels that have the value 255 with the object_id in all label variables in data_controller.py, then train the network.
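A sketch of that replacement (the function name and variables are illustrative, not the repo's API):

```python
import numpy as np
from PIL import Image

def mask_to_label(mask_path, object_id):
    """Map a binary LineMOD mask to per-pixel class indices."""
    mask = np.array(Image.open(mask_path).convert("L"))
    # Foreground (255) becomes the object's class id; background stays 0.
    return np.where(mask == 255, object_id, 0).astype(np.int64)
```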

hygxy commented 5 years ago

@TrinhTUHH But after convert("L") I no longer have any pixels with value 255. Could you please give me an example?

TrinhNC commented 5 years ago

For me, the white region of the label image still has pixel value 255 after convert('L'). If yours does not, just replace the maximum value with the object id and keep the background pixels at 0.

hygxy commented 5 years ago

I see, so we are actually mapping the pixel values to (0, number of classes), as I mentioned before?

marmas92 commented 4 years ago

Hi, I'm trying to train the SegNet with my own synthetic dataset, structured like the preprocessed LINEMOD dataset. Maybe my question is a bit stupid, but how does this work? I want the SegNet trained so that it outputs only the object I am currently searching for in the picture (like the mask images in the preprocessed LINEMOD dataset). But if I understand it right, the SegNet would output the mask for every known object in the picture, right?