minar09 / cp-vton-plus

Official implementation for "CP-VTON+: Clothing Shape and Texture Preserving Image-Based Virtual Try-On", CVPRW 2020
https://minar09.github.io/cpvtonplus/
MIT License

Error while testing with custom images #27

Closed. karthik1997 closed this 3 years ago.

karthik1997 commented 3 years ago

I really appreciate your work.

I am getting an error like this while testing custom images:

return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Given groups=1, weight of size [64, 22, 4, 4], expected input[3, 28, 256, 192] to have 22 channels, but got 28 channels instead

minar09 commented 3 years ago

Hi @karthik1997, it looks like you have an input mismatch. Please read the README file for testing custom images and try accordingly. If the error still persists, please comment with your inputs and their channels/dimensions.
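
For example, a minimal sketch for inspecting them (the paths are placeholders; the folder names follow the VITON-style data layout, so adjust them to your own files):

    from PIL import Image
    import numpy as np

    # Placeholder paths -- point these at your own test inputs.
    paths = [
        "image/000118_0.jpg",        # person image, expect RGB (3 channels)
        "cloth/000118_1.jpg",        # cloth image, expect RGB (3 channels)
        "image-parse/000118_0.png",  # segmentation, expect single-channel (mode L or P)
    ]
    for p in paths:
        img = Image.open(p)
        print(p, "| mode:", img.mode, "| shape:", np.array(img).shape)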

karthik1997 commented 3 years ago

The inputs are according to the instructions. I am sharing the images here: [attached images: 000118_1, abc]

@minar09

minar09 commented 3 years ago

Hi @karthik1997, it's very difficult to understand the issue just by looking at the images. Your input images seem okay, but I don't know their input channels. Please check the inputs with their dimensions. For example, the segmentation input should be the [0,20] grayscale input, not like the one shown here (RGB). I think you can find the origin of the issue with a little debugging. Good luck.

minar09 commented 3 years ago

If it's still giving an error, you can comment here with the full error traceback. That way it will be easier to understand. Thank you.

karthik1997 commented 3 years ago

@minar09 I used Graphonomy for segmentation, will that be an issue? The input image is RGB and the output image is the segmentation image which I posted. I converted that 24-bit image into 8-bit according to the prerequisites.

minar09 commented 3 years ago

What are the label numbers of Graphonomy? Please check if they are similar to LIP/PGN. Also, there should be a generated grayscale [0,20] output segmentation file, so you don't need to convert from RGB. See the difference below or check the VITON dataset segmentation files.

[attached: 000038_0 (grayscale label map) and 000038_0_vis (RGB visualization)]
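
A quick way to verify which kind of file you have is to check its mode and label values; a minimal sketch (the path is a placeholder, point it at your Graphonomy output):

    from PIL import Image
    import numpy as np

    seg = Image.open("image-parse/000038_0.png")  # placeholder path
    labels = np.unique(np.array(seg))

    print("mode:", seg.mode)  # single-channel mode (L or P) expected, not RGB
    print("labels:", labels)  # should be a subset of 0..20 (LIP/PGN label set)
    assert seg.mode != "RGB", "this looks like the color visualization, not the label map"
    assert labels.max() <= 20, "labels outside [0,20] -- not a LIP/PGN-style parse"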

karthik1997 commented 3 years ago

@minar09 Hi, thank you for the help. The segmentation labels are the same in Graphonomy too. I tried with them as well; adding the images here: [attached images]

Attaching the entire traceback here:

Traceback (most recent call last):
  File "test.py", line 225, in <module>
    main()
  File "test.py", line 211, in main
    test_gmm(opt, test_loader, model, board)
  File "test.py", line 99, in test_gmm
    grid, theta = model(agnostic, cm)
  File "C:\Users\Karthik\anaconda3\envs\torch_gpu\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Karthik\cp-vton-plus-master\networks.py", line 518, in forward
    featureA = self.extractionA(inputA)
  File "C:\Users\Karthik\anaconda3\envs\torch_gpu\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Karthik\cp-vton-plus-master\networks.py", line 79, in forward
    return self.model(x)
  File "C:\Users\Karthik\anaconda3\envs\torch_gpu\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Karthik\anaconda3\envs\torch_gpu\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
    input = module(input)
  File "C:\Users\Karthik\anaconda3\envs\torch_gpu\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Karthik\anaconda3\envs\torch_gpu\lib\site-packages\torch\nn\modules\conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "C:\Users\Karthik\anaconda3\envs\torch_gpu\lib\site-packages\torch\nn\modules\conv.py", line 415, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Given groups=1, weight of size [64, 22, 4, 4], expected input[1, 28, 256, 192] to have 22 channels, but got 28 channels instead

thaithanhtuan commented 3 years ago

Based on the traceback, the error comes from the size of agnostic. It should be 22 channels, but the agnostic built from your custom image has 28 channels. You can check the sizes of these data by debugging cp_dataset.py line 178 (print out the shapes of these inputs). Check: shape: 1 channel; im_h: 3 channels; pose_map: 18 channels. Maybe one of your inputs got the wrong size. Regards.
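
For reference, a minimal sketch of how those pieces add up, using dummy tensors with the channel counts listed above:

    import torch

    # Dummy tensors mirroring the inputs named above; only the channel counts matter.
    shape    = torch.zeros(1, 256, 192)   # body shape mask, 1 channel
    im_h     = torch.zeros(3, 256, 192)   # head region image, 3 channels
    pose_map = torch.zeros(18, 256, 192)  # one heatmap per COCO-18 keypoint

    agnostic = torch.cat([shape, im_h, pose_map], 0)
    print(agnostic.shape)  # torch.Size([22, 256, 192])
    # A 24-keypoint pose_map would give 1 + 3 + 24 = 28 channels,
    # which is exactly the mismatch in the RuntimeError above.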

karthik1997 commented 3 years ago

@thaithanhtuan @minar09 ,

I found the error. I am getting the shape of pose_map as 24 instead of 18. How can I solve this issue?

karthik1997 commented 3 years ago

@thaithanhtuan @minar09 ,

I used the OpenPose PyTorch implementation for generating the keypoints.

https://github.com/minar09/openpose-pytorch

minar09 commented 3 years ago

@karthik1997, use the COCO-18 model from the original OpenPose repository.

karthik1997 commented 3 years ago

Ok @minar09 ,

Thank you. Will try that out and post the try-on results here.

One more question: is it possible to do try-on for bottomwear as well with this algorithm?

minar09 commented 3 years ago

For apparel other than upper clothes, you can follow a similar procedure. You may need datasets and separate model training for that.

karthik1997 commented 3 years ago

@minar09, thank you. So try-on for both bottomwear and topwear is not possible with this algorithm even if we can get the data for training it? Do we have to go for a different approach?

minar09 commented 3 years ago

@karthik1997, I'm not sure. This is an active research area with many challenges. You can explore the latest research works or try your own approach.

thaithanhtuan commented 3 years ago

@karthik1997 From the 24 keypoints, you can check what they are and try removing 6 of the 24 to get 18 keypoints, or change the GMM input from 22 channels to 28 channels. About bottomwear, please define the problem: what are the inputs and the desired output?
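
For the first option, a minimal sketch (the keep-list, path, and JSON key name are assumptions; verify them against your pose model's output format):

    import json

    # Hypothetical reduction from 24 keypoints down to the 18 COCO-order ones.
    # KEEP is a placeholder index list -- verify it against your pose model's
    # keypoint ordering; the JSON key follows the VITON-style pose files.
    KEEP = list(range(18))

    with open("pose/000118_0_keypoints.json") as f:  # placeholder path
        data = json.load(f)

    pts = data["people"][0]["pose_keypoints"]  # flat [x, y, confidence] triples
    assert len(pts) == 24 * 3, "expected a 24-keypoint file"
    data["people"][0]["pose_keypoints"] = [v for i in KEEP for v in pts[3*i:3*i+3]]

    with open("pose/000118_0_keypoints.json", "w") as f:
        json.dump(data, f)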

karthik1997 commented 3 years ago

@thaithanhtuan ,

How can I make the model work with 28 channels instead?

Regarding the bottomwear: I have data for the desired person's photo, the required topwear photo, and the required bottomwear photo. I need to try the topwear and bottomwear on the desired person's image.

If I have a good amount of data like this, can I achieve try-on results for topwear and bottomwear with the same code?

karthik1997 commented 3 years ago

@minar09 @thaithanhtuan ,

I'm getting the same error after extracting the pose with the COCO-18 model; it gives 24 keypoints instead of 18 keypoints as well.

karthik1997 commented 3 years ago

@minar09 @thaithanhtuan ,

Can you provide me the link to the actual COCO-18 model file?

minar09 commented 3 years ago

OpenPose original repo: https://github.com/CMU-Perceptual-Computing-Lab/openpose
How to generate joints: https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/demo_overview.md
Release: https://github.com/CMU-Perceptual-Computing-Lab/openpose/releases/tag/v1.6.0
Download models: https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/models/getModels.bat or https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/models/getModels.sh

If you have issues running OpenPose, you should refer to the original repository.
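
Once the joints are generated, a quick sanity check on the output (placeholder path; the JSON key name is an assumption based on VITON-style pose files, adjust if yours differs):

    import json

    with open("pose/000118_0_keypoints.json") as f:  # placeholder path
        people = json.load(f)["people"]

    pts = people[0]["pose_keypoints"]  # flat [x, y, confidence] triples
    print("keypoints:", len(pts) // 3)  # CP-VTON+ expects 18 (COCO-18), not 24 or 25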

karthik1997 commented 3 years ago

Thank You