lujiazho commented 1 year ago

After investigating for 3 days, I got almost everything working, apart from a few minor problems. Here is a link to my personal case study of HR-VITON.

Pre

According to the authors' explanation in Preprocessing.md, several steps are needed to produce all the inputs the model requires:

OpenPose
Human Parse
DensePose
Cloth Mask
Parse Agnostic
Human Agnostic

Most of these were reproduced on Colab. The exception is Human Parse, which needs Tensorflow 1.15 and for which a GPU is highly preferred.

1、OpenPose (On colab, need GPU)

(1) Install OpenPose, taking about 15 minutes

import os
from os.path import exists, join, basename, splitext

git_repo_url = 'https://github.com/CMU-Perceptual-Computing-Lab/openpose.git'
project_name = splitext(basename(git_repo_url))[0]
if not exists(project_name):
  # see: https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/949
  # install new CMake because of CUDA10
  !wget -q https://cmake.org/files/v3.13/cmake-3.13.0-Linux-x86_64.tar.gz
  !tar xfz cmake-3.13.0-Linux-x86_64.tar.gz --strip-components=1 -C /usr/local
  # clone openpose
  !git clone -q --depth 1 $git_repo_url
  !sed -i 's/execute_process(COMMAND git checkout master WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}\/3rdparty\/caffe)/execute_process(COMMAND git checkout f019d0dfe86f49d1140961f8c7dec22130c83154 WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}\/3rdparty\/caffe)/g' openpose/CMakeLists.txt
  # install system dependencies
  !apt-get -qq install -y libatlas-base-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev opencl-headers ocl-icd-opencl-dev libviennacl-dev
  # install python dependencies
  !pip install -q youtube-dl
  # build openpose
  !cd openpose && rm -rf build || true && mkdir build && cd build && cmake .. && make -j`nproc`

Now, OpenPose will be installed under your current path.

(2) Get all needed models

!. ./openpose/models/getModels.sh

(3) Prepare your test data

# for storing input image
!mkdir ./image_path
# copy official provided data to image_path, you may need to download and unzip it in advance
!cp ./test/image/000* ./image_path/
# create directories for generated results of OpenPose
!mkdir ./json_path
!mkdir ./img_path

(4) Run

# go to openpose directory
%cd openpose
# run openpose.bin
!./build/examples/openpose/openpose.bin --image_dir ../image_path --hand --disable_blending --display 0 --write_json ../json_path --write_images ../img_path --num_gpu 1 --num_gpu_start 0

Then json files will be saved under ../json_path and images will be saved under ../img_path.
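
Before moving on, it can help to confirm that every JSON file actually contains a detected person, because the agnostic scripts later in this post read `people[0]['pose_keypoints_2d']`. The sketch below is not part of OpenPose: `load_pose` is a hypothetical helper, and it assumes the default BODY_25 model (25 keypoints, each stored as x, y, confidence).

```python
import glob
import json

import numpy as np


def load_pose(json_file):
    """Return a (25, 2) array of x, y keypoints, or None if nobody was detected."""
    with open(json_file, "r") as f:
        people = json.load(f)["people"]
    if not people:
        return None
    # Assumption: BODY_25 output, i.e. a flat list of (x, y, confidence) triples.
    keypoints = np.array(people[0]["pose_keypoints_2d"], dtype=np.float32)
    return keypoints.reshape(-1, 3)[:, :2]


if __name__ == "__main__":
    # Path matches the --write_json folder used above (run from the openpose dir).
    for path in sorted(glob.glob("../json_path/*_keypoints.json")):
        if load_pose(path) is None:
            print("no person detected in", path)
```

Files reported here would later be skipped by the `except IndexError` branch in the agnostic scripts.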

The image result looks like

More details about results can be found at openpose

2、Human Parse

In this section, you can work on Colab, in the cloud, or locally. Unfortunately, I could not get the GPU to work on Colab and had to fall back to the CPU, which is very slow at an image size of 768 × 1024 (about 13 minutes per image).

Method 1: Colab

If you can accept that, install Tensorflow 1.15. Before doing so, you have to change the Python version to 3.7 or 3.6.

(1) Get pretrained model

%%bash
FILE_NAME='./CIHP_pgn.zip'
FILE_ID='1Mqpse5Gen4V4403wFEpv3w3JAsWw2uhk'

curl -sc /tmp/cookie "https://drive.google.com/uc?export=download&id=$FILE_ID" > /dev/null
CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"  
curl -Lb /tmp/cookie "https://drive.google.com/uc?export=download&confirm=${CODE}&id=$FILE_ID" -o $FILE_NAME

unzip

!unzip CIHP_pgn.zip

(2) Get repo

!cp -r /content/drive/MyDrive/CIHP_PGN ./
%cd CIHP_PGN

Note: I just saved the repo and cleaned it for my own purpose, but you can use official provided code as well.

(3) Prepare data and model

!mkdir -p ./checkpoint
!mkdir -p ./datasets/images
# You also need to download dataset provided or use your own images
!mv ../CIHP_pgn ./checkpoint/CIHP_pgn
!cp ../test/image/0000* ./datasets/images

(4) Configuration

Change to Python 3.6

!sudo update-alternatives --config python3

Install dependencies (Tensorflow 1.15)

!sudo apt-get install python3-pip
!python -m pip install --upgrade pip
!pip install matplotlib opencv-python==4.2.0.32 Pillow scipy tensorflow==1.15
!pip install ipykernel

(5) Run

Now you can run the code.

!python ./inference_pgn.py

Note: In official repo, the file is named inf_pgn.py, which leads to the same result as mine.

Finally, you get a result that looks like

More details can be found at CIHP_PGN

Method 2: Local or Server

In this section, I will give more explanation about what we really need.

You need conda in this part, which is what I used at least.

(1) Create a new env for oldschool Tensorflow

conda create -n tf python=3.7

(2) Configuration

conda activate tf

install GPU dependencies: cudatoolkit=10.0 cudnn=7.6.5

conda install -c conda-forge cudatoolkit=10.0 cudnn=7.6.5

install Tensorflow 1.15 GPU

pip install tensorflow-gpu==1.15

You may also need to install the packages below in the new env

pip install scipy==1.7.3 opencv-python==4.5.5.62 protobuf==3.19.1 Pillow==9.0.1 matplotlib==3.5.1

More info about compatibility between Tensorflow and CUDA can be found here

(3) Prepare data, repo and model as mentioned before

A final dir looks like

So you basically just put model under checkpoint/CIHP_pgn

And put data under datasets/images

It can be just a few images of people. A repo of my cleaned version can be found at Google Drive. Feel free to download it. If you use official provided inf_pgn.py, same results will be generated.

(4) Run

python inference_pgn.py

Then you should see the output. Unfortunately, I did not manage to run inference on the GPU, either on the server or locally.

Locally, my GPU is an MX250 with 2 GB of memory, which is not enough for inference. On the server, the GPU is an RTX A5000, but for some unknown reason, probably an incompatibility, the GPU is not used for inference, even though the model is successfully loaded onto it.

Fortunately, the server I used has 24 cores with 2 threads per core, which keeps it reasonably fast (20 to 30 seconds per 768×1024 image) even on the CPU.

Final result looks like

However, the result inferred from a 768×1024 input is not the same as the one from a 192×256 input. The former looks worse, as shown above.
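
Since the low-resolution result looks better, one option is to run inference at 192×256 and enlarge the label map afterwards, which is also what the authors describe in Preprocessing.md (quoted further down in this thread). The sketch below is only an illustration under my own assumptions. `upsample_labels` uses nearest-neighbour resampling so that no new label values appear. `smooth_labels` is a substitution for the authors' `torchgeometry` blur, not their exact procedure: it blurs one mask per class with a NumPy Gaussian and takes the per-pixel argmax, so the set of labels cannot grow. Both function names are hypothetical.

```python
import numpy as np
from PIL import Image


def upsample_labels(label_img, size=(768, 1024)):
    """Enlarge a label map; nearest-neighbour keeps every pixel a valid label."""
    return label_img.resize(size, resample=Image.NEAREST)


def gaussian_kernel(ksize=15, sigma=3.0):
    """1-D Gaussian kernel normalised to sum to 1."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth_labels(labels, ksize=15, sigma=3.0):
    """Blur each class mask separately, then pick the strongest class per pixel.

    Blurring the label image directly averages label numbers and invents
    classes; blurring per-class masks only moves the boundaries.
    Assumes both image sides are at least `ksize` pixels long.
    """
    kernel = gaussian_kernel(ksize, sigma)
    classes = np.unique(labels)
    scores = np.empty((len(classes),) + labels.shape, dtype=np.float32)
    for n, c in enumerate(classes):
        mask = (labels == c).astype(np.float32)
        # Separable blur: columns first, then rows.
        mask = np.apply_along_axis(np.convolve, 0, mask, kernel, mode="same")
        mask = np.apply_along_axis(np.convolve, 1, mask, kernel, mode="same")
        scores[n] = mask
    return classes[np.argmax(scores, axis=0)].astype(np.uint8)
```

Typical use would be `smooth_labels(np.array(upsample_labels(low_res_parse)))`. The plain NumPy blur is slow at full resolution, so treat it as a reference rather than a tuned implementation.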

Note: The black images are the ones we really need. The values in the colored images are, for example, 0, 51, 85, 128, 170, 221, 255, which are not in the range 0 - 19 and are inconsistent with HR-VITON. The values in the black images are, for example, 0, 2, 5, 10, 12, 13, 14, 15, which are the labels needed to generate the agnostic images.

One thing to mention, the images provided by official dataset keep both visualization (colored) and label (0 - 20). I don't know how they did that. I also tried P mode in PIL, but found nothing.
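
One plausible explanation, which a later comment in this thread also suggests, is a palettised PNG: the pixel values stay the class indices, and an attached palette decides how each index is displayed. A minimal sketch of that idea follows. `save_indexed_png` is a hypothetical helper, and the colours are random placeholders, not the palette used by the official dataset.

```python
import numpy as np
from PIL import Image


def save_indexed_png(labels, path, palette):
    """Save a label map as a palettised ('P' mode) PNG.

    Pixel values remain the class indices; the palette only controls
    the colour shown by image viewers.
    """
    im = Image.fromarray(labels.astype(np.uint8))  # 2-D uint8 array -> mode 'L'
    im.putpalette(palette)                         # attaching a palette switches to mode 'P'
    im.save(path)


# Placeholder colours for illustration only, NOT the official VITON-HD palette:
# black background plus 19 random colours, flattened as [r, g, b, r, g, b, ...].
rng = np.random.default_rng(0)
palette = [0, 0, 0] + rng.integers(64, 256, size=19 * 3).tolist()
```

Reading such a file back with `np.array(Image.open(path))` returns the indices again, so it can serve as both visualization and label map.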

3、DensePose (On colab, GPU or CPU)

(1) get repo of detectron2

!git clone https://github.com/facebookresearch/detectron2

(2) install dependencies

!python -m pip install -e detectron2

(3) install packages for DensePose

%cd detectron2/projects/DensePose
!pip install av>=8.0.3 opencv-python-headless>=4.5.3.56 scipy>=1.5.4

(4) Prepare your images

!mkdir ./image_path
!cp /content/test/image/0000* ./image_path/

(5) Modify code

When I used DensePose, it had some bugs, and I had to modify some code to make it work the way I wanted. By the time you follow this tutorial, the situation may have changed.

To get the same input as HR-VITON, change ./densepose/vis/densepose_results.py at line 320:
```
alpha=0.7 to 1
inplace=True to False
```
Then change ./densepose/vis/base.py, line 38.

This modification is needed because the change above is not enough: image_target_bgr = image_bgr * 0 makes a copy instead of a reference, so our result is lost.

image_target_bgr = image_bgr * 0
to 
image_target_bgr = image_bgr
image_target_bgr *= 0
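
The difference between the two forms is plain NumPy behaviour: `image_bgr * 0` allocates a new array, whereas `*=` writes into the existing buffer, so anything holding a reference to `image_bgr` sees the change. A minimal illustration with a dummy array (the variable names mirror base.py, the array itself is made up):

```python
import numpy as np

image_bgr = np.full((2, 2, 3), 200, dtype=np.uint8)  # dummy stand-in for the input image

# Original form: the multiplication builds a brand-new array.
copied = image_bgr * 0
print(np.shares_memory(copied, image_bgr))            # False

# Modified form: same buffer, zeroed in place.
image_target_bgr = image_bgr
image_target_bgr *= 0
print(np.shares_memory(image_target_bgr, image_bgr))  # True
print(int(image_bgr.max()))                           # 0
```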

To save file with name kept and in directory, change apply_net.py, line 286 and 287 to below
```
out_fname = './image-densepose/' + image_fpath.split('/')[-1]
out_dir = './image-densepose'
```

(6) Run

If you are using CPU, add --opts MODEL.DEVICE cpu to end of below command.

!python apply_net.py show configs/densepose_rcnn_R_50_FPN_s1x.yaml \
https://dl.fbaipublicfiles.com/densepose/densepose_rcnn_R_50_FPN_s1x/165712039/model_final_162be9.pkl \
image_path dp_segm -v

Then you get results that look like

4、Cloth Mask (On colab, GPU or CPU)

This is a lot easier.

(1) Install

!pip install carvekit_colab

(2) Download models

from carvekit.ml.files.models_loc import download_all
download_all();

(3) Prepare cloth images

!mkdir ./cloth
!cp ./test/cloth/0000* ./cloth/

prepare dir for results

!mkdir ./cloth_mask

(4) Run

#@title Upload images from your computer
#@markdown Description of parameters
#@markdown - `SHOW_FULLSIZE`  - Shows image in full size (may take a long time to load)
#@markdown - `PREPROCESSING_METHOD`  - Preprocessing method
#@markdown - `SEGMENTATION_NETWORK`  - Segmentation network. Use `u2net` for hairs-like objects and `tracer_b7` for objects
#@markdown - `POSTPROCESSING_METHOD`  - Postprocessing method
#@markdown - `SEGMENTATION_MASK_SIZE` - Segmentation mask size. Use 640 for Tracer B7 and 320 for U2Net
#@markdown - `TRIMAP_DILATION`  - The size of the offset radius from the object mask in pixels when forming an unknown area
#@markdown - `TRIMAP_EROSION`  - The number of iterations of erosion that the object's mask will be subjected to before forming an unknown area

import os
import numpy as np
from PIL import Image, ImageOps
from carvekit.web.schemas.config import MLConfig
from carvekit.web.utils.init_utils import init_interface

SHOW_FULLSIZE = False #@param {type:"boolean"}
PREPROCESSING_METHOD = "none" #@param ["stub", "none"]
SEGMENTATION_NETWORK = "tracer_b7" #@param ["u2net", "deeplabv3", "basnet", "tracer_b7"]
POSTPROCESSING_METHOD = "fba" #@param ["fba", "none"]
SEGMENTATION_MASK_SIZE = 640 #@param ["640", "320"] {type:"raw", allow-input: true}
TRIMAP_DILATION = 30 #@param {type:"integer"}
TRIMAP_EROSION = 5 #@param {type:"integer"}
DEVICE = 'cpu' # 'cuda'

config = MLConfig(segmentation_network=SEGMENTATION_NETWORK,
                  preprocessing_method=PREPROCESSING_METHOD,
                  postprocessing_method=POSTPROCESSING_METHOD,
                  seg_mask_size=SEGMENTATION_MASK_SIZE,
                  trimap_dilation=TRIMAP_DILATION,
                  trimap_erosion=TRIMAP_EROSION,
                  device=DEVICE)

interface = init_interface(config)

imgs = []
root = '/content/cloth'
for name in os.listdir(root):
    imgs.append(root + '/' + name)

images = interface(imgs)
for i, im in enumerate(images):
    img = np.array(im)
    img = img[...,:3] # no transparency
    idx = (img[...,0]==0)&(img[...,1]==0)&(img[...,2]==0) # background 0 or 130, just try it
    img = np.ones(idx.shape)*255
    img[idx] = 0
    im = Image.fromarray(np.uint8(img), 'L')
    im.save(f'./cloth_mask/{imgs[i].split("/")[-1].split(".")[0]}.jpg')

Make sure your cloth mask results are the same size as the input cloth images (768×1024). They should look like

Note: you may have to change the code above to get the right results, because the generated output sometimes differs, and I didn't investigate this tool too much. In particular, for the line idx = (img[...,0]==0)&(img[...,1]==0)&(img[...,2]==0), you may get a background value of 0 or 130, depending on the model and settings you use.
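
An alternative that avoids guessing the background colour, suggested further down in this thread, is to threshold the alpha channel instead. A minimal sketch, assuming the images returned by `interface(...)` are RGBA PIL images with a transparent background (I have not checked this for every CarveKit configuration). `mask_from_alpha` and its `threshold` parameter are hypothetical names.

```python
import numpy as np
from PIL import Image


def mask_from_alpha(im, threshold=0):
    """Return an 'L' mask: 255 where alpha > threshold (cloth), 0 elsewhere."""
    alpha = np.array(im.convert("RGBA"))[..., 3]
    mask = np.where(alpha > threshold, 255, 0).astype(np.uint8)
    return Image.fromarray(mask)
```

In the loop above, this would replace the lines that build `idx`, for example `mask_from_alpha(im).save(...)` with the same output path.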

5、Parse Agnostic (On colab)

Here are the parse labels and the corresponding body parts. You may or may not need them.

0 - 19
0: Background
1: Hat
2: Hair
3: Glove
4: Sunglasses
5: Upper-clothes
6: Dress
7: Coat
8: Socks
9: Pants
10: Torso-skin
11: Scarf
12: Skirt
13: Face
14: Left-arm
15: Right-arm
16: Left-leg
17: Right-leg
18: Left-shoe
19: Right-shoe
(1) Install packages

!pip install Pillow tqdm

(2) Prepare data

After all the steps above, you should have a data structure like this, all under the test directory. If you are not sure which results belong in which dir, check the official dataset structure, which you can download from here.

You can zip them into test.zip and unzip them on Colab with !unzip test.zip.

Note: the images under image-parse-v3 (black images with labels) do not look the same as the official data (colored images with labels); the reason was mentioned before.
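
The script in the next step expects each file in image-parse-v3 to be a single-channel label map. If a colored RGB visualization is loaded instead, `np.array` returns a (1024, 768, 3) array and the arm-mask multiplication fails with a broadcasting `ValueError`, which is the error reported later in this thread. A sketch of a quick check, where `load_label_map` is a hypothetical helper:

```python
import numpy as np
from PIL import Image


def load_label_map(path):
    """Open a parse image and insist on a single-channel image of class indices."""
    im = Image.open(path)
    if im.mode not in ("L", "P"):
        raise ValueError(
            f"{path}: mode {im.mode!r} is not a label map (expected 'L' or 'P')"
        )
    return np.array(im)  # 2-D array, shape (height, width)
```

Calling it on every file in image-parse-v3 before the main loop surfaces the wrong files up front.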

(3) Run

import json
from os import path as osp
import os
import numpy as np
from PIL import Image, ImageDraw
from tqdm import tqdm

def get_im_parse_agnostic(im_parse, pose_data, w=768, h=1024):
    label_array = np.array(im_parse)
    parse_upper = ((label_array == 5).astype(np.float32) +
                    (label_array == 6).astype(np.float32) +
                    (label_array == 7).astype(np.float32))
    parse_neck = (label_array == 10).astype(np.float32)

    r = 10
    agnostic = im_parse.copy()

    # mask arms
    for parse_id, pose_ids in [(14, [2, 5, 6, 7]), (15, [5, 2, 3, 4])]:
        mask_arm = Image.new('L', (w, h), 'black')
        mask_arm_draw = ImageDraw.Draw(mask_arm)
        i_prev = pose_ids[0]
        for i in pose_ids[1:]:
            if (pose_data[i_prev, 0] == 0.0 and pose_data[i_prev, 1] == 0.0) or (pose_data[i, 0] == 0.0 and pose_data[i, 1] == 0.0):
                continue
            mask_arm_draw.line([tuple(pose_data[j]) for j in [i_prev, i]], 'white', width=r*10)
            pointx, pointy = pose_data[i]
            radius = r*4 if i == pose_ids[-1] else r*15
            mask_arm_draw.ellipse((pointx-radius, pointy-radius, pointx+radius, pointy+radius), 'white', 'white')
            i_prev = i
        parse_arm = (np.array(mask_arm) / 255) * (label_array == parse_id).astype(np.float32)
        agnostic.paste(0, None, Image.fromarray(np.uint8(parse_arm * 255), 'L'))

    # mask torso & neck
    agnostic.paste(0, None, Image.fromarray(np.uint8(parse_upper * 255), 'L'))
    agnostic.paste(0, None, Image.fromarray(np.uint8(parse_neck * 255), 'L'))

    return agnostic

if __name__ =="__main__":
    data_path = './test'
    output_path = './test/parse'

    os.makedirs(output_path, exist_ok=True)

    for im_name in tqdm(os.listdir(osp.join(data_path, 'image'))):

        # load pose image
        pose_name = im_name.replace('.jpg', '_keypoints.json')

        try:
            with open(osp.join(data_path, 'openpose_json', pose_name), 'r') as f:
                pose_label = json.load(f)
                pose_data = pose_label['people'][0]['pose_keypoints_2d']
                pose_data = np.array(pose_data)
                pose_data = pose_data.reshape((-1, 3))[:, :2]
        except IndexError:
            print(pose_name)
            continue

        # load parsing image
        parse_name = im_name.replace('.jpg', '.png')
        im_parse = Image.open(osp.join(data_path, 'image-parse-v3', parse_name))

        agnostic = get_im_parse_agnostic(im_parse, pose_data)

        agnostic.save(osp.join(output_path, parse_name))

You can check the results under ./test/parse. They look all black as well. To make sure you are getting the right agnostic parse images, do the following

import numpy as np
from PIL import Image

im_ori = Image.open('./test/image-parse-v3/06868_00.png')
im = Image.open('./test/parse/06868_00.png')
print(np.unique(np.array(im_ori)))
print(np.unique(np.array(im)))

The output may look like

[ 0  2  5  9 10 13 14 15]
[ 0  2  9 13 14 15]

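The first row is longer than the second row: the second should contain only labels from the first, with the upper-clothes and torso-skin labels removed.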
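
This check can also be scripted so that it runs over many files. A small sketch using a hypothetical helper `removed_labels`, with toy arrays standing in for the two images:

```python
import numpy as np


def removed_labels(original, agnostic):
    """Labels present in the original parse map but absent from the agnostic one."""
    before = set(np.unique(original).tolist())
    after = set(np.unique(agnostic).tolist())
    return sorted(before - after)


# Toy arrays standing in for np.array(im_ori) and np.array(im)
ori = np.array([0, 2, 5, 9, 10, 13, 14, 15], dtype=np.uint8)
agn = np.array([0, 2, 0, 9, 0, 13, 14, 15], dtype=np.uint8)
print(removed_labels(ori, agn))  # [5, 10]
```

Given the masking code above, only labels 5, 6, 7 and 10 should disappear completely. The arm labels 14 and 15 are masked only near the pose lines, so they may or may not remain.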

You can also visualize it by

np_im = np.array(im)
np_im[np_im==2] = 151
np_im[np_im==9] = 178
np_im[np_im==13] = 191
np_im[np_im==14] = 221
np_im[np_im==15] = 246
Image.fromarray(np_im)

The result may look like this, which is cloth-agnostic

Save all the images under parse to image-parse-agnostic-v3.2
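
Be careful to copy the label files themselves, not the recolored array from the visualization step: values such as 151 or 246 are not valid class indices. A later comment in this thread reports `index 151 is out of bounds for dimension 0 with size 20` from test_generator.py, which is consistent with that mistake. A sketch of a folder check, where `find_bad_label_files` is a hypothetical helper:

```python
import os

import numpy as np
from PIL import Image


def find_bad_label_files(folder, num_classes=20):
    """Return names of PNGs that are multi-channel or contain values >= num_classes."""
    bad = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".png"):
            continue
        arr = np.array(Image.open(os.path.join(folder, name)))
        # A valid label map is 2-D and only holds indices 0 .. num_classes - 1.
        if arr.ndim != 2 or arr.max() >= num_classes:
            bad.append(name)
    return bad
```

An empty list from `find_bad_label_files('./test/image-parse-agnostic-v3.2')` suggests the folder holds label maps rather than visualizations.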

6、Human Agnostic

Steps are almost the same as above section.

(1) install

!pip install Pillow tqdm

(2) Prepare data

Now it looks like

(3) Run

import json
from os import path as osp
import os
import numpy as np
from PIL import Image, ImageDraw
from tqdm import tqdm

def get_img_agnostic(img, parse, pose_data):
    parse_array = np.array(parse)
    parse_head = ((parse_array == 4).astype(np.float32) +
                    (parse_array == 13).astype(np.float32))
    parse_lower = ((parse_array == 9).astype(np.float32) +
                    (parse_array == 12).astype(np.float32) +
                    (parse_array == 16).astype(np.float32) +
                    (parse_array == 17).astype(np.float32) +
                    (parse_array == 18).astype(np.float32) +
                    (parse_array == 19).astype(np.float32))

    agnostic = img.copy()
    agnostic_draw = ImageDraw.Draw(agnostic)

    length_a = np.linalg.norm(pose_data[5] - pose_data[2])
    length_b = np.linalg.norm(pose_data[12] - pose_data[9])
    point = (pose_data[9] + pose_data[12]) / 2
    pose_data[9] = point + (pose_data[9] - point) / length_b * length_a
    pose_data[12] = point + (pose_data[12] - point) / length_b * length_a
    r = int(length_a / 16) + 1

    # mask arms
    agnostic_draw.line([tuple(pose_data[i]) for i in [2, 5]], 'gray', width=r*10)
    for i in [2, 5]:
        pointx, pointy = pose_data[i]
        agnostic_draw.ellipse((pointx-r*5, pointy-r*5, pointx+r*5, pointy+r*5), 'gray', 'gray')
    for i in [3, 4, 6, 7]:
        if (pose_data[i - 1, 0] == 0.0 and pose_data[i - 1, 1] == 0.0) or (pose_data[i, 0] == 0.0 and pose_data[i, 1] == 0.0):
            continue
        agnostic_draw.line([tuple(pose_data[j]) for j in [i - 1, i]], 'gray', width=r*10)
        pointx, pointy = pose_data[i]
        agnostic_draw.ellipse((pointx-r*5, pointy-r*5, pointx+r*5, pointy+r*5), 'gray', 'gray')

    # mask torso
    for i in [9, 12]:
        pointx, pointy = pose_data[i]
        agnostic_draw.ellipse((pointx-r*3, pointy-r*6, pointx+r*3, pointy+r*6), 'gray', 'gray')
    agnostic_draw.line([tuple(pose_data[i]) for i in [2, 9]], 'gray', width=r*6)
    agnostic_draw.line([tuple(pose_data[i]) for i in [5, 12]], 'gray', width=r*6)
    agnostic_draw.line([tuple(pose_data[i]) for i in [9, 12]], 'gray', width=r*12)
    agnostic_draw.polygon([tuple(pose_data[i]) for i in [2, 5, 12, 9]], 'gray', 'gray')

    # mask neck
    pointx, pointy = pose_data[1]
    agnostic_draw.rectangle((pointx-r*7, pointy-r*7, pointx+r*7, pointy+r*7), 'gray', 'gray')
    agnostic.paste(img, None, Image.fromarray(np.uint8(parse_head * 255), 'L'))
    agnostic.paste(img, None, Image.fromarray(np.uint8(parse_lower * 255), 'L'))

    return agnostic

if __name__ =="__main__":
    data_path = './test'
    output_path = './test/parse'

    os.makedirs(output_path, exist_ok=True)

    for im_name in tqdm(os.listdir(osp.join(data_path, 'image'))):

        # load pose image
        pose_name = im_name.replace('.jpg', '_keypoints.json')

        try:
            with open(osp.join(data_path, 'openpose_json', pose_name), 'r') as f:
                pose_label = json.load(f)
                pose_data = pose_label['people'][0]['pose_keypoints_2d']
                pose_data = np.array(pose_data)
                pose_data = pose_data.reshape((-1, 3))[:, :2]
        except IndexError:
            print(pose_name)
            continue

        # load parsing image
        im = Image.open(osp.join(data_path, 'image', im_name))
        label_name = im_name.replace('.jpg', '.png')
        im_label = Image.open(osp.join(data_path, 'image-parse-v3', label_name))

        agnostic = get_img_agnostic(im, im_label, pose_data)

        agnostic.save(osp.join(output_path, im_name))

Results look like
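
For reference, the indices used in `get_img_agnostic` correspond to OpenPose's BODY_25 layout. The sketch below lists the joints the function touches and reproduces its brush-radius rule on made-up coordinates. `missing_joints` and `brush_radius` are hypothetical helpers, not part of HR-VITON.

```python
import numpy as np

# BODY_25 indices referenced by the agnostic scripts
JOINTS = {1: "Neck", 2: "RShoulder", 3: "RElbow", 4: "RWrist",
          5: "LShoulder", 6: "LElbow", 7: "LWrist", 9: "RHip", 12: "LHip"}


def missing_joints(pose_data):
    """Names of required joints reported as (0, 0), i.e. not detected by OpenPose."""
    return [name for i, name in JOINTS.items() if not pose_data[i].any()]


def brush_radius(pose_data):
    """Same rule as in get_img_agnostic: shoulder width / 16, plus 1."""
    length_a = np.linalg.norm(pose_data[5] - pose_data[2])
    return int(length_a / 16) + 1


# Made-up pose: every joint at (100, 100) except the two shoulders.
pose = np.full((25, 2), 100.0)
pose[2] = (300.0, 250.0)     # right shoulder
pose[5] = (460.0, 250.0)     # left shoulder
print(brush_radius(pose))    # 11
print(missing_joints(pose))  # []
```

The script only skips undetected elbows and wrists. If a shoulder or hip is missing, the torso mask is drawn from (0, 0), so checking `missing_joints` first may save some confusion.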

Save them to the agnostic-v3.2 dir. Now you are almost done. The final structure of the preprocessing results is

7、Conclusion

Thanks for reading. It's not easy to get all this done. Before you run HR-VITON with your preprocessed dataset, note that each person image needs a corresponding cloth image, even though it is not used during inference. If you don't want this behavior, you can either change the source code manually or just add some random images with the same names as the person images. Once everything is done, suppose you are testing 5 person images and 3 cloth images, all unpaired: you should end up with 3 images under the cloth dir and 3 images under cloth-mask, and 5 images under each of the other dirs: agnostic-v3.2, image, image-densepose, image-parse-agnostic-v3.2, image-parse-v3, openpose_img, and openpose_json.
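
A quick way to verify this layout before running HR-VITON is to compare the base names in each folder. The sketch below assumes the folder names listed above and OpenPose's default output suffixes (`_keypoints.json` and `_rendered.png`). `check_dataset` and `stems` are hypothetical helpers.

```python
import os

PERSON_DIRS = ["agnostic-v3.2", "image", "image-densepose",
               "image-parse-agnostic-v3.2", "image-parse-v3",
               "openpose_img", "openpose_json"]
CLOTH_DIRS = ["cloth", "cloth-mask"]
# Assumption: OpenPose's default suffixes for --write_json / --write_images output.
SUFFIXES = ("_keypoints", "_rendered")


def stems(folder):
    """Base names in a folder, without extension or OpenPose suffix."""
    result = set()
    for name in os.listdir(folder):
        stem = os.path.splitext(name)[0]
        for suffix in SUFFIXES:
            if stem.endswith(suffix):
                stem = stem[: -len(suffix)]
        result.add(stem)
    return result


def check_dataset(root):
    """Return a list of problems; an empty list means the folders are consistent."""
    problems = []
    for group in (PERSON_DIRS, CLOTH_DIRS):
        reference = stems(os.path.join(root, group[0]))
        for d in group[1:]:
            diff = reference ^ stems(os.path.join(root, d))
            if diff:
                problems.append(f"{d} differs from {group[0]}: {sorted(diff)}")
    return problems
```

Running `check_dataset('./test')` and getting an empty list means every person folder holds the same set of names, and likewise for the two cloth folders.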

Final test result


thuongmhh commented 1 year ago

Thanks for the detailed instructions.

Is there any reason you chose CIHP-PGN instead of Self-Correction-Human-Parsing (https://github.com/GoGoDuck912/Self-Correction-Human-Parsing) for human parsing?

Can I use rembg (https://github.com/danielgatis/rembg) instead of carvekit to generate cloth mask?

lujiazho commented 1 year ago

Absolutely, all I used is mentioned by the author in Preprocessing.md. If you find a better solution to do it, feel free to use it.

thuongmhh commented 1 year ago

"One thing to mention, the images provided by official dataset keep both visualization (colored) and label (0 - 20). I don't know how they did that. I also tried P mode in PIL, but found nothing."

I think they put color palette before saving the images. (https://pillow.readthedocs.io/en/stable/_modules/PIL/Image.html#Image.putpalette)

lujiazho commented 1 year ago

Thank you for your explanation! I believe this will help others who want to reproduce this in the future.

swikrutimmaind commented 1 year ago

Thank you for such a great explanation. Can you please guide me on how to run the whole code? There are so many files, and it's quite challenging to find out where to start. Any help is much appreciated. Thanks in advance ❤

lujiazho commented 1 year ago

An update on the Openpose Installing section:

You may want to change the last line of code if you run into some problem with cuDNN on the Colab, from

!cd openpose && rm -rf build || true && mkdir build && cd build && cmake .. && make -j`nproc`

to

!cd openpose && rm -rf build || true && mkdir build && cd build && cmake .. -DUSE_CUDNN=OFF && make -j`nproc`

This solution is from CMU-Perceptual-Computing-Lab/openpose/issues/1527

swikrutimmaind commented 1 year ago

Is there any constraint on images for Human Parse? like img resolution or anything.. I am getting human parse for first 4 images (from CIHP_PGN repo test folder), but not for 5th image. Any help is much appreciated!!

stwon1991 commented 1 year ago

Dear lujiazho,

Regarding human parse, the preprocessing.md says "I inferenced a parse map on 256x192 resolution, and upsample it to 1024x768. Then you can see that it has a alias artifact, so I smooth it using "torchgeometry.image.GaussianBlur((15, 15), (3, 3))". I saved a parse map image using PIL.Image with P mode. The color of the parse map image in our dataset(VITON-HD) is just for the visualization, it has 0~19 uint values."

I applied "torchgeometry.image.GaussianBlur((15, 15), (3, 3))", but the np.unique of the image changes from [ 0 2 5 9 10 13 14 15] to [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]. Is there any way to apply the smoothing and keep the unique classes consistent?

Looking forward to your feedback!

lujiazho commented 1 year ago

It's been a long time since I played with this. I guess you modified the image that contains the labels, which changed the labels [ 0 2 5 9 10 13 14 15].

ljcljc commented 1 year ago

I followed the steps on openpose on google colab, but got this error. I am sure cudnn is already installed, but openpose cannot recognize it. Any advice?

/content
/content/openpose
Starting OpenPose demo...
Configuring OpenPose...
Starting thread(s)...
F0506 05:58:03.574680 16120 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED
Check failure stack trace:
@ 0x7f978e08c1c3 google::LogMessage::Fail()
@ 0x7f978e09125b google::LogMessage::SendToLog()
@ 0x7f978e08bebf google::LogMessage::Flush()
@ 0x7f978e08c6ef google::LogMessageFatal::~LogMessageFatal()
@ 0x7f978dcb2115 caffe::CuDNNConvolutionLayer<>::LayerSetUp()
@ 0x7f978dd96e8d caffe::Net<>::Init()
@ 0x7f978dd993a5 caffe::Net<>::Net()
@ 0x7f978e73fc6c op::NetCaffe::initializationOnThread()
@ 0x7f978e724c23 op::HandExtractorCaffe::netInitializationOnThread()
@ 0x7f978e726255 op::HandExtractorNet::initializationOnThread()
@ 0x7f978e785887 op::Worker<>::initializationOnThreadNoException()
@ 0x7f978e7859d8 op::SubThread<>::initializationOnThread()
@ 0x7f978e7871e8 op::Thread<>::initializationOnThread()
@ 0x7f978e78a77c op::Thread<>::threadFunction()
@ 0x7f978e3cade4 (unknown)
@ 0x7f978e0cc609 start_thread
@ 0x7f978e206133 clone

MosbehBarhoumi commented 1 year ago

@ljcljc I believe it's not possible to run OpenPose on Google Colab, despite trying various methods. However, I switched to the PyTorch version of OpenPose, which has limited keypoints for the 'pose_keypoints_2d' body part. Fortunately, the results were nearly identical to the agnostic version of the pre-processed data, so there's no need to worry about that.

MosbehBarhoumi commented 1 year ago

Hello @lujiazho, I appreciate the effort you've put in. I would like to ask about 'image-parse-agnostic-v3.2': have you attempted to train the HR-VITON model using this approach? Specifically, I would like to know whether the model trained without errors and was able to reproduce the authors' results.

also should I put this image under 'image-parse-agnostic-v3.2' folder, or just like you did in the code : 00057_00

Mingy123 commented 1 year ago

First of all, thanks for elaborating on the preprocessing. I'm having trouble following along with the human parsing step, since I also needed to downscale the input image before passing it through CIHP_PGN due to my low performance system.

After running the gaussian blur, how do you make a clarified image? I tried implementing my own program to "round" pixels to the nearest valid colour, giving me the following image:

converted

However, this doesn't work well with the model. My guess is that the outline of the segments are too random.

Afterwards, I found @thuongmhh 's comment:

I think they put color palette before saving the images.

I used im.convert("P", palette=Image.ADAPTIVE) and conv.putpalette([0,0,0, 0,255,255, 0,0,255, 0,119,221, 0,0,85, 85,51,0, 85,255,170, 0,128,0, 0,85,85, 255,0,0, 255,85,0, 255,170,0, 255,255,0, 170,255,85, 52,86,128]) which gives me this image: my_pal

The colours used are very messed up, but I found that the shape is similar to the one in the dataset. On a side note, using all the colours listed in the CIHP_PGN repo gives me a black image for some reason.

Could anyone please explain how I should handle the upscaling of this image? Thanks in advance.

MosbehBarhoumi commented 1 year ago

Just trying to help you @Mingy123. I think it will be simpler to use this cleaned code for CIHP_PGN. Also, I would like to mention that the output will include both colored and non-colored images; all you need are the non-colored images, for training (human parse) and for getting the agnostic cloth/human maps. The code should just work; you don't have to change anything.

Mingy123 commented 1 year ago

@MosbehBarhoumi Thanks for the reply. The code you linked helped speed things up :) Regarding the human parsing, I'm working with the older version VITON-HD which apparently only needs the openpose, cloth mask and the "vis" coloured images. Also, as I mentioned before, I have to downscale the image before running CIHP_PGN or else I would run out of memory. If I were to use this project and use the non coloured image, wouldn't I still be facing the same issue? If I'm not wrong, the non coloured image is on a slight greyscale and the boundaries of differing colours determine the output of this VITON.

24thTinyGiant commented 1 year ago

@lujiazho Can you tell me whether I should change train_pairs.txt? The cloth and person images fed to the model are different, so should I make the input cloth image and the person image the same? Please reply ASAP.

lazyseacow commented 1 year ago

Hi @lujiazho , I am a beginner, there are many steps I am not sure how to complete, how can I use the HR-VITON project after I have completed the above steps?

24thTinyGiant commented 1 year ago

@lazyseacow run the train condition python file and then run the train generator file

Gaurab019 commented 1 year ago

Hi @lujiazho I am facing an weird issue with the cloth mask generation. I am getting a completely white image whenever i am using any cloth image for cloth mask generation.

Base-Image:

00004

Output Image:

Untitled

I have been scratching my head for so long time.

I am able to generate a proper output with VITON-HD cloths. But whenever I use any other image to generate a mask, it does not work.

Any help will be deeply appreciated!!!!!!

Gaurab019 commented 1 year ago

Never mind, I solved it:

(img[...,0]==130)&(img[...,1]==130)&(img[...,2]==130)

I used 130 instead of 0 in this line.

metrosir commented 1 year ago

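Rewrite that line as (img[...,3]==0), i.e. test the alpha channel. Note that this must be computed before the line img = img[...,:3] drops that channel.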

schneiderlin commented 1 year ago

I use the cleaned version of CIHP, the generated cihp_edge_maps are completely white, and cihp_parsing_maps are completely black. Can anyone help?

minh-nnm commented 1 year ago

Can image-parse-v3 be generated with Graphonomy?

gamingflexer commented 1 year ago

> @ljcljc I believe it's not possible to run OpenPose on Google Colab, despite trying various methods. However, I switched to the PyTorch version of OpenPose, which has limited keypoints for the 'pose_keypoints_2d' body part. Fortunately, the results were nearly identical to the agnostic version of the pre-processed data, so there's no need to worry about that.

Hey, I have tried this, but I run into problems when I use the code and output from pytorch-openpose to generate agnostic-v3.2 with the code given above. Also, the JSON is not directly available in the needed format, so I have written a function for it. In addition, this code does not provide keypoints for hands, face_keypoints_2d, etc.

renrenzsbbb commented 1 year ago

Thanks for your great tutorial. I find that the data in the dataset is aligned in some way. Can you share your code for aligning it, in case I want to test images downloaded from the web? Thanks!

jin24324 commented 1 year ago

At Parse Agnostic I did the same but got this error:

ValueError Traceback (most recent call last)
in <cell line: 41>()
     64 im_parse = Image.open(osp.join(data_path, 'image-parse-v3', parse_name))
     65
---> 66 agnostic = get_im_parse_agnostic(im_parse, pose_data)
     67
     68 agnostic.save(osp.join(output_path, parse_name))

in get_im_parse_agnostic(im_parse, pose_data, w, h)
     29 mask_arm_draw.ellipse((pointx-radius, pointy-radius, pointx+radius, pointy+radius), 'white', 'white')
     30 i_prev = i
---> 31 parse_arm = (np.array(mask_arm) / 255) * (label_array == parse_id)
     32 agnostic.paste(0, None, Image.fromarray(np.uint8(parse_arm * 255), 'L'))
     33

ValueError: operands could not be broadcast together with shapes (1024,768) (1024,768,3)
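The shapes in this error suggest that mask_arm is a single-channel (1024, 768) array while label_array, built from im_parse, has three colour channels. That happens when the parse image is opened as RGB (for example a colour visualization) rather than as a single-channel label map. A minimal sketch of a guard, assuming that diagnosis; `load_label_array` is a hypothetical helper, not part of the tutorial code:

```python
import numpy as np
from PIL import Image

def load_label_array(im_parse):
    """Return the (H, W) label array of a parse image, or raise if it has colour channels."""
    label_array = np.array(im_parse)
    if label_array.ndim != 2:
        raise ValueError(
            f"expected a single-channel label map, got shape {label_array.shape} "
            f"(PIL mode {im_parse.mode!r})"
        )
    return label_array

# Synthetic stand-ins for the real files; PIL sizes are (width, height)
palette_parse = Image.new('P', (768, 1024))   # single-channel label map
rgb_parse = Image.new('RGB', (768, 1024))     # colour image, e.g. a visualization

mask_arm = np.zeros((1024, 768))
label_array = load_label_array(palette_parse)
# 14 is just an example label id; both operands are (1024, 768), so this broadcasts
parse_arm = (mask_arm / 255) * (label_array == 14)

try:
    load_label_array(rgb_parse)
except ValueError as err:
    rgb_error = str(err)   # reports shape (1024, 768, 3)
```

Checking `im_parse.mode` right after `Image.open` gives the same information without running the rest of the function.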

onedotone-wei commented 1 year ago

I also encountered the same problem. Have you solved it yet?

onedotone-wei commented 1 year ago

I have completed all the steps, but running test_generator.py gives this error:

parse_agnostic_map = parse_agnostic_map.scatter_(0, parse_agnostic, 1.0)
RuntimeError: index 151 is out of bounds for dimension 0 with size 20

May I ask which step went wrong?
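The size 20 in this error is the number of parse classes, so every pixel of the parse-agnostic image has to hold a label between 0 and 19. A value such as 151 would be consistent with a grayscale or colour rendering having been saved instead of the label map, but that is a guess. A minimal NumPy sketch that performs the same one-hot encoding with an explicit range check; `one_hot_parse` is a hypothetical helper, not code from the repository:

```python
import numpy as np

def one_hot_parse(labels, num_classes=20):
    """Encode an (H, W) integer label map as a (num_classes, H, W) one-hot array."""
    labels = np.asarray(labels)
    if labels.ndim != 2:
        raise ValueError(f"expected a 2-D label map, got shape {labels.shape}")
    out_of_range = (labels < 0) | (labels >= num_classes)
    if out_of_range.any():
        bad = np.unique(labels[out_of_range]).tolist()
        raise ValueError(f"labels outside 0..{num_classes - 1}: {bad}")
    one_hot = np.zeros((num_classes,) + labels.shape, dtype=np.float32)
    rows, cols = np.indices(labels.shape)
    # Set channel labels[r, c] to 1 at every pixel (r, c)
    one_hot[labels, rows, cols] = 1.0
    return one_hot

good = np.array([[0, 5], [19, 2]], dtype=np.uint8)
encoded = one_hot_parse(good)

bad = np.array([[0, 151]], dtype=np.uint8)
try:
    one_hot_parse(bad)
except ValueError as err:
    range_error = str(err)
```

Running `np.unique` on the saved parse-agnostic PNG shows at a glance whether it contains class indices or pixel intensities.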

ntad27 commented 1 year ago

At the Human Parse step (step 2), could you show me how to get a result just like the one in the original dataset (the image below)? 14525_00 In Preprocessing.md the author said: "I inferenced a parse map on 256x192 resolution, and upsample it to 1024x768. Then you can see that it has a alias artifact, so I smooth it using torchgeometry.image.GaussianBlur((15, 15), (3, 3))". I tried what he said but still didn't get the same result. Could you, or anyone with the same problem, show me how to fix it? Thank you so much. 01_vis

ntad27 commented 1 year ago

To clarify my issue: my input image's shape was (192, 256), and I then upsampled it to (768, 1024), just as the author said. I also tried (768, 1024) images as input but still didn't get a better result at the final stage; the result is below. 01_vis
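One thing to watch when smoothing a parse map is that blurring the label image directly mixes class indices at the boundaries. The sketch below is not the author's torchgeometry pipeline but a swapped-in variant using Pillow: each class mask is upsampled and blurred separately, and every pixel takes the class with the highest smoothed score, so no new label values appear. The function name, the blur radius and the output size are assumptions:

```python
import numpy as np
from PIL import Image, ImageFilter

def upsample_label_map(labels, size=(768, 1024), blur_radius=3):
    """Upsample an (h, w) label map to size=(width, height) with smoothed boundaries."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    scores = []
    for c in classes:
        # Binary mask of this class as an 8-bit grayscale image
        mask = Image.fromarray(np.uint8(labels == c) * 255)
        mask = mask.resize(size, Image.BILINEAR)
        mask = mask.filter(ImageFilter.GaussianBlur(blur_radius))
        scores.append(np.asarray(mask, dtype=np.float32))
    # Pick, per pixel, the class whose smoothed mask is strongest
    return classes[np.argmax(np.stack(scores), axis=0)]

# Synthetic 256x192 (height x width) parse map with one rectangular region of class 5
low_res = np.zeros((256, 192), dtype=np.uint8)
low_res[64:192, 48:144] = 5
high_res = upsample_label_map(low_res)
```

The result can be saved with the same palette as the low-resolution map, since it only contains label values that were already present.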

TalhaUsuf commented 1 year ago

The OpenPose models give download errors. You can find the models here:

https://drive.google.com/drive/folders/1USEdy_7uvwO4PIqsQJq8kT0sX4H4f7nn

You can use gdown to download these models. Do not forget to place them in their respective directories.

Hanjunzhe commented 1 year ago

00035_00 May I ask why the agnostic file I generated has a gap at the edge? I did not change the code. Has anyone else encountered the same problem?

macguyversmusic commented 11 months ago

Every step worked a treat in Colab with a few tweaks regarding getting models and checkpoints for various bits. Thank you very much for such a great in-depth guide. I have successfully generated my models in my clothes with no training, and they don't look horrid. I am using LaDI-VTON with my own, now freshly preprocessed, dataset. Thanks again!

Yang8823 commented 9 months ago

For DensePose using detectron2, when I run the following command with Anaconda on Windows:

python apply_net.py show configs/densepose_rcnn_R_50_FPN_s1x.yaml \
https://dl.fbaipublicfiles.com/densepose/densepose_rcnn_R_50_FPN_s1x/165712039/model_final_162be9.pkl \
image_path dp_segm -v

I keep getting this error:

apply_net.py: error: unrecognized arguments: image_path ,dp_segm

After I went through the GETTING_STARTED.md of the repo and removed the \ line continuations so that the command is on a single line, it works perfectly:

python apply_net.py show configs/densepose_rcnn_R_50_FPN_s1x.yaml https://dl.fbaipublicfiles.com/densepose/densepose_rcnn_R_50_FPN_s1x/165712039/model_final_162be9.pkl image_path dp_segm -v

Just a heads-up for anyone hitting the same problem, to save you some time :)
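The trailing \ is a line continuation in bash-style shells; the Windows command prompt uses ^ instead, so there the backslashes are not treated as continuations and the command is split. One way to sidestep shell differences is to pass the arguments as a list from Python. A minimal sketch; the helper names are hypothetical, and `run_densepose` has to be called from the DensePose project directory that contains apply_net.py:

```python
import subprocess
import sys

CONFIG = "configs/densepose_rcnn_R_50_FPN_s1x.yaml"
MODEL = ("https://dl.fbaipublicfiles.com/densepose/densepose_rcnn_R_50_FPN_s1x/"
         "165712039/model_final_162be9.pkl")

def densepose_command(image_dir="image_path"):
    """Build the apply_net.py invocation as an argument list (no shell quoting needed)."""
    return [sys.executable, "apply_net.py", "show", CONFIG, MODEL, image_dir, "dp_segm", "-v"]

def run_densepose(image_dir="image_path"):
    """Run DensePose; raises CalledProcessError if apply_net.py exits with an error."""
    return subprocess.run(densepose_command(image_dir), check=True)

cmd = densepose_command()
```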

curious-ai-developer commented 8 months ago

In the OpenPose part, after running

%cd openpose
# run openpose.bin
!./build/examples/openpose/openpose.bin --image_dir ../image_path --hand --disable_blending --display 0 --write_json ../json_path --write_images ../img_path --num_gpu 1 --num_gpu_start 0

I got an error:

/bin/bash: line 1: ./build/examples/openpose/openpose.bin: No such file or directory

How can I solve it? Thanks!

Akashram28 commented 8 months ago

I had the same trouble. You can work around it by using MediaPipe for landmark detection and then converting the results to the OpenPose JSON format. This answer should help you with that.
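A minimal sketch of the index mapping involved, assuming the MediaPipe Pose output has already been collected into a (33, 4) array of normalized x, y, z, visibility and that the target is the 25-keypoint BODY_25 order. The function name and input layout are assumptions. The neck and mid hip do not exist in MediaPipe and are approximated by midpoints, the foot index landmark stands in for the big toe, and the small toes stay zero:

```python
import numpy as np

# BODY_25 index -> MediaPipe Pose landmark index (33-landmark layout)
BODY25_FROM_MEDIAPIPE = {
    0: 0,                      # nose
    2: 12, 3: 14, 4: 16,       # right shoulder, elbow, wrist
    5: 11, 6: 13, 7: 15,       # left shoulder, elbow, wrist
    9: 24, 10: 26, 11: 28,     # right hip, knee, ankle
    12: 23, 13: 25, 14: 27,    # left hip, knee, ankle
    15: 5, 16: 2,              # right eye, left eye
    17: 8, 18: 7,              # right ear, left ear
    19: 31, 21: 29,            # left foot index (approximates big toe), left heel
    22: 32, 24: 30,            # right foot index (approximates big toe), right heel
}

def midpoint(a, b):
    """Midpoint of two (x, y, confidence) rows, keeping the lower confidence."""
    return np.array([(a[0] + b[0]) / 2, (a[1] + b[1]) / 2, min(a[2], b[2])])

def mediapipe_to_body25(landmarks, width, height):
    """Return a flat list of 75 values (x, y, confidence for 25 keypoints) in pixels."""
    lm = np.asarray(landmarks, dtype=float)
    # Scale normalized coordinates to pixels; use visibility as the confidence
    pts = np.stack((lm[:, 0] * width, lm[:, 1] * height, lm[:, 3]), axis=1)
    body25 = np.zeros((25, 3))
    for dst, src in BODY25_FROM_MEDIAPIPE.items():
        body25[dst] = pts[src]
    body25[1] = midpoint(pts[11], pts[12])   # neck from the two shoulders
    body25[8] = midpoint(pts[23], pts[24])   # mid hip from the two hips
    return body25.flatten().tolist()

# Example with only the shoulders set
landmarks = np.zeros((33, 4))
landmarks[11] = [0.6, 0.25, 0.0, 0.9]   # left shoulder
landmarks[12] = [0.4, 0.25, 0.0, 0.8]   # right shoulder
pose_keypoints_2d = mediapipe_to_body25(landmarks, width=768, height=1024)
```

The resulting list is what goes into the pose_keypoints_2d field of each person entry in the OpenPose JSON.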

valvarl commented 7 months ago

I did some research on how to quickly upsample an image. This paper may be useful: McGuire, Morgan, and Mara Gagiu. "MMPX style-preserving pixel art magnification." Journal of Graphics Techniques (January 2021) 36 (2021).

The idea comes from console emulators, which need to preserve the original style while enhancing image quality. Such methods also include Nearest, EPX, and XBR. All of them, like MMPX, add no new colors when upsampling, and they also do not assign boundary pixels to other classes. I chose MMPX since it is one of the more recent papers on this topic.

You need to clone the repository (https://github.com/ITotalJustice/mmpx) and build it using cmake. It is important to add SHARED to CMakeLists.txt to get a ".so" file, not ".a": add_library(mmpx SHARED mmpx.c).

cd mmpx; mkdir build; cd build; cmake ..; make;

The following code calls MMPX from the shared object file:

import cv2
import numpy as np
import ctypes as ct

# Load the MMPX library
mmpx_lib = ct.cdll.LoadLibrary("mmpx/build/libmmpx.so")
mmpx_scale2x = mmpx_lib.mmpx_scale2x
mmpx_scale2x.argtypes = [ct.POINTER(ct.c_uint32), ct.POINTER(ct.c_uint32), ct.c_uint32, ct.c_uint32]
mmpx_scale2x.restype = None

def upscale_image(input_image_path, output_image_path):
    # Load the image using OpenCV
    image = cv2.imread(input_image_path)

    srcHeight, srcWidth, _ = image.shape
    dstHeight, dstWidth = 2 * srcHeight, 2 * srcWidth

    # Pack the BGR pixels into 0x00RRGGBB uint32 values; cast to uint32 first
    # so that the shifts cannot overflow the uint8 channel values
    pixels = image.astype(np.uint32)
    srcBuffer = ((pixels[..., 2] << 16) | (pixels[..., 1] << 8) | pixels[..., 0]).ravel()

    # Create buffer for the result
    dstBuffer = np.zeros(dstHeight * dstWidth, dtype=np.uint32)

    # Call the mmpx_scale2x function
    mmpx_scale2x(
        (ct.c_uint32 * len(srcBuffer)).from_buffer_copy(srcBuffer),
        (ct.c_uint32 * len(dstBuffer)).from_buffer(dstBuffer),
        ct.c_uint32(srcWidth),
        ct.c_uint32(srcHeight)
    )

    # Convert buffer to image
    result_image = np.frombuffer(dstBuffer, dtype=np.uint32).reshape((dstHeight, dstWidth))

    # Extract color channels
    red_channel = (result_image >> 16) & 255
    green_channel = (result_image >> 8) & 255
    blue_channel = result_image & 255

    # Combine channels into RGB image
    result_image_bgr = np.stack((blue_channel, green_channel, red_channel), axis=-1)

    # Save the result
    cv2.imwrite(output_image_path, result_image_bgr.astype(np.uint8))

# Call the function with the image path
upscale_image('CIHP_PGN/output/cihp_parsing_maps/00013_00_vis.png', '00013_00_x2.png')
upscale_image('00013_00_x2.png', '00013_00_x4.png')

Result: (before) 00013_00_vis (after) 00013_00_x4

biancaszekely32 commented 6 months ago

Hi! Thank you so much for this tutorial, I am really grateful! I know it has been a while since you wrote this, but maybe you could offer some guidance, please. In the last image you provided, where did you get the 2 results I circled? (Screenshot 2024-05-16 100419) I tried to process my own image, but I am unable to get accurate results. These are my results: I don't know how to continue from here. Can you please help me? @lujiazho

MuhammadHashir28 commented 6 months ago

Hi, I am facing an issue with the parse_agnostic output mask for the human (and other outputs). This is the output I am getting; please help, what am I doing wrong? The segmentation image looks like this:

This is the gray parse: 20240528_182000__semantic_parse_gray. This is the human parse:

MuhammadHashir28 commented 5 months ago

@lujiazho - I tried the above steps for this image (20240602_182311__cloth). This is the result I am getting: it blurs out the T-shirt logo. Please help me.

AditiF16 commented 5 months ago

# go to openpose directory
%cd openpose
# run openpose.bin
!./build/examples/openpose/openpose.bin --image_dir ../image_path --hand --disable_blending --display 0 --write_json ../json_path --write_images ../img_path --num_gpu 1 --num_gpu_start 0

I am getting the error 'no such directory exists' with this code. Please provide a solution.

harinandyala4 commented 5 months ago

Is it possible to use my own garments and models instead of the dataset provided in the GitHub repository?

scorching12 commented 3 months ago

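Hello! Thank you so much for providing this tutorial, I am really grateful! I know it has been a while since you wrote this, but maybe you could offer some guidance. In the last image you provided, where did you get the 2 results I circled? I tried to process my own image, but I am unable to get accurate results. These are my results: I don't know how to continue from here. Can you please help me?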

Hello, I have the same question. How do I obtain these two pictures?

877259454 commented 1 month ago

@MuhammadHashir28 Have you figured out the problem that caused this? I am having a similar issue where the black region painted for parse agnostic is accurate but the grey region painted for human agnostic is way off.

sangyun884 / HR-VITON

【Not An Issue But Tutorial】A complete tutorial of pre-processing steps #45