
Official Implementation for CVPR2023 paper "GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning"

GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning

Zhenyu Xie1, Zaiyu Huang1, Xin Dong2, Fuwei Zhao1, Haoye Dong3,
Xijin Zhang2, Feida Zhu2, Xiaodan Liang1,4
1Shenzhen Campus of Sun Yat-Sen University  2ByteDance
3Carnegie Mellon University  4Peng Cheng Laboratory
[Paper](https://arxiv.org/pdf/2303.13756.pdf) | [Project Page](https://github.com/xiezhy6/GP-VTON)
GP-VTON aims to transfer an in-shop garment onto a specific person.

Fine-grained Parsing

We provide the fine-grained parsing results for the model images and in-shop garment images from two existing high-resolution (1024 x 768) virtual try-on benchmarks, namely VITON-HD and DressCode.

We provide two versions of the parsing results: one at the original resolution (1024 x 768), and another at 512 x 384, on which our experiments are conducted.

| Resolution | Google Cloud | Baidu Yun |
| --- | --- | --- |
| VITON-HD (512 x 384) | Available soon | Download |
| VITON-HD (1024 x 768) | Available soon | Download |
| DressCode (512 x 384) | Available soon | Download |
| DressCode (1024 x 768) | Available soon | Download |

The parsing labels for the model image (person) and the garment image (in-shop garment) differ slightly. The semantics of each index for human/garment parsing are described below.

Human Parsing (for person)

| Index of label | Semantic of label |
| --- | --- |
| 0 | background |
| 1 | hat |
| 2 | hair |
| 3 | glove |
| 4 | glasses |
| 5 | upper (only torso region) |
| 6 | dresses (only torso region) |
| 7 | coat (only torso region) |
| 8 | socks |
| 9 | left pants |
| 10 | right pants |
| 11 | skin (around neck region) |
| 12 | scarf |
| 13 | skirts |
| 14 | face |
| 15 | left arm |
| 16 | right arm |
| 17 | left leg |
| 18 | right leg |
| 19 | left shoe |
| 20 | right shoe |
| 21 | left sleeve (for upper) |
| 22 | right sleeve (for upper) |
| 23 | bag |
| 24 | left sleeve (for dresses) |
| 25 | right sleeve (for dresses) |
| 26 | left sleeve (for coat) |
| 27 | right sleeve (for coat) |
| 28 | belt |
Garment Parsing (for in-shop garment)

| Index of label | Semantic of label |
| --- | --- |
| 0 | background |
| 5 | upper (only torso region) |
| 6 | dresses (only torso region) |
| 7 | coat (only torso region) |
| 9 | left pants |
| 10 | right pants |
| 13 | skirts |
| 21 | left sleeve (for upper, dresses, coat) |
| 22 | right sleeve (for upper, dresses, coat) |
| 24 | outer collar (preserved during training) |
| 25 | inner collar (eliminated during training) |
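
As a quick sanity check on the downloaded labels, the following sketch prints which labels appear in a single human parsing map and how many pixels each one covers. It assumes the parsing maps are stored as single-channel index images whose pixel values follow the table above; the file path is a placeholder to replace with one of your downloaded maps.

```python
import numpy as np
from PIL import Image

# Index-to-semantic mapping for the human parsing labels listed above.
HUMAN_LABELS = {
    0: "background", 1: "hat", 2: "hair", 3: "glove", 4: "glasses",
    5: "upper (torso)", 6: "dresses (torso)", 7: "coat (torso)", 8: "socks",
    9: "left pants", 10: "right pants", 11: "skin (neck)", 12: "scarf",
    13: "skirts", 14: "face", 15: "left arm", 16: "right arm",
    17: "left leg", 18: "right leg", 19: "left shoe", 20: "right shoe",
    21: "left sleeve (upper)", 22: "right sleeve (upper)", 23: "bag",
    24: "left sleeve (dresses)", 25: "right sleeve (dresses)",
    26: "left sleeve (coat)", 27: "right sleeve (coat)", 28: "belt",
}

# Placeholder path: point this at one of the downloaded human parsing maps.
parsing = np.array(Image.open("path/to/human_parsing.png"))

# Report the labels present in this map and their pixel counts.
for index in np.unique(parsing):
    name = HUMAN_LABELS.get(int(index), "unknown")
    print(f"{int(index):2d}  {name:28s}  {(parsing == index).sum()} pixels")
```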

Environment Setup

Install required packages:

pip3 install -r requirements.txt
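
The inference and training commands in this README launch one process per GPU with --nproc_per_node=8 and --num_gpus 8. Before running them, a quick check that PyTorch can actually see that many devices (adjust the GPU-related arguments and --batchSize if it cannot):

```python
import torch

# The commands below assume an 8-GPU node (--nproc_per_node=8 / --num_gpus 8);
# adjust those arguments and the batch size if fewer devices are visible.
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())
```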

Dataset

We conduct experiments on the publicly available VITON-HD and DressCode datasets at a resolution of 512 x 384. For convenience, we provide all of the conditions used in our experiments via the following links.

We also provide another version with the original resolution (1024 x 768).

| Resolution | Google Cloud | Baidu Yun |
| --- | --- | --- |
| VITON-HD (512 x 384) | Available soon | Download |
| VITON-HD (1024 x 768) | Available soon | Download |
| DressCode (512 x 384) | Available soon | Download |
| DressCode (1024 x 768) | Available soon | Download |

When using the VITON-HD and DressCode datasets, please strictly follow the official licences on their websites (Licence of VITON-HD, Licence of DressCode).

Inference

VITON-HD

Please download the pre-trained model from [Google Link (Available soon)]() or Baidu Yun Link, rename the downloaded directory to checkpoints, and put it under the root directory of this project.
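
As a minimal sanity check after renaming the directory to checkpoints, the sketch below verifies that the checkpoint files referenced by --PBAFN_warp_checkpoint and --PBAFN_gen_checkpoint in the commands that follow are where they are expected:

```python
import os

# Checkpoint paths used by the VITON-HD inference commands below.
expected = [
    "checkpoints/gp-vton_partflow_vitonhd_usepreservemask_lrarms_1027/PBAFN_warp_epoch_121.pth",
    "checkpoints/gp-vton_gen_vitonhd_wskin_wgan_lrarms_1029/PBAFN_gen_epoch_201.pth",
]
for path in expected:
    print("OK" if os.path.isfile(path) else "MISSING", path)
```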

To test the first stage (i.e., the LFGP warping module), run the following command:

python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=4739 test_warping.py \
    --name test_partflow_vitonhd_unpaired_1109 \
    --PBAFN_warp_checkpoint 'checkpoints/gp-vton_partflow_vitonhd_usepreservemask_lrarms_1027/PBAFN_warp_epoch_121.pth' \
    --resize_or_crop None --verbose --tf_log \
    --dataset vitonhd --resolution 512 \
    --batchSize 2 --num_gpus 8 --label_nc 14 --launcher pytorch \
    --dataroot /home/tiger/datazy/Datasets/VITON-HD-512 \
    --image_pairs_txt test_pairs_unpaired_1018.txt

To test the second stage (i.e., the try-on generator), run the following command:

python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=4736 test_tryon.py \
    --name test_gpvtongen_vitonhd_unpaired_1109 \
    --resize_or_crop None --verbose --tf_log \
    --dataset vitonhd --resolution 512 \
    --batchSize 12 --num_gpus 8 --label_nc 14 --launcher pytorch \
    --PBAFN_gen_checkpoint 'checkpoints/gp-vton_gen_vitonhd_wskin_wgan_lrarms_1029/PBAFN_gen_epoch_201.pth' \
    --dataroot /home/tiger/datazy/Datasets/VITON-HD-512 \
    --image_pairs_txt test_pairs_unpaired_1018.txt \
    --warproot sample/test_partflow_vitonhd_unpaired_1109

Note that in the above two commands, the parameter --dataroot refers to the root directory of the VITON-HD dataset, the parameter --image_pairs_txt refers to the test list (which is placed under the root directory of the VITON-HD dataset), and the parameter --warproot in the second command refers to the directory of the warped results generated by the first command. The results of both commands are saved under the directory ./sample/exp_name, where exp_name is defined by the parameter --name in each command.
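
For instance, before and after launching the two commands, a quick check of these paths can be done with the sketch below. It uses the example --dataroot value and the pair-list and experiment names from the commands above; substitute your own paths.

```python
import os

# Example paths taken from the two commands above; replace with your own values.
dataroot = "/home/tiger/datazy/Datasets/VITON-HD-512"
pairs_txt = "test_pairs_unpaired_1018.txt"

# The test list is placed directly under the dataset root (see the note above).
pairs_path = os.path.join(dataroot, pairs_txt)
print("pair list:", "OK" if os.path.isfile(pairs_path) else "MISSING", pairs_path)

# Each command writes its results to ./sample/<exp_name>, where <exp_name> is its --name.
for exp_name in ["test_partflow_vitonhd_unpaired_1109", "test_gpvtongen_vitonhd_unpaired_1109"]:
    out_dir = os.path.join("sample", exp_name)
    count = len(os.listdir(out_dir)) if os.path.isdir(out_dir) else 0
    print("results:", out_dir, "->", count, "files")
```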

Alternatively, you can run the bash scripts using the following commands:

# for warping module
bash scripts/test.sh 1

# for try-on module
bash scripts/test.sh 2

We also provide a pre-trained model for higher-resolution (1024 x 768) synthesis. Run the following commands for inference:

# for warping module
bash scripts/test.sh 3

# for try-on module
bash scripts/test.sh 4

Note that for the higher-resolution model we only re-train the try-on module; the warping module is the same as the one used for 512-resolution synthesis.

DressCode

To test GP-VTON on the DressCode dataset, please download the pre-trained model from [Google Link (Available soon)]() or Baidu Yun Link, rename the downloaded directory to checkpoints, and put it under the root directory of this project.

The inference scripts are similar to those for the VITON-HD dataset. You can directly run the following commands:

## for DressCode 512
### for warping module
bash scripts/test.sh 5

### for try-on module
bash scripts/test.sh 6

## for DressCode 1024
### for warping module
bash scripts/test.sh 7

### for try-on module
bash scripts/test.sh 8

Training

VITON-HD

To train the first stage (i.e., the LFGP warping module), run the following command:

python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=7129 train_warping.py \
    --name gp-vton_partflow_vitonhd_usepreservemask_lrarms_1027 \
    --resize_or_crop None --verbose --tf_log \
    --dataset vitonhd --resolution 512 \
    --batchSize 2 --num_gpus 8 --label_nc 14 --launcher pytorch  \
    --dataroot /home/tiger/datazy/Datasets/VITON-HD-512 \
    --image_pairs_txt train_pairs_1018.txt \
    --display_freq 320 --print_freq 160 --save_epoch_freq 10 --write_loss_frep 320 \
    --niter_decay 50 --niter 70  --mask_epoch 70 \
    --lr 0.00005

To train the second stage (i.e., the try-on generator), we first need to run the following command to generate the warped results for the in-shop garments in the training set:

python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=4739 test_warping.py \
    --name test_gpvton_lrarms_for_training_1029 \
    --PBAFN_warp_checkpoint 'checkpoints/gp-vton_partflow_vitonhd_usepreservemask_lrarms_1027/PBAFN_warp_epoch_121.pth' \
    --resize_or_crop None --verbose --tf_log \
    --dataset vitonhd --resolution 512 \
    --batchSize 2 --num_gpus 8 --label_nc 14 --launcher pytorch \
    --dataroot /home/tiger/datazy/Datasets/VITON-HD-512 \
    --image_pairs_txt train_pairs_1018.txt

The warped results will be saved in the directory sample/test_gpvton_lrarms_for_training_1029. Then run the following command for training:

python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=4736 train_tryon.py \
    --name gp-vton_gen_vitonhd_wskin_wgan_lrarms_1029 \
    --resize_or_crop None --verbose --tf_log \
    --dataset vitonhd --resolution 512 \
    --batchSize 10 --num_gpus 8 --label_nc 14 --launcher pytorch \
    --dataroot /home/tiger/datazy/Datasets/VITON-HD-512 \
    --image_pairs_txt train_pairs_1018.txt \
    --warproot sample/test_gpvton_lrarms_for_training_1029 \
    --display_freq 50 --print_freq 25 --save_epoch_freq 10 --write_loss_frep 25 \
    --niter_decay 0 --niter 200 \
    --lr 0.0005

Alternatively, you can run the bash scripts using the following commands:

# train the warping module
bash scripts/train.sh 1

# prepare the warped garment
bash scripts/train.sh 2

# train the try-on module
bash scripts/train.sh 3

To train the try-on module for higher resolution (1024 x 768) synthesis, please run the following commands:

# prepare the warped garment
bash scripts/train.sh 4

# train the try-on module
bash scripts/train.sh 5

DressCode

The training scripts are similar to those for the VITON-HD dataset. You can directly run the following commands:

## for DressCode 512
### train the warping module
bash scripts/train.sh 6

### prepare the warped garment
bash scripts/train.sh 7

### train the try-on module
bash scripts/train.sh 8

## for DressCode 1024
### train the warping module
bash scripts/train.sh 9

### prepare the warped garment
bash scripts/train.sh 10

Todo

Citation

If you find our code or paper helpful, please consider citing:

@inproceedings{xie2023gpvton,
  title     = {GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning},
  author    = {Xie, Zhenyu and Huang, Zaiyu and Dong, Xin and Zhao, Fuwei and Dong, Haoye and Zhang, Xijin and Zhu, Feida and Liang, Xiaodan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023},
}

Acknowledgments

Our code is based on PF-AFN. Thanks to its authors.

License

The use of this code is RESTRICTED to non-commercial research and educational purposes.