
# DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

Yuang Ai<sup>1,2</sup>, Xiaoqiang Zhou<sup>1,4</sup>, Huaibo Huang<sup>1,2</sup>, Xiaotian Han<sup>3</sup>, Zhengyu Chen<sup>3</sup>, Quanzeng You<sup>3</sup>, Hongxia Yang<sup>3</sup>

<sup>1</sup>MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences 
<sup>2</sup>School of Artificial Intelligence, University of Chinese Academy of Sciences 
<sup>3</sup>ByteDance, Inc. <sup>4</sup>University of Science and Technology of China 
NeurIPS 2024

⭐ If DreamClear is helpful to your projects, please help us by starring this repo. Thanks! 🤗
## 🔥 News

- **More convenient inference code & demo will be released in the coming days. Please stay tuned for updates, thanks!**
- **2024.10.25**: Release segmentation & detection code and pre-trained models.
- **2024.10.25**: Release the `RealLQ250` benchmark, which contains 250 real-world LQ images.
- **2024.10.25**: Release training & inference (256->1024) code and pre-trained models of DreamClear.
- **2024.10.24**: This repo is created.

## 📸 Real-World IR Results

Interactive before/after comparisons: [Example 1](https://imgsli.com/MzExNTEx) · [Example 2](https://imgsli.com/MzEwNTEx) · [Example 3](https://imgsli.com/MzEwNDk2) · [Example 4](https://imgsli.com/MzEwNTA4) · [Example 5](https://imgsli.com/MzEwNTEz) · [Example 6](https://imgsli.com/MzEwNTMw)

## 🔧 Dependencies and Installation

1. Clone this repo and navigate to the DreamClear folder:

   ```bash
   git clone https://github.com/shallowdream204/DreamClear.git
   cd DreamClear
   ```

2. Create a Conda environment and install the packages:

   ```bash
   conda create -n dreamclear python=3.9 -y
   conda activate dreamclear
   pip3 install -r requirements.txt
   ```

3. Download the pre-trained models listed below. (For convenience, all models can also be downloaded from [Hugging Face](https://huggingface.co/shallowdream204/DreamClear/tree/main).)

#### Base Models

* `PixArt-α-1024`: [PixArt-XL-2-1024-MS.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/PixArt-XL-2-1024-MS.pth)
* `VAE`: [sd-vae-ft-ema](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/sd-vae-ft-ema)
* `T5 Text Encoder`: [t5-v1_1-xxl](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl)
* `SwinIR`: [general_swinir_v1.ckpt](https://huggingface.co/lxq007/DiffBIR/blob/main/general_swinir_v1.ckpt)

#### Our Provided Models

* `DreamClear`: [DreamClear-1024.pth](https://huggingface.co/shallowdream204/DreamClear/blob/main/DreamClear-1024.pth)
* `RMT for Segmentation`: [rmt_uper_s_2x.pth](https://huggingface.co/shallowdream204/DreamClear/blob/main/rmt_uper_s_2x.pth)
* `RMT for Detection`: [rmt_maskrcnn_s_1x.pth](https://huggingface.co/shallowdream204/DreamClear/blob/main/rmt_maskrcnn_s_1x.pth)

## 🎰 Train

#### I - Prepare training data

Similar to [SeeSR](https://github.com/cswry/SeeSR/blob/main/README.md#step2-prepare-training-data), we pre-generate HQ-LQ image pairs for training the IR model. Run the following command to make paired data for training:

```shell
python3 tools/make_paired_data.py \
    --gt_path gt_path1 gt_path2 ... \
    --save_dir /path/to/save/folder/ \
    --epoch 1 # number of epochs to generate paired data
```

After generating paired data, you can use an MLLM (e.g., [LLaVA](https://github.com/haotian-liu/LLaVA)) to generate a detailed text prompt for each HQ image. Then use T5 to extract the text features ahead of time, which saves training time. Run:

```shell
python3 tools/extract_t5_features.py \
    --t5_ckpt /path/to/t5-v1_1-xxl \
    --caption_folder /path/to/caption/folder \
    --save_npz_folder /path/to/save/npz/folder
```

Finally, the directory structure of the training datasets should look like this (a quick layout check is sketched below the tree):

```
training_datasets_folder/
├── gt/
│   ├── 0000001.png   # GT, (1024, 1024, 3)
│   └── ...
├── sr_bicubic/
│   ├── 0000001.png   # LQ + bicubic upsampling, (1024, 1024, 3)
│   └── ...
├── caption/
│   ├── 0000001.txt   # caption files (not used in training)
│   └── ...
└── npz/
    ├── 0000001.npz   # T5 text features
    └── ...
```
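Before launching training, it may help to verify that the sub-folders stay in sync. The following stand-alone Python sketch is not part of the repo (the script name and `training_datasets_folder` path are placeholders); it simply checks that every GT image has a matching LQ image and T5 feature file under the layout above:

```python
import sys
from pathlib import Path

def check_paired_dataset(root: Path) -> int:
    """Count GT images whose LQ counterpart or T5 feature file is missing."""
    missing = 0
    for gt in sorted((root / "gt").glob("*.png")):
        # Each GT image needs a bicubic-upsampled LQ image and an .npz of T5 features.
        for rel in (f"sr_bicubic/{gt.stem}.png", f"npz/{gt.stem}.npz"):
            if not (root / rel).exists():
                print(f"missing: {rel}")
                missing += 1
    return missing

if __name__ == "__main__":
    sys.exit(1 if check_paired_dataset(Path(sys.argv[1])) else 0)
```

Run it as, e.g., `python3 check_pairs.py training_datasets_folder/`; a non-zero exit code means at least one pair is incomplete.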
#### II - Training for DreamClear

Run the following command to train DreamClear with the default settings:

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=... \
    --node_rank=... --master_addr=... --master_port=... \
    train_dreamclear.py configs/DreamClear/DreamClear_Train.py \
    --load_from /path/to/PixArt-XL-2-1024-MS.pth \
    --vae_pretrained /path/to/sd-vae-ft-ema \
    --swinir_pretrained /path/to/general_swinir_v1.ckpt \
    --val_image /path/to/RealLQ250/lq/val_image.png \
    --val_npz /path/to/RealLQ250/npz/val_image.npz \
    --work_dir experiments/train_dreamclear
```

Please modify the paths of the training datasets in `configs/DreamClear/DreamClear_Train.py`. You can also modify the training hyper-parameters (e.g., `lr`, `train_batch_size`, `gradient_accumulation_steps`) in this file, according to your own GPU machines.

## ⚡ Inference

We provide the `RealLQ250` benchmark, which can be downloaded from [Google Drive](https://drive.google.com/file/d/16uWuJOyGMw5fbXHGcl6GOmxYJb_Szrqe/view?usp=sharing).

#### Testing DreamClear for Image Restoration

Run the following command to restore LQ images from 256 to 1024:

```shell
python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 \
    test_1024.py configs/DreamClear/DreamClear_Test.py \
    --dreamclear_ckpt /path/to/DreamClear-1024.pth \
    --swinir_ckpt /path/to/general_swinir_v1.ckpt \
    --vae_ckpt /path/to/sd-vae-ft-ema \
    --lre --cfg_scale 4.5 --color_align wavelet \
    --image_path /path/to/RealLQ250/lq \
    --npz_path /path/to/RealLQ250/npz \
    --save_dir validation
```

#### Evaluation on high-level benchmarks

Testing instructions for [segmentation](segmentation/README.md) and [detection](detection/README.md) can be found in their respective folders.

## 🪪 License

The provided code and pre-trained weights are licensed under the [Apache 2.0 license](LICENSE).

## 🤗 Acknowledgement

This code is based on [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha), [BasicSR](https://github.com/XPixelGroup/BasicSR) and [RMT](https://github.com/qhfan/RMT). Some code is borrowed from [SeeSR](https://github.com/cswry/SeeSR), [StableSR](https://github.com/IceClear/StableSR), [DiffBIR](https://github.com/XPixelGroup/DiffBIR) and [LLaVA](https://github.com/haotian-liu/LLaVA). We thank the authors for their awesome work.

## 📧 Contact

If you have any questions, please feel free to reach out to me at shallowdream555@gmail.com.

## 📖 Citation

If you find our work useful for your research, please consider citing our paper:

```
@article{ai2024dreamclear,
  title={DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation},
  author={Ai, Yuang and Zhou, Xiaoqiang and Huang, Huaibo and Han, Xiaotian and Chen, Zhengyu and You, Quanzeng and Yang, Hongxia},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}
```