KeyError: 'content_image_file'

open-mmlab / StyleShot

StyleShot: A SnapShot on Any Style. A model that can transfer any style onto any content, generating high-quality, personalized stylized images without per-image fine-tuning!

https://styleshot.github.io/

MIT License

267 stars, 16 forks

KeyError: 'content_image_file' #25

Open ajie6666 opened 2 months ago

ajie6666 commented 2 months ago

During stage 2 training I got the error KeyError: 'content_image_file'. When I built the dataset following DATASET.md, I saw that each record only needed the format {"image_file": "", "content_prompt": "", ...}. What should "content_image_file" contain? Also, how are the results of stage 1 training applied to stage 2, and how should the generated .bin files be used? Thanks.

Jeoyal commented 2 months ago

Hi @ajie6666, thank you for your interest in our work. For the second stage of training, we first process each raw image into a content image using the "Process content input" step described in DATASET.md. Then we add the content image path to the "content_image_file" field in the JSON file.
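As a rough illustration of the second step, the sketch below copies a JSONL dataset and adds a "content_image_file" entry to each record. It is not taken from the StyleShot repository: the function name, the output file, and the assumption that each content image was already produced by "Process content input" and saved under the same file name in a separate directory are all hypothetical.

```python
import json
from pathlib import Path


def add_content_image_field(src_jsonl, dst_jsonl, content_dir):
    """Copy a JSONL dataset, adding "content_image_file" to every record.

    Assumption (not from the repository): the content image for each record
    already exists in content_dir under the same file name as "image_file".
    """
    content_dir = Path(content_dir)
    written = 0
    with open(src_jsonl, encoding="utf-8") as src, \
            open(dst_jsonl, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            name = Path(record["image_file"]).name
            record["content_image_file"] = str(content_dir / name)
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written  # number of records written
```

The existing fields are left untouched, so the same file should still satisfy whatever stage 1 expects.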

In addition, after training stage 1, your checkpoint path structure might look like this:

output/checkpoint-xxxxx/pytorch_model.bin (ip_ckpt)
output/checkpoint-xxxxx/pytorch_model_1.bin (style_aware_encoder_path)

Simply load pytorch_model.bin as --pretrained_ip_adapter_path and pytorch_model_1.bin as --pretrained_style_encoder_path to train stage 2.
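The mapping from checkpoint files to stage 2 flags can be sketched as a small helper. The two file names and the two flag names come from the reply above; the function name is hypothetical, and the stage 2 training script itself is deliberately not named here.

```python
from pathlib import Path


def stage2_weight_args(checkpoint_dir):
    """Build the stage 2 weight flags from a stage 1 checkpoint directory."""
    ckpt = Path(checkpoint_dir)
    ip_ckpt = ckpt / "pytorch_model.bin"          # IP-Adapter weights
    style_encoder = ckpt / "pytorch_model_1.bin"  # style-aware encoder weights
    missing = [str(p) for p in (ip_ckpt, style_encoder) if not p.is_file()]
    if missing:
        # Fail early, before the training script is launched.
        raise FileNotFoundError("missing stage 1 weights: " + ", ".join(missing))
    return [
        "--pretrained_ip_adapter_path", str(ip_ckpt),
        "--pretrained_style_encoder_path", str(style_encoder),
    ]
```

The returned list can be appended to whatever command launches stage 2. The early check is useful when the checkpoint directory holds a different file format than expected.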

ajie6666 commented 2 months ago

I really appreciate your detailed answers to my questions. Probably because of my transformers version, stage 1 saved the weights in ".safetensors" format instead of ".bin", which is why I could not find them at first. I then tried converting the .safetensors files to .bin, and also tried making stage 1 produce .bin files, but neither could be read in stage 2. Finally, following https://huggingface.co/docs/safetensors/speed, I changed "style_aware_encoder.load_state_dict(torch.load(args.pretrained_style_encoder_path))" to "style_aware_encoder.load_state_dict(load_file(args.pretrained_style_encoder_path), strict=False)" and replaced "sd = torch.load(ckpt_path, map_location="cpu")" with "sd = load_file(ckpt_path, device="cpu")". It is now working normally.

Jeoyal commented 2 months ago

I'm glad to hear that this issue has been resolved, and I hope you achieve the results you desire :).

ajie6666 commented 2 months ago

I would like to correct a previous misconception of mine. Simply using load_file with strict=False does not resolve the issue; an error still occurs when the file is eventually read. Instead, one should employ accelerator.save_state(save_path, safe_serialization=False) to generate a .bin file.
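A minimal sketch of that save call, wrapped so the result can be checked immediately. The wrapper name is hypothetical, and "accelerator" is assumed to be an accelerate.Accelerator whose models have already been prepared. The pytorch_model*.bin pattern follows the file names listed earlier in this thread.

```python
from pathlib import Path


def save_stage1_checkpoint(accelerator, save_path):
    """Save the training state with .bin weights instead of .safetensors."""
    # safe_serialization=False asks Accelerate to write pickle-based .bin files.
    accelerator.save_state(save_path, safe_serialization=False)
    # Report which model weight files actually landed in the directory.
    return sorted(p.name for p in Path(save_path).glob("pytorch_model*.bin"))
```

If the returned list is empty, the installed Accelerate version may behave differently, and the checkpoint directory is worth inspecting by hand.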

Additionally, I have a question about integrating my own style images into StyleShot, where the results so far have been unsatisfactory. In the first stage, I trained the model with images in our style and their corresponding JSONL files. For the second stage, I used the images from the "Stylebench.content40" file you provided, processed them through "Process content input", and added both to a JSONL file. I am wondering whether the poor results are due to the small size of the dataset or to a problem with the data I supplied. I would appreciate your advice on this matter.

Jeoyal commented 2 months ago

Hi, for stage 2 you should process your dataset from stage 1 rather than Stylebench.content40. For example, for "style1.png" in your dataset, process it into a content image and add that content image's path to the "content_image_file" field in the JSON file.
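Before launching stage 2, a quick pre-flight check over the JSONL file can catch the original KeyError early. This is only a sketch: the function name is hypothetical and the check is not part of the StyleShot repository.

```python
import json
from pathlib import Path


def find_bad_records(jsonl_path, key="content_image_file"):
    """Return (line_number, reason) pairs for records unusable in stage 2."""
    problems = []
    with open(jsonl_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            if key not in record:
                problems.append((lineno, "missing key"))
            elif not Path(record[key]).is_file():
                # Relative paths are resolved against the current directory.
                problems.append((lineno, "file not found"))
    return problems
```

An empty list means every record names a content image that exists on disk.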

Jeoyal commented 2 months ago

In addition, what's your dataset scale?

ajie6666 commented 2 months ago

Thanks for your reply. I understand your point now, and I'll run it again. My dataset consists of 3,700 images, each with a size of 5000 x 2333.