ajie6666 opened 2 months ago
Hi @ajie6666 , thank you for your interest in our work. During the second stage of training, we first process the raw image into a content image using the "Process content input" step described in DATASET.md. Then, we add the content image path to the "content_image_file" field in the JSON file.
In addition, after training stage 1, your checkpoint path structure might look like this:
output/checkpoint-xxxxx/pytorch_model.bin (ip_ckpt)
output/checkpoint-xxxxx/pytorch_model_1.bin (style_aware_encoder_path)
Simply load pytorch_model.bin as --pretrained_ip_adapter_path and pytorch_model_1.bin as --pretrained_style_encoder_path to train stage 2.
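Putting the two flags together, a stage-2 launch might look like the sketch below. Note this is a hypothetical command line: only the two checkpoint flags come from this thread; the script name `train_stage2.py` and any other arguments are placeholders.

```shell
# Hypothetical stage-2 invocation; script name and other flags are placeholders.
accelerate launch train_stage2.py \
  --pretrained_ip_adapter_path output/checkpoint-xxxxx/pytorch_model.bin \
  --pretrained_style_encoder_path output/checkpoint-xxxxx/pytorch_model_1.bin
```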
I really appreciate your detailed answers to my questions. Probably because of my transformers version, stage 1 saved the checkpoints in ".safetensors" format instead of ".bin". This made me unable to find them at first; I then tried converting .safetensors to .bin, and also tried making stage 1 generate .bin files directly, but stage 2 could not read either. Finally, following https://huggingface.co/docs/safetensors/speed, I changed `style_aware_encoder.load_state_dict(torch.load(args.pretrained_style_encoder_path))` to `style_aware_encoder.load_state_dict(load_file(args.pretrained_style_encoder_path), strict=False)` and replaced `sd = torch.load(ckpt_path, map_location="cpu")` with `sd = load_file(ckpt_path, device="cpu")`. It is now working normally.
I'm glad to hear that this issue has been resolved, and I hope you achieve the results you desire :).
I would like to correct a previous misconception of mine. Simply using `load_file` with `strict=False` does not resolve the issue; an error still occurs when the file is eventually read. Instead, one should use `accelerator.save_state(save_path, safe_serialization=False)` to generate a .bin file.
Additionally, I have a question about integrating my own style of images into StyleShot, as the results have been unsatisfactory. In the first stage, I trained the model with images in our style and their corresponding JSONL files. For the second stage, I used the images from the "Stylebench.content40" file you provided, processed them through "Process content input", and added both to a JSONL file. I am wondering whether the poor outcomes are due to the limited size of the dataset or whether there might be an issue with the data I inputted. I would appreciate your advice on this matter.
Hi, for stage 2 you should process your dataset from stage 1 rather than Stylebench.content40.
For "style1.png" in your dataset, you should process it into a content image and add it to the "content_image_file" field in the JSON file.
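To make the expected layout concrete, here is a small sketch of a JSONL record with the processed content image added next to each training image. The paths and the exact set of other fields are placeholders; the authoritative schema is the one in DATASET.md.

```python
import json

# Hypothetical record: "content/style1.png" is the content image produced
# from "images/style1.png" by the "Process content input" step.
records = [
    {
        "image_file": "images/style1.png",
        "content_image_file": "content/style1.png",
    },
]

# One JSON object per line, as in a .jsonl training file
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```

Every image in the stage-1 dataset gets such an entry, so stage 2 can pair each style image with its own content image.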
In addition, what's your dataset scale?
Thanks for your reply. I understand your point now; I'll run it again. My dataset consists of 3,700 images, each with a size of 5000 x 2333.
When I was training the second stage, I got this error: `KeyError: 'content_image_file'`. But when I built the dataset following DATASET.md, I saw that it only needed the format `{"image_file": "", "content_prompt": "", ...}`. What should this "content_image_file" consist of? I would also like to ask how the results of stage-1 training can be applied to the second stage, and how the generated .bin files can be used. Thanks.