yuvalkirstain / PickScore

MIT License

Finetune pickscore model using local data (4k filtered image pairs) #14

Closed lrzjason closed 9 months ago

lrzjason commented 10 months ago

I have manually filtered 4k caption-image pairs. The data is organized into captions, images/high and images/low folders. I used this structure for SDXL LoRA training with my modified sliders repo: https://github.com/lrzjason/sliders-image

After filtering so many images in the Pickapicv2 dataset, I found that many of them do not match my preferences. I want to finetune the pickscore model on my filtered subset. How should I prepare the dataset to run the trainer?

My filtered subset of pickapicv2 (around 6GB, 4k image pairs with captions): https://mega.nz/file/fgsxhbIa#QSNcjVxm4vY2f68PyOzmlIMHQCQOe93EyyFK1rmRkEc

Any advice would be much appreciated. Thanks a lot!

yuvalkirstain commented 10 months ago

Hey! I suggest you either filter the Hugging Face dataset and then follow the instructions to train the model (just use pickscore instead of clip-h), or convert your dataset to the Hugging Face dataset format. See the instructions for downloading the data and training the model for more details.
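
For the second option, here is a minimal sketch of how the local folders could be turned into rows for a Hugging Face dataset. Several things here are assumptions, not confirmed in this thread: that each caption is a `.txt` file sharing its file stem with one image in `images/high` and one in `images/low`, and that the trainer expects the Pick-a-Pic style column names `caption`, `jpg_0`, `jpg_1`, `label_0`, `label_1` (check these against the dataset card and the training config before relying on them). The helper names `index_by_stem`, `build_rows` and `to_hf_dataset` are hypothetical.

```python
from pathlib import Path


def index_by_stem(folder):
    """Map file stem -> path for every file directly inside `folder`."""
    return {p.stem: p for p in sorted(Path(folder).iterdir()) if p.is_file()}


def build_rows(root):
    """Pair each caption with its preferred (high) and rejected (low) image.

    Assumed layout (hypothetical, adjust to your data):
        root/captions/<id>.txt
        root/images/high/<id>.<ext>
        root/images/low/<id>.<ext>
    """
    root = Path(root)
    captions = index_by_stem(root / "captions")
    high = index_by_stem(root / "images" / "high")
    low = index_by_stem(root / "images" / "low")

    rows = []
    for stem, caption_path in captions.items():
        if stem not in high or stem not in low:
            continue  # skip captions that lack one of the two images
        rows.append({
            # Column names are assumed to mirror the Pick-a-Pic schema.
            "caption": caption_path.read_text(encoding="utf-8").strip(),
            "jpg_0": high[stem].read_bytes(),  # preferred image, raw bytes
            "jpg_1": low[stem].read_bytes(),   # rejected image, raw bytes
            "label_0": 1.0,
            "label_1": 0.0,
        })
    return rows


def to_hf_dataset(rows):
    """Wrap the rows in a Hugging Face Dataset (needs `pip install datasets`)."""
    from datasets import Dataset  # imported lazily so build_rows works without it
    return Dataset.from_list(rows)
```

Calling `to_hf_dataset(build_rows("path/to/subset"))` would then give a dataset object that can be saved with `save_to_disk` or uploaded with `push_to_hub`. With around 6GB of images, holding every row in memory at once may be too heavy, so building the dataset in chunks could be worth considering.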