Hey! I suggest you either filter the Hugging Face dataset and then follow the instructions to train the model (just use PickScore instead of CLIP-H), or convert your dataset to the Hugging Face dataset format. See the repo's instructions for downloading the data and training the model for more details.
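For the filtering route, a minimal sketch might look like the following. This is only a sketch: I am assuming the dataset id `yuvalkirstain/pickapic_v2` and Pick-a-Pic-style columns (`has_label`, `label_0`, `label_1`), and the `keep_example` predicate is a hypothetical placeholder for your actual preference check.

```python
from datasets import load_dataset

# Assumed dataset id; Pick-a-Pic v2 is large, so downloading the full
# train split may take a long time and a lot of disk space.
ds = load_dataset("yuvalkirstain/pickapic_v2", split="train")

def keep_example(example):
    # Hypothetical predicate: keep only labeled, non-tied pairs.
    # Replace with whatever criteria you used for your manual filtering.
    return bool(example["has_label"]) and example["label_0"] != example["label_1"]

filtered = ds.filter(keep_example)
filtered.save_to_disk("pickapic_v2_filtered")
```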
I have manually filtered 4k caption-image pairs. I have captions, images/high, and images/low folders. I used this structure for SDXL LoRA training with my modified sliders repo: https://github.com/lrzjason/sliders-image
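For the conversion route, turning this folder layout into a Hugging Face dataset could look roughly like the sketch below. The column names (`jpg_0`, `jpg_1`, `caption`, `label_0`, `label_1`, `has_label`, `are_different`) are assumed from the Pick-a-Pic schema and should be checked against what the trainer actually expects; the `.jpg` extension and matching file stems across the three folders are also assumptions.

```python
from pathlib import Path
from datasets import Dataset

root = Path("my_filtered_subset")  # hypothetical root holding the three folders

def iter_examples():
    # Assumes each captions/<stem>.txt has matching images at
    # images/high/<stem>.jpg (preferred) and images/low/<stem>.jpg (rejected).
    for cap_file in sorted((root / "captions").glob("*.txt")):
        stem = cap_file.stem
        high = root / "images" / "high" / f"{stem}.jpg"
        low = root / "images" / "low" / f"{stem}.jpg"
        if not (high.exists() and low.exists()):
            continue
        yield {
            "caption": cap_file.read_text().strip(),
            "jpg_0": high.read_bytes(),  # raw image bytes, as in Pick-a-Pic
            "jpg_1": low.read_bytes(),
            "label_0": 1.0,  # jpg_0 is always the preferred image in this subset
            "label_1": 0.0,
            "has_label": True,
            "are_different": True,
        }

ds = Dataset.from_generator(iter_examples)
ds.save_to_disk("my_filtered_subset_hf")
```

You could then load it back with `datasets.load_from_disk` and point the trainer's dataset config at it, per the repo's training instructions.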
After filtering that many images from the Pick-a-Pic v2 dataset, I found that many of its images do not match my preferences. I want to fine-tune the PickScore model on my filtered subset. How should I prepare the dataset to run the trainer?
My filtered subset of Pick-a-Pic v2 (around 6 GB, 4k image pairs with captions): https://mega.nz/file/fgsxhbIa#QSNcjVxm4vY2f68PyOzmlIMHQCQOe93EyyFK1rmRkEc
Thanks a lot for any advice you can give.