After some experimentation, it is clear that big sleep will not produce a good dataset of images from a list of scraped prompts out of the box. This puts us in a position where we have to come up with strategies to extract better images from big sleep:
One possible solution, as mentioned here, is to source more URLs using better, more general scrapers and feed this larger set of prompts into big sleep. With a larger set, we can afford to be liberal about discarding prompts that don't produce good images.
The second option is adaptive training. The idea here would be to train the model for a variable number of epochs based on how the loss behaves: if the model is having a hard time generating an image for a particular prompt, train it longer until the image improves. Since training for a large number of epochs on every image would not be feasible, the training budget has to be allocated adaptively. One concern here is that we don't actually know whether the images will get better with more training in general. The starting sample might just be a lost cause (a bad local minimum), and more training would only get us further stuck in it.
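To make the adaptive-training idea concrete, here is a minimal sketch of a plateau-based stopping rule. The `step` callable is a hypothetical stand-in for one epoch of big sleep's optimisation returning the current loss; the parameter names (`patience`, `min_delta`) and the toy loss sequence are my own assumptions, not anything from big sleep itself.

```python
def adaptive_train(step, max_epochs=50, patience=5, min_delta=1e-3):
    """Run epochs until the loss stops improving, up to max_epochs.

    `step` is assumed to run one epoch of optimisation and return the
    current loss (hypothetical interface for big sleep's train loop).
    """
    best = float("inf")
    stale = 0
    for epoch in range(max_epochs):
        loss = step()
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            break  # loss has plateaued; likely stuck, stop early
    return best, epoch + 1

# Toy stand-in for big sleep: loss decays for a few epochs, then flattens.
losses = iter([1.0, 0.5, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3])
best, epochs = adaptive_train(lambda: next(losses), max_epochs=10)
```

With these toy numbers the loop stops after 8 epochs instead of burning the full budget, which is the behaviour we'd want on hard prompts: spend more epochs only while the loss is still moving.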
This brings me to the third idea: adaptive resampling. Big sleep already tries to do something of this sort (it is labelled an experimental feature), but it is not adaptive and hence would not scale. The idea here would be to resample the generated output until we get a good starting loss, which improves our confidence that we are not stuck in a bad local minimum.
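A sketch of what the adaptive version could look like: keep drawing fresh initialisations until the starting loss clears a threshold, rather than resampling a fixed number of times. `sample_latent` and `initial_loss` are hypothetical stand-ins for big sleep's latent initialisation and its CLIP-based loss; the scalar "latents" in the toy example are purely illustrative.

```python
def adaptive_resample(sample_latent, initial_loss, threshold, max_tries=16):
    """Resample the initialisation until the starting loss is good enough.

    Keeps the best draw seen so far and stops as soon as one clears
    `threshold`, so easy prompts exit after one draw while hard ones
    use more of the budget.
    """
    best_z, best_loss = None, float("inf")
    for _ in range(max_tries):
        z = sample_latent()
        loss = initial_loss(z)
        if loss < best_loss:
            best_z, best_loss = z, loss
        if best_loss <= threshold:
            break  # good enough starting point; begin optimisation here
    return best_z, best_loss

# Toy stand-ins: "latents" are scalars, the "loss" is distance from 0.5.
candidates = iter([0.9, 0.2, 0.48, 0.7])
best_z, best_loss = adaptive_resample(lambda: next(candidates),
                                      lambda z: abs(z - 0.5),
                                      threshold=0.05)
```

The per-prompt cost then scales with how unlucky the initial draws are, which is the property the current fixed-count resampling lacks.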
None of these ideas are "quick fixes", so it will take some thinking a priori to make sure our efforts are not directed toward a dead end. Thoughts on how we move forward?