icaros-usc / pyribs

A bare-bones Python library for quality diversity optimization.
https://pyribs.org
MIT License

Add QDHF tutorial #459

Closed by ld-ing 3 months ago

ld-ing commented 4 months ago

Description

Add a tutorial that uses Quality Diversity through Human Feedback (QDHF) to improve the diversity of Stable Diffusion image generations. The tutorial showcases how people can do QD optimization without manually crafted diversity metrics. It also extends the previous DQD tutorial with an LSI pipeline using Stable Diffusion.

TODO

Status

btjanaka commented 3 months ago

Hi @ld-ing, thank you again for writing this tutorial! I revised the notebook a bit; here are the changes I made:

  • Updated dependencies -- ribs[visualize] already includes shapely and matplotlib, so they do not need separate installs
  • Replaced alive_progress / alive_bar with tqdm
  • Switched tutorial links to point to the latest versions rather than stable
  • Minor grammar fixes
  • Switched from the optimizer to the scheduler (a minimal ask/tell sketch follows this list)
  • Added comments in the fit_dis_embed function for clarity
  • Added comments in the training loop for clarity
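
For reference, here is a minimal, self-contained sketch of the ask/tell pattern with the pyribs Scheduler and a tqdm progress bar (the two swaps mentioned above). The archive and emitter settings, the toy objective, and the toy measures are placeholders for illustration only; the notebook derives these from the generated images and the learned diversity projection instead.

```python
import numpy as np
from tqdm import trange

from ribs.archives import GridArchive
from ribs.emitters import EvolutionStrategyEmitter
from ribs.schedulers import Scheduler

SOLUTION_DIM = 10  # Toy search space; the tutorial optimizes SD latents instead.

archive = GridArchive(
    solution_dim=SOLUTION_DIM,
    dims=[20, 20],              # 20x20 grid over the 2-D measure space.
    ranges=[(-1, 1), (-1, 1)],  # Bounds of the two measures.
)
emitters = [
    EvolutionStrategyEmitter(
        archive,
        x0=np.zeros(SOLUTION_DIM),
        sigma0=0.1,
        batch_size=30,
    )
]
scheduler = Scheduler(archive, emitters)

for itr in trange(100):  # tqdm progress bar instead of alive_bar.
    solutions = scheduler.ask()

    # Placeholder objective and measures -- the notebook computes these from
    # the generated images and the learned 2-D projection.
    objective = -np.sum(solutions**2, axis=1)
    measures = np.clip(solutions[:, :2], -1, 1)

    scheduler.tell(objective, measures)
```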

Would you also mind making the following changes?

  1. In the section on "Train Diversity Metrics through Contrastive Learning", could you add a brief explanation of how the contrastive learning part of QDHF works? In particular, right now it is a bit difficult to understand all the code related to the DisEmbed, but I think it would all make sense with an explanation of how the DisEmbed is being used. No need to go into too much detail; you can always refer readers to the paper for that.
  2. By the way, what does the "Dis" in DisEmbed stand for?
  3. Regarding the fit_dis_embed function, could you add a comment explaining how the loss function works in the training loop? It seems to be a bit different from Eq. 3 in the paper.
  4. Also, I'm unclear on how gt_measures works -- does the DreamSim model output features that become the "gt_measures", which you then use to determine the preference (i.e., the 2AFC mentioned in Section 3.2)? If so, I think it would be good to explain this in fit_dis_embed; I've sketched my current reading right after this list.
  5. Could you mention somewhere that CLIP is used to embed the image into 512d, and QDHF learns to embed 512d into 2d?
  6. Should CLIP be using ViT-B/16 or ViT-B/32? The QDHF paper mentions ViT-B/16 but this tutorial uses ViT-B/32.
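
To check my understanding of item 4, here is a rough sketch of how I currently read the gt_measures / 2AFC setup. The helper name simulate_2afc is made up, and I'm treating gt_measures as an (n_images, d) array of DreamSim-style features, with the preference for each triplet being whichever candidate lies closer to the anchor in that feature space. Please correct me if this is not what the code does.

```python
import numpy as np

def simulate_2afc(gt_measures, rng, n_triplets=1000):
    """Hypothetical helper: derive 2AFC judgments from ground-truth features.

    gt_measures is assumed to be an (n_images, d) array of perceptual
    features (e.g., DreamSim embeddings). For each triplet (anchor, b, c),
    the simulated judgment prefers whichever of b or c lies closer to the
    anchor in this feature space.
    """
    n = len(gt_measures)
    triplets = rng.integers(0, n, size=(n_triplets, 3))
    anchors, b, c = triplets.T

    dist_b = np.linalg.norm(gt_measures[anchors] - gt_measures[b], axis=1)
    dist_c = np.linalg.norm(gt_measures[anchors] - gt_measures[c], axis=1)

    # prefer_b[i] is True when b is judged more similar to the anchor than c.
    prefer_b = dist_b < dist_c
    return triplets, prefer_b

rng = np.random.default_rng(0)
gt_measures = rng.normal(size=(200, 768))  # stand-in for DreamSim features
triplets, prefer_b = simulate_2afc(gt_measures, rng)
```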

Overall, I think it looks great! My comments all center on fit_dis_embed because this is one of the key parts of the tutorial, so I am hoping to make the code as clear as possible to readers. Once you make these changes, I'll run the notebook on Colab again to get a "golden" version for the tutorials.

ld-ing commented 3 months ago

Hi @btjanaka, thanks for the edits and detailed comments! I have revised the code to resolve these issues. Here are the changes:

  1. I revised the contrastive learning part and added more explanations and comments.
  2. DisEmbed stood for "distance embedding," but I have changed all occurrences to div_proj, i.e., diversity projection, to keep the terminology consistent with the paper.
  3. The loss function is indeed equivalent to Eq. 3 in the paper, and I've added more explanation.
  4. I have added more description of how DreamSim is used to source feedback.
  5. I added a description of the model noting that it projects from 512-d to 2-d (a rough sketch of the idea follows this list).
  6. The most recent version of the paper (not yet on arXiv) uses ViT-B/32. However, users may use other checkpoints depending on their needs and resources.
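
To make item 5 (and the relation to Eq. 3 from item 3) concrete, here is a rough, self-contained sketch of the idea: a small projection module maps CLIP's 512-d embedding to a 2-d measure space and is trained with a triplet-style contrastive loss, so the preferred image from each 2AFC judgment ends up closer to the anchor than the non-preferred one. The module name, the margin formulation, and the hyperparameters below are illustrative only; the notebook's div_proj and loss follow the paper's exact formulation, which may differ from this hinge-style variant.

```python
import torch
import torch.nn as nn

class DivProj(nn.Module):
    """Illustrative diversity projection: CLIP's 512-d embedding -> 2-d measures."""

    def __init__(self, input_dim=512, latent_dim=2):
        super().__init__()
        self.proj = nn.Linear(input_dim, latent_dim)

    def forward(self, x):
        return self.proj(x)

def triplet_contrastive_loss(model, anchor, pos, neg, margin=1.0):
    """Hinge-style triplet loss: after projection, the preferred (positive)
    image should sit closer to the anchor than the non-preferred (negative)
    one. The paper's Eq. 3 may use a different (e.g., softmax-based) form."""
    za, zp, zn = model(anchor), model(pos), model(neg)
    d_pos = torch.norm(za - zp, dim=1)
    d_neg = torch.norm(za - zn, dim=1)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

# Toy usage with random CLIP-sized embeddings.
model = DivProj()
anchor, pos, neg = (torch.randn(8, 512) for _ in range(3))
loss = triplet_contrastive_loss(model, anchor, pos, neg)
loss.backward()
```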

In addition, I reorganized the notebook a little to make things clearer.

Again, thanks for these insightful suggestions, and let me know if you have other comments!