icaros-usc / pyribs

A bare-bones Python library for quality diversity optimization.
https://pyribs.org
MIT License

Add QDHF tutorial #459

Closed by ld-ing 3 months ago

ld-ing commented 4 months ago

Description

Add a tutorial that uses Quality Diversity through Human Feedback (QDHF) to improve the diversity of Stable Diffusion image generations. The tutorial showcases how people can do QD optimization without manually crafted diversity metrics. It also extends the previous DQD tutorial with an LSI pipeline using Stable Diffusion.

TODO

Status

btjanaka commented 3 months ago

Hi @ld-ing, thank you again for writing this tutorial! I revised the notebook a bit; here are the changes I made:

  • Updated dependencies -- ribs[visualize] already includes shapely and matplotlib, so they do not need separate installs
  • Replaced alive_progress / alive_bar with tqdm
  • Switched tutorial links to point to the latest versions rather than stable
  • Minor grammar fixes
  • Switched from the optimizer to the scheduler (a minimal ask/tell sketch follows this list)
  • Added comments in the fit_dis_embed function for clarity
  • Added comments in the training loop for clarity
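
For reference, here is a minimal, self-contained sketch of the ask/tell pattern with the pyribs Scheduler and a tqdm progress bar (the two swaps mentioned above). The archive and emitter settings, the toy objective, and the toy measures are placeholders for illustration only; the notebook derives these from the generated images and the learned diversity projection instead.

```python
import numpy as np
from tqdm import trange

from ribs.archives import GridArchive
from ribs.emitters import EvolutionStrategyEmitter
from ribs.schedulers import Scheduler

SOLUTION_DIM = 10  # Toy search space; the tutorial optimizes SD latents instead.

archive = GridArchive(
    solution_dim=SOLUTION_DIM,
    dims=[20, 20],              # 20x20 grid over the 2-D measure space.
    ranges=[(-1, 1), (-1, 1)],  # Bounds of the two measures.
)
emitters = [
    EvolutionStrategyEmitter(
        archive,
        x0=np.zeros(SOLUTION_DIM),
        sigma0=0.1,
        batch_size=30,
    )
]
scheduler = Scheduler(archive, emitters)

for itr in trange(100):  # tqdm progress bar instead of alive_bar.
    solutions = scheduler.ask()

    # Placeholder objective and measures -- the notebook computes these from
    # the generated images and the learned 2-D projection.
    objective = -np.sum(solutions**2, axis=1)
    measures = np.clip(solutions[:, :2], -1, 1)

    scheduler.tell(objective, measures)
```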

Would you also mind making the following changes?

  1. In the section on "Train Diversity Metrics through Contrastive Learning", could you add a brief explanation of how the contrastive learning part of QDHF works? In particular, right now it is a bit difficult to understand all the code related to the DisEmbed, but I think it would all make sense with an explanation of how the DisEmbed is being used. No need to go into too much detail; you can always refer readers to the paper for that.
  2. By the way, what does the "Dis" in DisEmbed stand for?
  3. Regarding the fit_dis_embed function, could you add a comment explaining how the loss function works in the training loop? It seems to be a bit different from Eq. 3 in the paper.
  4. Also, I'm unclear on how gt_measures works -- does the DreamSim model output features that become the "gt_measures", which you then use to determine the preference (i.e., the 2AFC mentioned in Section 3.2)? If so, I think it would be good to explain this in fit_dis_embed; I've sketched my current reading right after this list.
  5. Could you mention somewhere that CLIP is used to embed the image into 512d, and QDHF learns to embed 512d into 2d?
  6. Should CLIP be using ViT-B/16 or ViT-B/32? The QDHF paper mentions ViT-B/16 but this tutorial uses ViT-B/32.
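
To check my understanding of item 4, here is a rough sketch of how I currently read the gt_measures / 2AFC setup. The helper name simulate_2afc is made up, and I'm treating gt_measures as an (n_images, d) array of DreamSim-style features, with the preference for each triplet being whichever candidate lies closer to the anchor in that feature space. Please correct me if this is not what the code does.

```python
import numpy as np

def simulate_2afc(gt_measures, rng, n_triplets=1000):
    """Hypothetical helper: derive 2AFC judgments from ground-truth features.

    gt_measures is assumed to be an (n_images, d) array of perceptual
    features (e.g., DreamSim embeddings). For each triplet (anchor, b, c),
    the simulated judgment prefers whichever of b or c lies closer to the
    anchor in this feature space.
    """
    n = len(gt_measures)
    triplets = rng.integers(0, n, size=(n_triplets, 3))
    anchors, b, c = triplets.T

    dist_b = np.linalg.norm(gt_measures[anchors] - gt_measures[b], axis=1)
    dist_c = np.linalg.norm(gt_measures[anchors] - gt_measures[c], axis=1)

    # prefer_b[i] is True when b is judged more similar to the anchor than c.
    prefer_b = dist_b < dist_c
    return triplets, prefer_b

rng = np.random.default_rng(0)
gt_measures = rng.normal(size=(200, 768))  # stand-in for DreamSim features
triplets, prefer_b = simulate_2afc(gt_measures, rng)
```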

Overall, I think it looks great! My comments all center on fit_dis_embed because this is one of the key parts of the tutorial, so I am hoping to make the code as clear as possible to readers. Once you make these changes, I'll run the notebook on Colab again to get a "golden" version for the tutorials.

ld-ing commented 3 months ago

Hi @btjanaka, thanks for the edits and detailed comments! I have revised the code to resolve these issues. Here are the changes:

  1. I revised the contrastive learning part and added more explanations and comments.
  2. DisEmbed stood for "distance embedding," but I have changed all occurrences to div_proj, i.e., diversity projection, to keep the terminology consistent with the paper.
  3. The loss function is indeed equivalent to Eq. 3 in the paper, and I've added more explanation.
  4. I have added more description of how DreamSim is used to source feedback.
  5. I added a description of the model noting that it projects from 512-d to 2-d (a rough sketch of the idea follows this list).
  6. The most recent version of the paper (not yet on arXiv) uses ViT-B/32. However, users may use other checkpoints depending on their needs and resources.
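
To make item 5 (and the relation to Eq. 3 from item 3) concrete, here is a rough, self-contained sketch of the idea: a small projection module maps CLIP's 512-d embedding to a 2-d measure space and is trained with a triplet-style contrastive loss, so the preferred image from each 2AFC judgment ends up closer to the anchor than the non-preferred one. The module name, the margin formulation, and the hyperparameters below are illustrative only; the notebook's div_proj and loss follow the paper's exact formulation, which may differ from this hinge-style variant.

```python
import torch
import torch.nn as nn

class DivProj(nn.Module):
    """Illustrative diversity projection: CLIP's 512-d embedding -> 2-d measures."""

    def __init__(self, input_dim=512, latent_dim=2):
        super().__init__()
        self.proj = nn.Linear(input_dim, latent_dim)

    def forward(self, x):
        return self.proj(x)

def triplet_contrastive_loss(model, anchor, pos, neg, margin=1.0):
    """Hinge-style triplet loss: after projection, the preferred (positive)
    image should sit closer to the anchor than the non-preferred (negative)
    one. The paper's Eq. 3 may use a different (e.g., softmax-based) form."""
    za, zp, zn = model(anchor), model(pos), model(neg)
    d_pos = torch.norm(za - zp, dim=1)
    d_neg = torch.norm(za - zn, dim=1)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

# Toy usage with random CLIP-sized embeddings.
model = DivProj()
anchor, pos, neg = (torch.randn(8, 512) for _ in range(3))
loss = triplet_contrastive_loss(model, anchor, pos, neg)
loss.backward()
```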

In addition, I reorganized the notebook a little to make things clearer.

Again, thanks for these insightful suggestions, and let me know if you have other comments!