openai / glide-text2im

GLIDE: a diffusion-based text-conditional image synthesis model
MIT License

GUIDE: Using GLIDE in pipenv instead of pip #25

Open Arcitec opened 2 years ago

Arcitec commented 2 years ago

I've created a guide for people who have moved on to the more advanced pipenv instead of basic pip.

  1. Install Pipenv and Pyenv on your system so that both tools are available. Pyenv is optional, but Pipenv needs it to fetch the correct version of Python on demand. But if you're reading this, you're probably a chad who has both tools already!

  2. Create a new, empty project folder. Then navigate to it in a terminal window.

  3. Run the following command to initialize a pipenv virtualenv with Python 3.9 (as of this writing, the newest version that PyTorch supports).

pipenv install --python=3.9
  4. Now choose a command based on which GPU you have. Note that the command will take a VERY long time, because it downloads multiple builds of the ~1.7 GB PyTorch archive (for me it took 30 minutes on a 100 Mbit connection and downloaded 16 GB). This is PyTorch's fault for using a non-standard repository format and a non-standard package versioning scheme (the "+cu113" junk). Pipenv only follows the Python PEP standards for how repositories should look, so it has trouble figuring out what to do and grabs everything that matches the query, which is every architecture... (If you're on Linux, just check du -h ~/.cache/pipenv and you'll see that it's downloading gigabytes of packages...)
pipenv install --extra-index-url https://download.pytorch.org/whl/cu113/ "torch==1.10.1+cu113"
pipenv install --extra-index-url https://download.pytorch.org/whl/ "torch==1.10.1+cu102"
pipenv install --extra-index-url https://download.pytorch.org/whl/ "torch==1.10.1+cpu"
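After the install finishes, your Pipfile should contain an extra package source and the pinned torch version. A rough sketch of what the CUDA 11.3 command above generates (the source name is auto-generated by Pipenv and may differ on your machine):

```toml
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

# Added by the --extra-index-url flag; the name is Pipenv-generated and may differ.
[[source]]
name = "downloadpytorch"
url = "https://download.pytorch.org/whl/cu113/"
verify_ssl = true

[packages]
torch = {version = "==1.10.1+cu113", index = "downloadpytorch"}

[requires]
python_version = "3.9"
```

If the install ever gets confused, checking that the `index =` key points at the PyTorch source entry is a good first debugging step.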
  5. You also need to install Numpy and PyYAML.
pipenv install numpy pyyaml
  6. Next, clone the GLIDE library repo into a subfolder:
git clone https://github.com/openai/glide-text2im.git lib/glide-text2im
  7. Tell Pipenv to install the "local library" (GLIDE). This will automatically detect the Pipfile in the parent folder and add GLIDE to it. Note that this command must be run from the directory the Pipfile is in, because the -e <path> argument is treated as relative to the current working directory. (You could also provide a full, absolute path, but we'll use a relative one here.) This command takes a while, since it downloads GLIDE's dependencies.
pipenv install -e ./lib/glide-text2im
  8. Create a subfolder for your own Python project files:
mkdir src && cd src
  9. Now simply create your Python files in src/, import the library as shown in GLIDE's examples, and have fun. You can reuse the code from the Notebook example files that GLIDE provides.

  10. Running your code inside the pipenv (virtualenv) must be done with a special command, so that it loads the Python version and the virtualenv libraries that you've installed:

pipenv run python <yoursourcefilehere.py>
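As a sanity check that everything installed correctly, a minimal src/ file might look like the sketch below, based on the imports used in GLIDE's bundled notebooks. Treat it as a starting point, not a tested script: it downloads the released "base" checkpoint (~1.6 GB) on first run, and the exact sampling code should be copied from the notebooks.

```python
# Minimal sketch of a src/ entry point, following GLIDE's notebook examples.
import torch

from glide_text2im.download import load_checkpoint
from glide_text2im.model_creation import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the base 64x64 text2im model with the default hyperparameters.
options = model_and_diffusion_defaults()
model, diffusion = create_model_and_diffusion(**options)
model.eval()
model.to(device)

# Downloads and caches the public "base" checkpoint on first use.
model.load_state_dict(load_checkpoint("base", device))
print("GLIDE base model loaded on", device)
```

Run it with `pipenv run python src/demo.py` (or whatever you named the file), then paste in the sampling loop from the notebooks.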
Arcitec commented 2 years ago

Small bonus guide: Converting the *.ipynb "Notebook" files to normal scripts.

  1. Install the necessary tools for the conversion. These have a lot of dependencies and take a few minutes.
pipenv install jupyter nbconvert
  2. Convert all notebooks to Python files. This must be executed from the top directory of your project (if you run it inside the cloned git repo, Pipenv will treat that as a different project folder and create another Pipfile there):
cd ..  # return to parent folder (if you're still in the src/ folder)
pipenv run jupyter nbconvert --to script lib/glide-text2im/notebooks/*.ipynb
  3. Now you can move those .py files out of lib/glide-text2im/notebooks/ and into your src/ folder as a basis for your own project:
mv lib/glide-text2im/notebooks/*.py src/
  4. You'll have to edit the demos to remove the IPython-specific code, such as the get_ipython call and the image-display code, because the example code outputs its result to your Notebook (Jupyter etc.). Use something like OpenCV's image display instead. First, install OpenCV:
pipenv install opencv-contrib-python
  5. Edit the demos to remove these lines:
get_ipython().system('pip install git+https://github.com/openai/glide-text2im')
from PIL import Image
from IPython.display import display
  6. Add this line:
import cv2
  7. Replace this line:

Old:

    display(Image.fromarray(reshaped.numpy()))

New:

    # Resize to 4x larger and display with OpenCV2
    img = reshaped.numpy()
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)  # Convert to OpenCV's BGR color order.
    img = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_LANCZOS4)
    cv2.imshow("Result", img)
    cv2.waitKey(0)  # Necessary for OS threading/rendering of the GUI.
    cv2.destroyAllWindows()
  8. Very important: when you see a generated image, you must press a keyboard key to close the window. Don't close it with the "X" using your mouse, because that will hang Python inside waitKey, waiting for a key press that never arrives. Displaying images with cv2 is impossible without waitKey, since the OS considers the window dead if you skip it. So your only option for this demo is to close the windows by pressing a keyboard key such as space!
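If you'd rather be able to close the window with the mouse too, one common workaround (a sketch, not part of the original demo) is to poll the window's visibility with a short waitKey timeout instead of blocking forever:

```python
import cv2

def show_until_closed(title, img):
    """Display img; return when a key is pressed OR the window is closed."""
    cv2.imshow(title, img)
    while True:
        # A short timeout keeps the GUI event loop responsive...
        if cv2.waitKey(100) != -1:
            break
        # ...and WND_PROP_VISIBLE drops below 1 once the "X" is clicked.
        if cv2.getWindowProperty(title, cv2.WND_PROP_VISIBLE) < 1:
            break
    cv2.destroyAllWindows()
```

With this helper in place of the bare imshow/waitKey pair, both the keyboard and the window's close button end the wait cleanly.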

PS: "GLIDE (filtered)" is definitely a fun toy, but with the public model the results are unfortunately pretty bad: blurry and nonsensical (unrelated to what you wrote), as mentioned here:

https://github.com/openai/glide-text2im/issues/21#issuecomment-1045590329

Most of the output you're gonna get is useless. But some of it can be fun for inspiration/ideas for projects or art. The main benefit of this model is actually that it generates results extremely fast compared to previous CLIP-based generators.

I would honestly say that the old CLIP-based generators that are out there are much better and more usable. Sure, the coherence of the image itself and the objects is better in GLIDE (filtered), but it responds really poorly to your input most of the time.

If you decide that you want to use this project anyway ("GLIDE (filtered)"), I recommend the clip_guided code. It's better than text2im at understanding things with the limited free training data we've been given. See this topic: https://github.com/openai/glide-text2im/issues/19

The main issue with the free version of GLIDE is that the filtered training data seems to have been mostly "freaking dogs!!". Which may explain why the default prompt demo is "an oil painting of a corgi"... It also produces extremely blurry output.

Arcitec commented 2 years ago

Bonus: If someone wants a full, more detailed guide about installing PyTorch in Pipenv correctly, then you can find that guide here:

https://github.com/pypa/pipenv/issues/4961#issuecomment-1045679643

All relevant commands and most of the explanations from that guide are already here in this GLIDE guide, but if you want a deeper understanding of how Pipenv's 3rd party repo support works compared to Pip, you'll want to check out that guide too.