ml6team / fondant-usecase-controlnet

Example Fondant pipeline preparing data to train a Controlnet model

ControlNet Interior Design Pipeline

Introduction

This example demonstrates an end-to-end Fondant pipeline that collects and processes data for fine-tuning a ControlNet model, focusing on images related to interior design.

The resulting model allows you to generate the room of your dreams:

[Table: two example pairs of an input image and the corresponding output image]

Want to try out the resulting model yourself? Head over to our Hugging Face space!

Check out this doc for more information on ControlNet and how to use it: docs/controlnet.md.

Pipeline Overview

The image below shows the entire pipeline and its workflow. Note that this workflow is currently tailored to the interior design domain, but it can easily be adapted to other domains by changing the prompt generation component.

[Figure: overview of the pipeline and its workflow]

The pipeline consists of five components:

  1. Prompt Generation: This component generates a set of seed prompts using a rule-based approach that combines room types and styles, like “a photo of a {room_type} in the style of {style_type}”. As input, it takes a list of room types (bedroom, kitchen, laundry room, ...), a list of room styles (contemporary, minimalist, art deco, ...) and a list of prefixes (comfortable, luxurious, simple). These lists can be easily adapted to other domains. The output of this component is a list of seed prompts.

  2. Image URL Retrieval: This component retrieves images from the LAION-5B dataset based on the seed prompts. Retrieval is based on the similarity between the CLIP embeddings of the prompt sentences and those of the captions in the LAION dataset. This component does not return the actual images yet, only their URLs. The next component in the pipeline downloads the images.

  3. Download Images: This component downloads the actual images based on the URLs retrieved by the previous component. It takes in the URLs as input and returns the actual images, along with some metadata (like their height and width).

  4. Add Captions: This component captions all images using BLIP. The model takes an image as input and generates a caption that describes its content. The component accepts a Hugging Face model ID as an argument, so the captioning model can be swapped for another compatible model from the Hugging Face Hub.

  5. Add Segmentation Maps: This component segments the images using the UPerNet model. Each segmentation map contains segments drawn from 150 possible categories listed here.
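
To make the rule-based approach of the Prompt Generation component more concrete, here is a minimal sketch in plain Python. It is not the component's actual implementation: the function name `generate_prompts` is hypothetical, and placing the prefix directly before the room type is an assumption, since the template quoted above does not show where the prefix goes.

```python
from itertools import product


def generate_prompts(room_types, styles, prefixes):
    """Build one seed prompt per (prefix, room type, style) combination.

    Hypothetical helper for illustration only; the position of the prefix
    in the sentence is an assumption, not taken from the component itself.
    """
    return [
        f"a photo of a {prefix} {room} in the style of {style}"
        for prefix, room, style in product(prefixes, room_types, styles)
    ]


# Example values taken from the lists mentioned in the component description.
room_types = ["bedroom", "kitchen", "laundry room"]
styles = ["contemporary", "minimalist", "art deco"]
prefixes = ["comfortable", "luxurious", "simple"]

prompts = generate_prompts(room_types, styles, prefixes)
print(len(prompts))  # 3 prefixes x 3 room types x 3 styles = 27 prompts
print(prompts[0])
```

Adapting the pipeline to another domain would then mostly come down to swapping these three lists and, if needed, the sentence template.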

Getting started

⚠️ Prerequisites:

  • A Python version between 3.8 and 3.11 installed on your system.
  • Docker installed and configured on your system.
  • A GPU is recommended to run the model-based components of the pipeline.

Cloning the repository

Clone this repository to your local machine using one of the following commands:

HTTPS

git clone https://github.com/ml6team/fondant-usecase-controlnet.git

SSH

git clone git@github.com:ml6team/fondant-usecase-controlnet.git

Installing the requirements

pip install -r requirements.txt

Confirm that Fondant has been installed correctly on your system by executing the following command:

fondant --help

Running the pipeline

There are two options to run the pipeline:

Train your own ControlNet model

Creating the training data is often the most challenging part of fine-tuning a ControlNet model. Hugging Face, however, provides a straightforward way to fine-tune your own ControlNet model with the Diffusers library. After publishing your dataset, you can start a fine-tuning job and specify the Hugging Face dataset you wish to use as training data.
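
As a rough illustration of what starting such a job could look like, the sketch below assembles a launch command for the `train_controlnet.py` example script that ships with the Diffusers repository. Everything specific here is an assumption: the base model ID, dataset name, output directory and column names are placeholders, and the flags are the ones the example script accepted at the time of writing, so check them against the Diffusers version you use. The snippet only builds and prints the command; it does not start training.

```python
import shlex


def build_training_command(base_model, dataset_name, output_dir,
                           image_column, conditioning_column, caption_column):
    """Assemble an `accelerate launch` command for the Diffusers ControlNet example.

    Hypothetical helper: flag names follow the Diffusers example script
    `train_controlnet.py` and should be verified against your version.
    """
    return [
        "accelerate", "launch", "train_controlnet.py",
        f"--pretrained_model_name_or_path={base_model}",
        f"--dataset_name={dataset_name}",
        f"--image_column={image_column}",
        f"--conditioning_image_column={conditioning_column}",
        f"--caption_column={caption_column}",
        f"--output_dir={output_dir}",
        "--resolution=512",
        "--learning_rate=1e-5",
        "--train_batch_size=4",
    ]


# All values below are placeholders, not names used by this repository.
command = build_training_command(
    base_model="your-org/your-stable-diffusion-base-model",
    dataset_name="your-username/your-interior-design-dataset",
    output_dir="controlnet-interior-design",
    image_column="image",
    conditioning_column="segmentation_map",
    caption_column="caption",
)
print(shlex.join(command))
```

The three column arguments tell the script which dataset columns hold the target image, the conditioning image (here, the segmentation map) and the caption, which correspond to the outputs of the last three pipeline components.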

Resources: