mlcommons / mobile_app_open

Mobile App Open
https://mlcommons.org/en/groups/inference-mobile/
Apache License 2.0
47 stars 23 forks

feat: added stable diffusion pipeline (WIP) #905

Closed RSMNYS closed 2 months ago

RSMNYS commented 2 months ago
github-actions[bot] commented 2 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

freedomtan commented 2 months ago
  1. Please also try to make the main.cc (https://github.com/mlcommons/mobile_app_open/blob/master/flutter/cpp/binary/main.cc) work
  2. fp16/dynamic range quant models might be faster, https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/blob/main/convert_to_tflite_models_with_dynamic_range.py

We have the Keras diffusion/unet model at https://github.com/mlcommons/mobile_model_closed/releases/tag/alpha; concatenate the two parts into one. It should then be trivial to load the result as the stable diffusion pipeline's diffusion_model.

For running TFLite with one diffusion/unet model, see my previous example: https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/blob/main/text_to_image_with_tflite_models_from_huggingface.ipynb
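
As a rough illustration of the dynamic-range quantization path mentioned above, here is a minimal sketch using a tiny stand-in Keras model (not the actual diffusion/unet model): convert with `tf.lite.Optimize.DEFAULT`, then run the result with the TFLite interpreter.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model; the real pipeline would use the Keras diffusion/unet model.
model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(4,))])

# Dynamic range quantization: weights stored as int8, activations stay float.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

# Run the quantized model with the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.random.rand(1, 4).astype(np.float32))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
print(result.shape)  # (1, 2)
```

The same convert-then-invoke flow applies to the real model, modulo its actual input/output signatures.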

anhappdev commented 2 months ago

@RSMNYS You can test whether your implementation works on desktop by running these commands after updating the paths:

```shell
bazel build -c opt --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 //flutter/cpp/binary:main //mobile_back_tflite:tflitebackend
```

```shell
bazel-bin/flutter/cpp/binary/main EXTERNAL stable_diffusion \
  --mode=PerformanceOnly \
  --output_dir="output" \
  --model_file="/Users/anh/Downloads/stable-diffusion-android/mobile_model_closed" \
  --lib_path="bazel-bin/mobile_back_tflite/cpp/backend_tflite/libtflitebackend.so" \
  --input_tfrecord="mobile_back_apple/dev-resources/stable_diffusion/coco_gen_full.tfrecord" \
  --input_clip_model="mobile_back_apple/dev-resources/stable_diffusion/clip_model_512x512.tflite"
```

RSMNYS commented 2 months ago

@anhappdev can you please provide these resources:

"mobile_back_apple/dev-resources/stable_diffusion/coco_gen_full.tfrecord"
"mobile_back_apple/dev-resources/stable_diffusion/clip_model_512x512.tflite"

RSMNYS commented 2 months ago

found here: https://drive.google.com/drive/folders/10zCF7_ctUIM7xVPyhw5P60utnvWVUcSy?usp=sharing Thanks

@anhappdev what does the coco_gen_full.tfrecord contain? The tokenized prompts?

anhappdev commented 2 months ago

> @anhappdev what does the coco_gen_full.tfrecord contain? The tokenized prompts?

@RSMNYS You can find the description here: https://github.com/mlcommons/mobile_app_open/blob/239f92c615dd36eaba25198cd49ee5c8fbf197a2/flutter/cpp/datasets/coco_gen_utils/generate_tfrecords.py#L60-L72
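
For inspecting what a tfrecord file actually contains, a generic sketch like the following lists the feature keys of each `tf.train.Example` without assuming any field names (the real field layout is defined in the generate_tfrecords.py script; the demo record and its "caption" field here are placeholders):

```python
import tensorflow as tf

def list_feature_keys(tfrecord_path, limit=1):
    """Print the feature keys of the first few records in a TFRecord file."""
    for raw in tf.data.TFRecordDataset(tfrecord_path).take(limit):
        example = tf.train.Example()
        example.ParseFromString(raw.numpy())
        print(sorted(example.features.feature.keys()))

# Build a tiny self-made record as a stand-in for coco_gen_full.tfrecord.
example = tf.train.Example(features=tf.train.Features(feature={
    "caption": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b"a photo of a cat"])),
}))
with tf.io.TFRecordWriter("/tmp/demo.tfrecord") as w:
    w.write(example.SerializeToString())

list_feature_keys("/tmp/demo.tfrecord")  # ['caption']
```

Pointing `list_feature_keys` at the real coco_gen_full.tfrecord would show its actual fields.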

RSMNYS commented 2 months ago

@freedomtan @anhappdev for the stable diffusion process we need unconditional tokens. I guess we need to get them from the CLIP model we used to get the encoded prompts? If yes, do we need to include them in the TFRecord?

RSMNYS commented 2 months ago

> @freedomtan @anhappdev for the stable diffusion process we need unconditional tokens. I guess we need to get them from the CLIP model we used to get the encoded prompts? If yes, do we need to include them in the TFRecord?

Actually, those can be generated on the fly, as far as I can see: 49406 followed by 49407 repeated up to the max allowed size.

freedomtan commented 2 months ago

> @freedomtan @anhappdev for the stable diffusion process we need unconditional tokens. I guess we need to get them from the CLIP model we used to get the encoded prompts? If yes, do we need to include them in the TFRecord?

The unconditional context is the output of the text encoder.
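
For context: in classifier-free guidance, the unconditional context and the prompt context each drive one UNet pass, and the two noise predictions are blended. A minimal numpy sketch (the 7.5 guidance scale is just a common illustrative default, not a value mandated by this pipeline):

```python
import numpy as np

def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale=7.5):
    """Blend the UNet's unconditional and conditional noise predictions."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# With scale 1.0 the result is exactly the conditional prediction.
uncond = np.zeros((1, 4))
cond = np.ones((1, 4))
print(classifier_free_guidance(uncond, cond, 1.0))  # [[1. 1. 1. 1.]]
```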

RSMNYS commented 2 months ago

> @freedomtan @anhappdev for the stable diffusion process we need unconditional tokens. I guess we need to get them from the CLIP model we used to get the encoded prompts? If yes, do we need to include them in the TFRecord?
>
> The unconditional context is the output of the text encoder.

I was talking about the unconditional tokens, which we then pass to the text encoder to get the unconditional context. But as I've mentioned above, those can be generated on the fly: [49406, 49407, ..., 49407].
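
The on-the-fly construction described above can be sketched as follows (77 is the usual CLIP sequence length for SD v1.x; treat it as an assumption here):

```python
START_TOKEN = 49406  # CLIP start-of-text token
END_TOKEN = 49407    # CLIP end-of-text / padding token

def unconditional_tokens(max_length=77):
    """Empty-prompt token sequence: start token followed by end/pad tokens."""
    return [START_TOKEN] + [END_TOKEN] * (max_length - 1)

tokens = unconditional_tokens()
print(len(tokens), tokens[:3])  # 77 [49406, 49407, 49407]
```

Feeding this sequence to the text encoder yields the unconditional context, so nothing extra needs to be stored in the tfrecord.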

RSMNYS commented 2 months ago

@freedomtan @anhappdev Guys, the flow is working, at least as tested on desktop. Remaining work:

  1. Continue with the mobile part.
  2. Compose the 2 SD models into 1.
  3. Adjust the stable diffusion process to use the single SD model.

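
Item 2 (composing the two SD model halves into one) could look roughly like this with the Keras functional API, chaining the second half onto the first. Tiny stand-in models are used here, not the real UNet halves:

```python
import tensorflow as tf

# Stand-ins for the two released model halves.
half_a = tf.keras.Sequential([tf.keras.layers.Dense(8, input_shape=(4,))])
half_b = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(8,))])

# Chain them into a single model that can be saved/converted as one file.
combined = tf.keras.Model(inputs=half_a.input, outputs=half_b(half_a.output))
print(combined.output_shape)  # (None, 2)
```

The combined model can then be passed through the TFLite converter in one step, so the pipeline only has to load a single diffusion model.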
sonarcloud[bot] commented 2 months ago

Quality Gate passed

Issues
41 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

anhappdev commented 2 months ago

Confirmed that the SD pipeline works with main.cc now. I will merge this. For new work, please open a new PR.

freedomtan commented 2 months ago

For dynamic range quant, https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/blob/main/text_to_image_with_tflite_models_from_huggingface.ipynb could be used for testing (that's v1.4 with dynamic range quantization).