migraphx-benchmark / AMDMIGraphX

AMD's graph optimization engine.
MIT License

Investigate sdxl example #182

Open attila-dusnoki-htec opened 3 months ago

attila-dusnoki-htec commented 3 months ago

Improved version of sdxl is at https://github.com/ROCm/AMDMIGraphX/commits/sdxl_perf_torch_buffers/

The main idea was to move the buffers to GPU memory. This requires rocm/pytorch so that device="cuda" works.

migraphx supports argument_from_pointer, which can handle tensor.data_ptr() with the proper shape.
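A minimal sketch of how a torch tensor could be handed to MIGraphX without a copy, assuming the Python binding accepts `migraphx.shape(type=..., lens=..., strides=...)` and `migraphx.argument_from_pointer(shape, address)`. The dtype table and the helper names `shape_kwargs` and `tensor_to_arg` are illustrative assumptions, not taken from the branch above.

```python
# Assumed mapping from torch dtype names to MIGraphX type names.
TORCH_TO_MGX_TYPE = {
    "torch.float32": "float_type",
    "torch.float16": "half_type",
    "torch.int32": "int32_type",
    "torch.int64": "int64_type",
}


def shape_kwargs(dtype, lens, strides):
    """Build the keyword arguments that describe a tensor's layout."""
    return {
        "type": TORCH_TO_MGX_TYPE[str(dtype)],
        "lens": list(lens),
        "strides": list(strides),
    }


def tensor_to_arg(tensor):
    """Wrap a torch tensor's memory as a MIGraphX argument (hypothetical helper)."""
    # Imported lazily: this needs a ROCm build of MIGraphX to be installed.
    import migraphx as mgx

    shape = mgx.shape(**shape_kwargs(tensor.dtype, tensor.size(), tensor.stride()))
    # The argument only points at the tensor's memory, so the tensor
    # has to stay alive for as long as the argument is in use.
    return mgx.argument_from_pointer(shape, tensor.data_ptr())
```

With rocm/pytorch, the tensor would be allocated with device="cuda" and the returned argument passed to the compiled program in place of a host buffer.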

Note: for unetxl, the unetxl.opt version was used, which is created by the TensorRT demo script.

The original and rewritten perf logs:

Original:

Elapsed time for decode: 440.0491 ms
Elapsed time clip: 37.4158 ms
Elapsed time unet: 8252.2065 ms
Elapsed time vae: 440.0772 ms
Elapsed time for run: 8752.8331 ms

Output image: ![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/bb0740eb-5593-43aa-83a7-3bccfc08a7ce)

Rewritten:

Elapsed time for decode: 434.3943 ms
Elapsed time clip: 24.1256 ms
Elapsed time unet: 7470.2498 ms
Elapsed time vae: 434.4229 ms
Elapsed time for run: 7951.7439 ms

Output image: ![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/684b7039-82a7-48c9-bfa5-e0a465dd2620)
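
To put the two perf logs side by side, the relative time reduction per stage can be computed as below. This is only a small sketch over the numbers posted above; the dictionary and function names are made up for illustration.

```python
# Elapsed times in milliseconds, copied from the two logs above.
original = {"clip": 37.4158, "unet": 8252.2065, "vae": 440.0772, "run": 8752.8331}
rewritten = {"clip": 24.1256, "unet": 7470.2498, "vae": 434.4229, "run": 7951.7439}


def reduction_percent(before_ms, after_ms):
    """Percentage of the original time that was saved."""
    return 100.0 * (before_ms - after_ms) / before_ms


reductions = {
    stage: reduction_percent(original[stage], rewritten[stage]) for stage in original
}

for stage, pct in reductions.items():
    saved_ms = original[stage] - rewritten[stage]
    print(f"{stage}: {saved_ms:.1f} ms saved ({pct:.1f}% less time)")
```

The whole run comes out about 9% shorter. In absolute terms almost all of the saving is in the unet stage, while clip shows the largest relative change.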

The output images differ, probably due to precision.

attila-dusnoki-htec commented 3 months ago

The packages used in the TRT demo, and their replacements:

- cuda -> hip
- cudart -> cudart (with hip)
- polygraphy -> can be extended with an MGX backend
- tensorrt -> migraphx

As seen, hip-python-as-cuda could work for the cuda part. TensorRT has to be replaced or wrapped.

attila-dusnoki-htec commented 3 months ago

To get the clip.opt and clip2.opt models working, we need to use graph surgeon. The hidden states are not exposed by default. The corresponding code is here.

Update: Actually, that is already in the model. The problem is that it is not "exposed" as an output. We need to re-export it and make sure it is an output.
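
A minimal sketch of what "making it an output" can look like with the `onnx` Python package, assuming the tensor already exists inside the graph. This is not the actual modifier script; the function name `expose_as_output` is hypothetical, and the element type and shape have to be supplied by the caller.

```python
def expose_as_output(model, tensor_name, elem_type, shape):
    """Append an existing intermediate tensor to the ONNX graph outputs."""
    # Imported lazily so this module loads even where onnx is not installed.
    from onnx import helper

    graph = model.graph
    if any(out.name == tensor_name for out in graph.output):
        return model  # already exposed, nothing to do

    # The tensor must be produced by some node, otherwise the name is wrong.
    produced = {name for node in graph.node for name in node.output}
    if tensor_name not in produced:
        raise ValueError(f"no node produces a tensor named {tensor_name!r}")

    graph.output.append(helper.make_tensor_value_info(tensor_name, elem_type, shape))
    return model


# Hypothetical usage (the path and shape are placeholders, not from the thread):
#   model = onnx.load("model.onnx")
#   expose_as_output(model, "hidden_states", onnx.TensorProto.FLOAT, None)
#   onnx.save(model, "model_mod.onnx")
```

Passing `None` as the shape leaves it unspecified; a concrete shape can be given instead when it is known.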

attila-dusnoki-htec commented 3 months ago

The commit that enabled it: https://github.com/ROCm/AMDMIGraphX/commit/0d9e4b94b5e710e0a48ca4eaac288fbe80ab24d1

The "hidden_states" was just renamed, but was not added to the onnx outputs. With clip_modifier.py, we are creating a "mod" (modified) version. After fixing the dtypes, the new runtimes:

| | before | after |
|:-:|:-:|:-:|
| numpy | 37.4158 ms | 16.2879 ms |
| torch | 23.5778 ms | 14.2189 ms |

There is a change in the outputs as well. Also, now the "third" arm of the np version is fixed.

NP version output: ![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/9d6b1bc4-7ce5-41b6-8e58-6fd968b79868)

PT version output: ![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/ca04513e-1bbe-45b2-adbb-2b7d6482be8d)

attila-dusnoki-htec commented 2 months ago

Both SD21 and SDXL were updated to use torch, and Turbo was enabled as well.

Still debugging why the refiner gives strange results for certain models.

attila-dusnoki-htec commented 2 months ago

Related PRs:

- https://github.com/ROCm/AMDMIGraphX/pull/2951
- https://github.com/ROCm/AMDMIGraphX/pull/2954
- https://github.com/ROCm/AMDMIGraphX/pull/2959

attila-dusnoki-htec commented 2 months ago

Prompting SDXL

The following are some experiments with SDXL.

The SDXL example code

The command to start the server: `python gradio_app.py -p "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" --pipeline-type sdxl-opt --use-refiner --fp16 clip clip2 unetxl refiner_clip2 refiner_unetxl`

It uses the sdxl-opt version, with fp16 model quantization (except vae).

Random examples

|variable|value|
|:-:|:-:|
|Prompt|Duck smoking cigarette, sepia colors, noir style, detailed, 8k|
|Negative prompt||
|Number of steps|30|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/6ea302d0-3af4-437d-a189-129d5623c1f5)

|variable|value|
|:-:|:-:|
|Prompt|portrait of a pretty blonde woman, a flower crown, earthy makeup, flowing maxi dress with colorful patterns and fringe, a sunset or nature scene, green and gold color scheme|
|Negative prompt||
|Number of steps|50|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

|variable|value|
|:-:|:-:|
|Prompt|Black and white street photography of a rainy night in New York, reflections on wet pavement.|
|Negative prompt||
|Number of steps|100|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

Duck with fedora

The following examples all have the same values:

|variable|value|
|:-:|:-:|
|Negative prompt||
|Number of steps|100|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

|Prompt|Result|
|:-:|:-:|
|Duck with fedora|Image|
|Duck with fedora, sepia color|Image|
|Duck with fedora, sepia color, noir style|Image|
|Duck with fedora, sepia color, noir style, detailed, 8k|Image|
|Detailed portrait of a duck with fedora wearing an elegant suit, sepia colors, noir art style, 50s background|Image|
|Detailed portrait of a duck with fedora wearing an elegant suit, bright colors, noir art style, 50s background|Image|
|Detailed portrait of a detective duck with fedora wearing an elegant suit, bright colors, noir art style, 50s background|Image|
|Detailed portrait of a detective duck with fedora wearing an elegant suit, black and white colors, noir art style, 50s background|Image|
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, dark colors, noir art style, rainy street at night with lamp lights background|Image|

The following 3 are with 50 steps instead of 100:

|Prompt|Result|
|:-:|:-:|
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, vibrant colors, noir art style, rainy street at night with lamp lights background|Image|
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, monochrome, noir art style, rainy street at night with lamp lights background|Image|
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, monochrome, comic book art style, rainy street at night with lamp lights background|Image|
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, vibrant colors, comic book art style, rainy street at night with lamp lights background|Image|

Timesteps montage

The following images are with the same prompt at different timesteps

Prompt: Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, dark colors, noir art style, rainy street at night with lamp lights background

Image 5 Image 10 Image 15
Image 20 Image 25 Image 30
Image 35 Image 40 Image 45
Image 50 Image 75 Image 100

attila-dusnoki-htec commented 2 months ago

Extended it with stream and events: https://github.com/ROCm/AMDMIGraphX/pull/3051