attila-dusnoki-htec commented 3 months ago

An improved version of the SDXL example is at https://github.com/ROCm/AMDMIGraphX/commits/sdxl_perf_torch_buffers/

The main idea was to move the buffers to GPU memory. This requires rocm/pytorch so that `device="cuda"` works.

migraphx supports `argument_from_pointer`, which can wrap `tensor.data_ptr()` when given the proper shape.
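A minimal sketch of how a torch tensor might be wrapped this way, assuming the MIGraphX Python module is importable as `migraphx` and a ROCm build of PyTorch is installed. The helper names `mgx_shape_kwargs` and `tensor_to_arg` and the dtype map are illustrative and not necessarily what the branch uses.

```python
# Subset of torch dtype names mapped to MIGraphX type strings (illustrative map).
TORCH_TO_MGX_TYPE = {
    "torch.float32": "float_type",
    "torch.float16": "half_type",
    "torch.int32": "int32_type",
    "torch.int64": "int64_type",
}


def mgx_shape_kwargs(dtype_name, lens, strides):
    """Build the keyword arguments for migraphx.shape from tensor metadata."""
    return {
        "type": TORCH_TO_MGX_TYPE[dtype_name],
        "lens": list(lens),
        "strides": list(strides),  # strides are counted in elements, not bytes
    }


def tensor_to_arg(tensor):
    """Wrap a torch tensor's memory as a MIGraphX argument without copying."""
    import migraphx as mgx  # imported lazily so the helper above works without it

    kwargs = mgx_shape_kwargs(str(tensor.dtype), tensor.size(), tensor.stride())
    return mgx.argument_from_pointer(mgx.shape(**kwargs), tensor.data_ptr())
```

Because the argument only refers to the tensor's memory, the tensor presumably has to stay alive (and keep its storage) for as long as the argument is in use.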

Note: for unetxl, the unetxl.opt version was used, which is created by the TensorRT demo script.

The original and rewritten perf logs:

original (np): sdxl_perf.log

Elapsed time for decode: 440.0491 ms
Elapsed time clip: 37.4158 ms
Elapsed time unet: 8252.2065 ms
Elapsed time vae: 440.0772 ms
Elapsed time for run: 8752.8331 ms

Output image:

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/bb0740eb-5593-43aa-83a7-3bccfc08a7ce)

new (pt): sdxl_torch_perf.log

Elapsed time for decode: 434.3943 ms
Elapsed time clip: 24.1256 ms
Elapsed time unet: 7470.2498 ms
Elapsed time vae: 434.4229 ms
Elapsed time for run: 7951.7439 ms
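To make the comparison easier to read, the per-stage speedup and time saved can be computed from the two logs. This is only arithmetic on the numbers quoted above; the dictionary and function names are made up for illustration.

```python
# Stage timings in milliseconds, copied from the two logs above.
original_np = {
    "decode": 440.0491,
    "clip": 37.4158,
    "unet": 8252.2065,
    "vae": 440.0772,
    "run": 8752.8331,
}
new_pt = {
    "decode": 434.3943,
    "clip": 24.1256,
    "unet": 7470.2498,
    "vae": 434.4229,
    "run": 7951.7439,
}


def speedups(before, after):
    """Ratio before/after per stage; values above 1.0 mean the stage got faster."""
    return {stage: before[stage] / after[stage] for stage in before}


def saved_ms(before, after):
    """Absolute time saved per stage, in milliseconds."""
    return {stage: before[stage] - after[stage] for stage in before}


ratio = speedups(original_np, new_pt)
saved = saved_ms(original_np, new_pt)
for stage in original_np:
    print(f"{stage:>6}: {ratio[stage]:.3f}x, saved {saved[stage]:.4f} ms")
```

With these numbers, the unet stage accounts for roughly 782 ms of the roughly 801 ms saved per run, while clip shows the largest relative gain.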

Output image:

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/684b7039-82a7-48c9-bfa5-e0a465dd2620)

The output images differ, probably due to precision.

attila-dusnoki-htec commented 3 months ago

The packages used in the TRT demo:

- cuda -> hip
- cudart -> cudart (with hip)
- polygraphy -> Can be extended with MGX backend
- tensorrt -> migraphx

As seen, hip-python-as-cuda could work for the cuda part. The tensorrt package has to be replaced or wrapped.

attila-dusnoki-htec commented 3 months ago

To get the clip.opt and clip2.opt models working, we need to use graph surgeon. The hidden states are not exposed by default. The corresponding code is here.

Update: Actually, that is already in the model. The problem is that it is not "exposed" as an output. We need to re-export it and make sure it is an output.
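Exposing an existing tensor as a graph output can be sketched with the plain `onnx` package. This is an illustration under assumptions, not the code referenced above: the function name `expose_as_output` is hypothetical, the tensor name is a parameter, and the default element type is a guess that should be set to match the model (for example `TensorProto.FLOAT16` for an fp16 export).

```python
def expose_as_output(model, tensor_name, elem_type=None):
    """Append an existing intermediate tensor to the graph outputs, if absent."""
    from onnx import TensorProto, helper  # lazy import: needs the onnx package

    if any(out.name == tensor_name for out in model.graph.output):
        return model  # already exposed, nothing to do
    if elem_type is None:
        elem_type = TensorProto.FLOAT  # assumption: pass the real dtype if it differs
    # shape=None leaves the output shape unspecified in the value info
    model.graph.output.append(
        helper.make_tensor_value_info(tensor_name, elem_type, None)
    )
    return model


# Possible usage (file paths are hypothetical):
#   import onnx
#   model = onnx.load("clip.opt/model.onnx")
#   onnx.save(expose_as_output(model, "hidden_states"), "clip.mod/model.onnx")
```

The function only adds an entry to `graph.output`; it does not rename or insert any nodes, so the tensor name has to match one that already exists in the graph.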

attila-dusnoki-htec commented 3 months ago

The commit that enabled it: https://github.com/ROCm/AMDMIGraphX/commit/0d9e4b94b5e710e0a48ca4eaac288fbe80ab24d1

The "hidden_states" tensor was only renamed; it was not added to the ONNX outputs. With clip_modifier.py, we create a "mod" (modified) version of the model. After fixing the dtypes, the new clip runtimes:

||before|after|
|:-:|:-:|:-:|
|numpy|37.4158 ms|16.2879 ms|
|torch|23.5778 ms|14.2189 ms|

There is a change in the outputs as well. Also, now the "third" arm of the np version is fixed.

NP version output:

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/9d6b1bc4-7ce5-41b6-8e58-6fd968b79868)

PT version output:

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/ca04513e-1bbe-45b2-adbb-2b7d6482be8d)

attila-dusnoki-htec commented 2 months ago

Both SD21 and SDXL were updated to use torch, and Turbo was enabled as well.

Still debugging why the refiner gives strange results for certain models.

attila-dusnoki-htec commented 2 months ago

Prompting SDXL

The following are some experiments with SDXL.

Setup

The SDXL example code

The command to start the server: `python gradio_app.py -p "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" --pipeline-type sdxl-opt --use-refiner --fp16 clip clip2 unetxl refiner_clip2 refiner_unetxl`

It uses the sdxl-opt version, with fp16 quantization for every model except vae.

Random examples

|variable|value|
|:-:|:-:|
|Prompt|Duck smoking cigarette, sepia colors, noir style, detailed, 8k|
|Negative prompt||
|Number of steps|30|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/6ea302d0-3af4-437d-a189-129d5623c1f5)

|variable|value|
|:-:|:-:|
|Prompt|portrait of a pretty blonde woman, a flower crown, earthy makeup, flowing maxi dress with colorful patterns and fringe, a sunset or nature scene, green and gold color scheme|
|Negative prompt||
|Number of steps|50|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/0d7a5c68-9366-4a95-a8cf-025bba02dd56)

|variable|value|
|:-:|:-:|
|Prompt|Black and white street photography of a rainy night in New York, reflections on wet pavement.|
|Negative prompt||
|Number of steps|100|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/b95d3e6f-2e9f-4f32-be95-fe350f96aa45)

Duck with fedora

The following examples all have the same values:

|variable|value|
|:-:|:-:|
|Prompt||
|Negative prompt||
|Number of steps|100|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

|Prompt|Result|
|:-:|:-:|
|Duck with fedora||
|Duck with fedora, sepia color||
|Duck with fedora, sepia color, noir style||
|Duck with fedora, sepia color, noir style, detailed, 8k||
|Detailed portrait of a duck with fedora wearing an elegant suit, sepia colors, noir art style, 50s background||
|Detailed portrait of a duck with fedora wearing an elegant suit, bright colors, noir art style, 50s background||
|Detailed portrait of a detective duck with fedora wearing an elegant suit, bright colors, noir art style, 50s background||
|Detailed portrait of a detective duck with fedora wearing an elegant suit, black and white colors, noir art style, 50s background||
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, dark colors, noir art style, rainy street at night with lamp lights background||

The following 3 use `50` steps instead of `100`:

|Prompt|Result|
|:-:|:-:|
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, vibrant colors, noir art style, rainy street at night with lamp lights background||
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, monochrome, noir art style, rainy street at night with lamp lights background||
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, monochrome, comic book art style, rainy street at night with lamp lights background||
|Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, vibrant colors, comic book art style, rainy street at night with lamp lights background||

Timesteps montage

The following images use the same prompt at different timesteps.

Prompt: Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, dark colors, noir art style, rainy street at night with lamp lights background