Open attila-dusnoki-htec opened 3 months ago
The packages used in the TRT demo:
- cuda -> hip
- cudart -> cudart (with hip)
- polygraphy -> Can be extended with MGX backend
- tensorrt -> migraphx
As seen, `hip-python-as-cuda` could work for the cuda part.
The tensorrt has to be replaced, or wrapped.
To get the `clip.opt` and `clip2.opt` models working, we need to use graph surgeon. The hidden states are not exposed by default. The corresponding code is here.
Update: Actually, that is already in the model. The problem is that it is not "exposed" as an output. We need to re-export it and make sure it is an output.
The commit that enabled it: https://github.com/ROCm/AMDMIGraphX/commit/0d9e4b94b5e710e0a48ca4eaac288fbe80ab24d1
The "hidden_states" was just renamed, but was not added to the onnx outputs. With clip_modifier.py, we are creating a "mod" (modified) version. After fixing the dtypes, the new runtimes:
| | before | after |
|---|---|---|
| numpy | 37.4158 ms | 16.2879 ms |
| torch | 23.5778 ms | 14.2189 ms |
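
For reference, the relative gain from the dtype fix can be worked out directly from the table. This is only arithmetic on the numbers above; the `speedup` helper is a name introduced here for illustration.

```python
# Runtimes copied from the table above: (before, after) in milliseconds.
timings_ms = {
    "numpy": (37.4158, 16.2879),
    "torch": (23.5778, 14.2189),
}


def speedup(before_ms, after_ms):
    """Ratio of the old runtime to the new one (values above 1 mean faster)."""
    return before_ms / after_ms


for backend, (before, after) in timings_ms.items():
    saved = before - after
    print(f"{backend}: {speedup(before, after):.2f}x faster, {saved:.4f} ms saved")
```

So the numpy path gains roughly 2.3x and the torch path roughly 1.7x.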
There is a change in the outputs as well. Also, now the "third" arm of the np version is fixed.
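
The "make sure it is an output" step mentioned above can be sketched with the plain `onnx` package. This is not the actual `clip_modifier.py`, just a minimal illustration of the idea, assuming the tensor is already produced inside the graph under the name `hidden_states`. The function names and file paths are made up for the example.

```python
def missing_outputs(existing_outputs, wanted):
    """Return the names in `wanted` that are not graph outputs yet, in order."""
    existing = set(existing_outputs)
    return [name for name in wanted if name not in existing]


def expose_outputs(src_path, dst_path, wanted=("hidden_states",)):
    """Append already-computed internal tensors to the ONNX graph outputs.

    Assumption: each tensor in `wanted` is produced by some node in the graph,
    so only the output list has to change, not the graph itself.
    """
    import onnx  # imported lazily so the helper above has no dependencies

    model = onnx.load(src_path)
    # Shape inference fills in type/shape info for intermediate tensors,
    # which we reuse so the new output carries a proper type.
    inferred = onnx.shape_inference.infer_shapes(model)
    value_infos = {vi.name: vi for vi in inferred.graph.value_info}

    existing = [out.name for out in model.graph.output]
    for name in missing_outputs(existing, wanted):
        if name not in value_infos:
            raise ValueError(f"no inferred tensor named {name!r} in the graph")
        model.graph.output.append(value_infos[name])

    onnx.save(model, dst_path)


# Hypothetical usage (paths are placeholders):
# expose_outputs("clip.opt/model.onnx", "clip.opt.mod/model.onnx")
```

Very large models may need external-data handling on load and save, which is left out here.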
Both SD21 and SDXL were updated to use torch. And Turbo was enabled as well.
Still debugging why the refiner gives strange results for certain models.
The following are some experiments with SDXL
The command to start the server:

`python gradio_app.py -p "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" --pipeline-type sdxl-opt --use-refiner --fp16 clip clip2 unetxl refiner_clip2 refiner_unetxl`

It uses the sdxl-opt version, with fp16 model quantization (except vae).

|variable|value|
|:-:|:-:|
|Prompt|Duck smoking cigarette, sepia colors, noir style, detailed, 8k|
|Negative prompt||
|Number of steps|30|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/6ea302d0-3af4-437d-a189-129d5623c1f5)

|variable|value|
|:-:|:-:|
|Prompt|portrait of a pretty blonde woman, a flower crown, earthy makeup, flowing maxi dress with colorful patterns and fringe, a sunset or nature scene, green and gold color scheme|
|Negative prompt||
|Number of steps|50|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/0d7a5c68-9366-4a95-a8cf-025bba02dd56)

|variable|value|
|:-:|:-:|
|Prompt|Black and white street photography of a rainy night in New York, reflections on wet pavement.|
|Negative prompt||
|Number of steps|100|
|Random seed|42|
|Guidance scale|5|
|Number of refiner steps|0|
|Aesthetic score|6|
|Negative Aesthetic score|2.5|

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/b95d3e6f-2e9f-4f32-be95-fe350f96aa45)

The following examples all have the same values:
variable | value |
---|---|
Prompt | |
Negative prompt | |
Number of steps | 100 |
Random seed | 42 |
Guidance scale | 5 |
Number of refiner steps | 0 |
Aesthetic score | 6 |
Negative Aesthetic score | 2.5 |
Prompt | Result |
---|---|
Duck with fedora | |
Duck with fedora, sepia color | |
Duck with fedora, sepia color, noir style | |
Duck with fedora, sepia color, noir style, detailed, 8k | |
Detailed portrait of a duck with fedora wearing an elegant suit, sepia colors, noir art style, 50s background | |
Detailed portrait of a duck with fedora wearing an elegant suit, bright colors, noir art style, 50s background | |
Detailed portrait of a detective duck with fedora wearing an elegant suit, bright colors, noir art style, 50s background | |
Detailed portrait of a detective duck with fedora wearing an elegant suit, black and white colors, noir art style, 50s background | |
Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, dark colors, noir art style, rainy street at night with lamp lights background | |

The following 3 are with 50 steps instead of 100:

Prompt | Result |
---|---|
Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, vibrant colors, noir art style, rainy street at night with lamp lights background | |
Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, monochrome, noir art style, rainy street at night with lamp lights background | |
Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, monochrome, comic book art style, rainy street at night with lamp lights background | |
Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, vibrant colors, comic book art style, rainy street at night with lamp lights background | |

The following images are with the same prompt at different timesteps
Prompt: Detailed portrait of a detective duck with fedora wearing an elegant 50s style suit, dark colors, noir art style, rainy street at night with lamp lights background
Extended it with stream and events: https://github.com/ROCm/AMDMIGraphX/pull/3051
Improved version of sdxl is at https://github.com/ROCm/AMDMIGraphX/commits/sdxl_perf_torch_buffers/
The main idea was to move the buffers to gpu memory. This requires `rocm/pytorch` to make `device="cuda"` work. migraphx supports `argument_from_pointer`, which can handle `tensor.data_ptr()` with the proper shape.

Note: for `unetxl`, the `unetxl.opt` version was used, which is created by the tensorrt demo script.

The original and rewritten perf logs:
![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/bb0740eb-5593-43aa-83a7-3bccfc08a7ce)

![Image](https://github.com/migraphx-benchmark/AMDMIGraphX/assets/126579622/684b7039-82a7-48c9-bfa5-e0a465dd2620)

There are differences in the output images, probably due to precision.
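
The buffer idea described above (torch tensors living on the GPU, handed to migraphx through `argument_from_pointer`) can be sketched roughly as follows. This is a minimal illustration, not the code from the branch: the helper names and the dtype table are assumptions made for the example, and it presumes the migraphx Python `shape` constructor accepts `type`, `lens` and `strides` keywords.

```python
# Subset of torch dtype names mapped to MIGraphX type names (assumed mapping).
TORCH_TO_MGX_TYPE = {
    "torch.float32": "float_type",
    "torch.float16": "half_type",
    "torch.int64": "int64_type",
    "torch.int32": "int32_type",
    "torch.bool": "bool_type",
}


def mgx_shape_kwargs(dtype_name, lens, strides):
    """Build the keyword arguments describing a tensor to migraphx.shape."""
    return {
        "type": TORCH_TO_MGX_TYPE[dtype_name],
        "lens": list(lens),
        "strides": list(strides),
    }


def tensor_to_arg(tensor):
    """Wrap an existing torch tensor as a migraphx argument without copying.

    The tensor must stay alive (and must not be reallocated) for as long as
    the returned argument is in use, since only the raw pointer is shared.
    """
    import migraphx as mgx  # lazy import: only needed when actually wrapping

    kwargs = mgx_shape_kwargs(str(tensor.dtype), tensor.size(), tensor.stride())
    return mgx.argument_from_pointer(mgx.shape(**kwargs), tensor.data_ptr())


# Hypothetical usage with rocm/pytorch (the shape is a placeholder):
# import torch
# buf = torch.empty((2, 77, 768), dtype=torch.float16, device="cuda")
# arg = tensor_to_arg(buf)
```

Allocating the buffers once and reusing them across steps avoids the host/device copies on every call, which is where the gain is expected to come from.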