Open enzyme69 opened 2 years ago
This was actually when I tried running img2img for the first time. It never worked. I did supply the init_img.png
On the other computer (an Intel Mac), it also crashes:
(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:288:0: error: the result shape is not compatible with the input shape
(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:288:0: note: see current operation: %3 = "mps.reshape"(%1, %2) : (tensor<8454272xf32>, tensor<5xsi32>) -> tensor<1x128x257x32x8xf32>
Segmentation fault: 11
Are you on the NHWC branch? I haven't tweaked img2img for NHWC yet. The main branch should work on M1s.
If you pull the latest on liu/nhwc, it should work for img2img now. I made the necessary tweaks to make it work.
It did not work on either. I tried it first on master with the M1 machine, but it keeps complaining with "trace trap". Can you make a video demo? Maybe I missed something.
I have the model in one folder, and the output will be in the same folder as the model.
I know the example scripts are all in different folders. So what I did was replace txt2img with img2img in the command line, with the source image in the same folder as the main.swift of img2img, but it still does not work and I don't know why.
Can you have example script for img2img? Thanks.
One thing to keep in mind: where the main.swift file lives carries no significance at all.
For usage example:
I will put init_img.png, with exactly 512x512 size (this is important as I don't resize in the script at all; again, these scripts are more like demos), under /Users/administrator/workspace/swift-diffusion. This is the same directory where sd-v1.4.ckpt resides. Then, I can run:
bazel run examples:img2img --compilation_mode=opt -- /Users/administrator/workspace/swift-diffusion "horse riding with mid-century armors, intrinsic details, volumentric lighting"
If this still crashes, try to run:
bazel run examples:img2img --compilation_mode=dbg -- /Users/administrator/workspace/swift-diffusion "horse riding with mid-century armors, intrinsic details, volumentric lighting"
--compilation_mode=dbg will give more error messages than the trace trap you saw.
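Since the script does not resize init_img.png, a quick size check before running can rule out one cause of the shape error. The following is a minimal sketch, not part of swift-diffusion: the function names png_size and check_init_image are hypothetical, and it only reads the PNG header (signature plus the IHDR width and height fields), so it assumes the file is otherwise a valid PNG.

```shell
#!/bin/sh
# Hypothetical helpers, not part of swift-diffusion.

# Print "WIDTH HEIGHT" for a PNG file by reading its first 24 bytes.
# Bytes 2-4 of the signature spell "PNG"; bytes 17-20 hold the width and
# bytes 21-24 the height, both as big-endian unsigned 32-bit integers.
png_size() {
  # od prints each byte as an unsigned decimal number.
  set -- $(od -An -tu1 -N24 "$1" 2>/dev/null)
  [ "$#" -eq 24 ] || return 1
  [ "$2 $3 $4" = "80 78 71" ] || return 1
  echo "$(( (${17} << 24) | (${18} << 16) | (${19} << 8) | ${20} )) $(( (${21} << 24) | (${22} << 16) | (${23} << 8) | ${24} ))"
}

# Succeed only if the file is a PNG of exactly 512x512 pixels.
check_init_image() {
  size=$(png_size "$1") || { echo "cannot read PNG header: $1" >&2; return 1; }
  [ "$size" = "512 512" ] || { echo "expected 512x512, got $size: $1" >&2; return 1; }
}
```

For example, `check_init_image /Users/administrator/workspace/swift-diffusion/init_img.png` could be run before the bazel command above; it prints nothing when the size is right.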
So two things important here:
That's it, I missed STEP ONE! I originally put the init_img.png under img2img folder.
However, every now and then I still get a crash: "zsh: trace trap bazel run examples:img2img --compilation_mode=opt -- ". Is this a safety feature? Can I remove it?
There are no safety features. It probably has something to do with NaNs. I haven't fully figured it out, but sometimes a run ends up with NaNs and needs to be re-run. The cause is probably seeding or fp16 compute. You can switch back to FP32 to see if the error goes away. (Note that this doesn't happen for me on an NVIDIA card with the same program; swift-diffusion runs with CUDA too.)
I tried Float32, but the computation takes much longer. The final result looks amazing, however (I am not sure whether changing the float type affects the output of Stable Diffusion).
So my trick is simply to make a .script file with 1000 lines of the bazel swift-diffusion command to do batch output.
The NaN issue still happens from time to time.
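Instead of repeating the same command 1000 times in a file, a loop can do the same job and also record which runs crashed. This is a rough sketch under assumptions: run_batch is a hypothetical helper, and it relies on the wrapped command returning a non-zero exit status on a crash (a trace trap typically shows up as 133, i.e. 128 + SIGTRAP, assuming bazel run passes the binary's status through).

```shell
#!/bin/sh
# Hypothetical helper: run the same command COUNT times, report each
# failure, and keep going instead of stopping at the first crash.
run_batch() {
  count=$1
  shift
  failures=0
  i=1
  while [ "$i" -le "$count" ]; do
    if "$@"; then
      echo "run $i: ok"
    else
      rc=$?   # exit status of the failed command
      failures=$((failures + 1))
      echo "run $i: failed with exit status $rc" >&2
    fi
    i=$((i + 1))
  done
  echo "$failures of $count runs failed"
  # Return success only if every run succeeded.
  [ "$failures" -eq 0 ]
}
```

A possible invocation would be `run_batch 100 bazel run examples:txt2img --compilation_mode=opt -- /Users/blendersushi/Documents/swift-diffusion-main/model "a prompt"`, with stderr redirected to a file to keep the list of failed runs.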
Is it only reproducible on Intel, or on M1 too? Is it still reproducible with Float32? Is it only for img2img and not txt2img? I am very interested in reproducing and fixing this.
I use the M1 more because it's faster: 40 seconds to generate an output. The Intel one is too slow, 10 minutes per image if I'm lucky.
Let's say I run a batch of 100 overnight; just a few will hit the NaN and the script moves on to the next one.
I usually generate an image plus a (random) seed number; the runs that crash produce nothing. Maybe I could print the seed and give you the prompt as well, so you can check why this happens?
This is with both img2img and txt2img.
As for Float32, I changed it back to Float16 because Float32 makes the process so much longer.
If possible, can we have an updated script that reports how long the process takes to run? Usually the way I check is by looking at the creation date of the image.
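Until the example itself reports timing, wall-clock time can be measured from the shell. A minimal sketch, assuming whole-second resolution is enough; timed is a hypothetical wrapper, not something swift-diffusion provides.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, print the elapsed wall-clock
# seconds to stderr, and pass the command's exit status through.
timed() {
  start=$(date +%s)
  rc=0
  "$@" || rc=$?
  end=$(date +%s)
  echo "elapsed: $((end - start))s (exit status $rc)" >&2
  return "$rc"
}
```

It would be used by prefixing the usual command, for example `timed bazel run examples:txt2img --compilation_mode=opt -- ...`. The shell's built-in `time` keyword gives similar information with finer resolution.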
Yeah, if you did the following modification, it would generate a "runtime-data-seed.dbg" file upon crash (in the same directory as the model), and this will give me a good initial point for txt2img debug:
https://github.com/liuliu/swift-diffusion/compare/liu/nhwc...liu/with-data?expand=1
@liuliu Do I need to apply the following modification myself (I have never done that before), or, if I am up to date, will it just generate this file anyway?
I just got back to this today and it seems like the repo is updating itself (?).
OK, I got something: a bad seed causes the crash:
bazel run examples:txt2img --compilation_mode=opt -- /Users/blendersushi/Documents/swift-diffusion-main/model "at a beautiful beach, an astronaut riding a pig with wings trending on artstation, 4k, hyperrealistic, focused, extreme details cinematic, stanley artgerm lau, wlop, rossdraws"
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Documents/swift-diffusion-main/model 'at a beautiful beach, an astronaut riding a pig wINFO: Build completed successfully, 1 total action
4002390524
Total time 58.75917601585388
INFO: Invocation ID: 6dfe3f3f-36ef-470c-9c86-806a82383b0c
INFO: Analyzed target //examples:txt2img (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples:txt2img up-to-date:
bazel-bin/examples/txt2img
INFO: Elapsed time: 0.324s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Documents/swift-diffusion-main/model 'at a beautiful beach, an astronaut riding a pig wINFO: Build completed successfully, 1 total action
**1282314562**
Total time 62.87813401222229
RUNME.txt: line 8: 58977 Trace/BPT trap: 5
bazel run examples:txt2img --compilation_mode=opt -- /Users/blendersushi/Documents/swift-diffusion-main/model "at a beautiful beach, an astronaut riding a pig with wings trending on artstation, 4k, hyperrealistic, focused, extreme details cinematic, stanley artgerm lau, wlop, rossdraws"
INFO: Invocation ID: f9865315-9e74-4d27-a8ce-6e1f4c5e5b0e
INFO: Analyzed target //examples:txt2img (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples:txt2img up-to-date:
bazel-bin/examples/txt2img
INFO: Elapsed time: 0.420s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Documents/swift-diffusion-main/model 'at a beautiful beach, an astronaut riding a pig wINFO: Build completed successfully, 1 total action
2989718031
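When a batch log like the one above gets long, the crashing seeds can be pulled out automatically. This is only a sketch based on the log format shown here: it assumes each run prints its seed on a line by itself and that a crash appears later as a line containing "trace trap" or "Trace/BPT trap". The function name bad_seeds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the seed of every run that ended in a trap.
# Remembers the most recent all-digit line (the seed) and emits it when
# a trace-trap message follows before the next seed.
bad_seeds() {
  awk '
    /^[0-9]+$/ { seed = $0; next }
    /[Tt]race(\/BPT)? trap/ { if (seed != "") print seed; seed = "" }
  ' "$@"
}
```

Run against a saved copy of the log above, `bad_seeds batch.log` (batch.log being a placeholder file name) would print 1282314562.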
You should be able to just check out the liu/with-data branch, maybe?
Thanks, let me check tomorrow!
I cannot reproduce this locally on my M1 Mac Mini. Here is the change I made against bb919dbd8b8914c3f153b84dfdd66f4101d3e2d0:
diff --git a/examples/txt2img/main.swift b/examples/txt2img/main.swift
index b23b20c..7973eca 100644
--- a/examples/txt2img/main.swift
+++ b/examples/txt2img/main.swift
@@ -32,7 +32,7 @@ extension DiffusionModel {
}
}
-DynamicGraph.setSeed(40)
+DynamicGraph.setSeed(1282314562)
DynamicGraph.memoryEfficient = true
let unconditionalGuidanceScale: Float = 7.5
@@ -126,6 +126,7 @@ graph.withNoGrad {
DynamicGraph.setProfiler(true)
// Now do PLMS sampling.
for i in 0..<model.steps {
+ print("step \(i)")
let timestep = model.timesteps - model.timesteps / model.steps * (i + 1) + 1
let t = graph.variable(Tensor<UseFloatingPoint>(from: ts[i]))
let tNext = Tensor<UseFloatingPoint>(from: ts[min(i + 1, ts.count - 1)])
Here is the command I use to run:
bazel run examples:txt2img --compilation_mode=dbg --run_under=lldb -- /Users/administrator/workspace/swift-diffusion "at a beautiful beach, an astronaut riding a pig with wings trending on artstation, 4k, hyperrealistic, focused, extreme details cinematic, stanley artgerm lau, wlop, rossdraws"
Hi Liu, the trace trap is still happening...
Target //examples:txt2img up-to-date:
bazel-bin/examples/txt2img
INFO: Elapsed time: 276.743s, Critical Path: 233.81s
INFO: 257 processes: 52 internal, 198 darwin-sandbox, 7 worker.
INFO: Build completed successfully, 257 total actions
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Downloads/swift-diffusion-main/modeINFO: Build completed successfully, 257 total actions
Total time 60.565213084220886
(base) blendersushi@192-168-1-101 swift-diffusion-main % bazel run examples:txt2img --compilation_mode=opt -- /Users/blendersushi/Downloads/swift-diffusion-main/model/ "a photograph of a tiny astronaut riding a giant"
INFO: Invocation ID: 1e5ebcff-b7d3-4e52-bea3-840c76029705
INFO: Analyzed target //examples:txt2img (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples:txt2img up-to-date:
bazel-bin/examples/txt2img
INFO: Elapsed time: 4.404s, Critical Path: 3.97s
INFO: 4 processes: 1 internal, 2 darwin-sandbox, 1 worker.
INFO: Build completed successfully, 4 total actions
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Downloads/swift-diffusion-main/model/ INFO: Build completed successfully, 4 total actions
1887431285
Total time 57.43638300895691
zsh: trace trap bazel run examples:txt2img --compilation_mode=opt --
(base) blendersushi@192-168-1-101 swift-diffusion-main %
OK. I encountered similar problems on iPad, but not on M1. I am pretty confident it is a NaN somewhere, but I am not sure where the source is. I need to dig deeper.
Occasionally, I noticed that I am not getting a result and instead get "trace trap". Is there some kind of safety filter in Swift Diffusion? Can I just turn it off, since it's running locally anyway?