vitoplantamura / OnnxStream

Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a RPI Zero 2 (or in 298MB of RAM) but also Mistral 7B on desktops and servers. ARM, x86, WASM, RISC-V supported. Accelerated by XNNPACK.
https://yolo.vitoplantamura.com/

Pi Zero 2w 32bit build of xnnpack failing #64

Closed BranchEnterprise closed 7 months ago

BranchEnterprise commented 7 months ago

/tmp/ccFr850C.s:9961: Error: selected processor does not support `vudot.u8 q4,q8,d4[0]' in ARM mode...

Similar lines are repeated many times. From researching the web, possible causes include bugs in gcc or problems with my version of binutils. In summary, I'm finding reports of a conflict between Binutils >2.33 and gcc versions 9-11.

I've tried your git checkout for XNNPACK and the one referenced here for the Pi Zero 2: https://github.com/vitoplantamura/OnnxStream/issues/6#issuecomment-1650325073. Here is a similar example of my failure: https://github.com/google/XNNPACK/issues/1465. I have a recent OS flashed from the RPi Imager (Debian bullseye (legacy), 2024-3-15). It's been upgraded via sudo apt-get upgrade.

I will try a prebuilt XNNPACK for my architecture, something from here: https://mirrors.ptisp.pt/debian/pool/main/x/xnnpack/ (libxnnpack-dev_0.0~git20201031.beca652+really.git20200323.1b35463-2_armhf.deb). The main reason I've been hesitating is that it may conflict with the checkout specified in your readme:

    git clone https://github.com/google/XNNPACK.git
    cd XNNPACK
    git checkout 579de32260742a24166ecd13213d2e60af862675
    mkdir build
    cd build
    cmake -DXNNPACK_BUILD_TESTS=OFF -DXNNPACK_BUILD_BENCHMARKS=OFF ..
    cmake --build . --config Release

Before I go through trying the prebuilds, perhaps the issue could be resolved with a change in the make instructions? My OS is also fresh enough that I don't mind doing a fresh re-install. I can hold off on sudo apt-get update until I try the checkouts listed here again. NOTE: I meant I'm using the legacy bullseye with desktop rather than bookworm. The imager recommends it because of conflicts with RealVNC and so on.
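Editorial sketch: since the comment above suspects the gcc/binutils combination, it may help to record the userland architecture and toolchain versions alongside the error. VUDOT belongs to the Armv8.2-A dot-product extension, so the assembler rejects it unless the target selected by the compiler flags includes that extension. The helper name `arm_bits` is hypothetical and not part of OnnxStream or XNNPACK. The snippet only gathers information and does not diagnose the failure.

```shell
# Hypothetical helper: map a Debian architecture name to its word size.
arm_bits() {
  case "$1" in
    armhf|armel) echo 32 ;;
    arm64)       echo 64 ;;
    *)           echo unknown ;;
  esac
}

# Userland architecture (can differ from the kernel reported by `uname -m`).
if command -v dpkg >/dev/null 2>&1; then
  arch=$(dpkg --print-architecture)
  echo "userland: $arch ($(arm_bits "$arch")-bit)"
fi

# Compiler and assembler (binutils) versions, first line only.
if command -v gcc >/dev/null 2>&1; then
  gcc --version | head -n 1
fi
if command -v as >/dev/null 2>&1; then
  as --version | head -n 1
fi
```

Pasting this output into a report makes it easier to compare a failing bullseye setup against a working one.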

vitoplantamura commented 7 months ago

hi,

Building 32-bit XNNPACK/OnnxStream is something I've never done and never seen here in the issues, so this is uncharted territory...

Yes, unfortunately it is not possible to use a prebuilt version of XNNPACK, as OnnxStream requires exactly that specific commit referenced in the "git checkout" command.

I would try building the latest version of XNNPACK in the master branch (i.e. without running my "git checkout" command), in order to see if this is a problem that has been resolved.

Vito

BranchEnterprise commented 7 months ago

I installed a prebuilt XNNPACK from 2020, but that's as far as I went. It became clear that I would need to clone the XNNPACK git repo anyway so I could set up the XNNPACK path flag as you instructed. The main conflict when installing the newer XNNPACK prebuild was needing libc6 > 3.4. 32-bit bookworm fulfills this requirement.

I'll give it a go with a fresh 32-bit bookworm install and report back. It may help some who expect to use an OS with a working remote desktop/GUI. Also, Raspberry Pi's main site still recommends legacy 32-bit bullseye within their imaging software for the Pi Zero, 1, 2 and 3 (due to software conflicts). Installing non-lite 64-bit on the Pi Zero 2 is almost unusably slow (I tried). Having some success with XNNPACK prebuilds gives me hope. Should the ONNXSTREAM build also fail on 32-bit bookworm, I'll make it known. I'll reuse the instructions you posted, including the git checkout.

Edit: an attempt to build XNNPACK without the checkout (legacy bullseye 32-bit OS) gave errors like:

    CMake Error at CMakeLists.txt:588 (ADD_LIBRARY):
      No SOURCES given to target: microkernels-all
    CMake Error at CMakeLists.txt:603 (ADD_LIBRARY):
      No SOURCES given to target: memory
    CMake Error at CMakeLists.txt:608 (ADD_LIBRARY):
      No SOURCES given to target: operator-utils
    CMake Error at CMakeLists.txt:604 (ADD_LIBRARY):
      No SOURCES given to target: microkernel-utils

    CMake Generate step failed. Build files cannot be regenerated correctly.
    gmake: *** No targets specified and no makefile found. Stop.

It seems weird, as if I messed something up. I merely copied your instructions without the git checkout line.

can't wait! much thanks

BranchEnterprise commented 7 months ago

hello again!

I've successfully built XNNPACK (with your git checkout) and ONNXSTREAM on 32-bit bookworm, non-lite (the one with the integrated desktop). I have done this without sudo apt upgrade for now. I am looking to try rebuilding after sudo apt upgrade later. However, I am trying to run SD 1.5 using your weights. Your readme directed me here: https://github.com/vitoplantamura/OnnxStream/releases/download/v0.1/StableDiffusion-OnnxStream-Windows-x64-with-weights.rar. It seems to be a Windows-specific file, but I've attempted to extract it nonetheless. Extraction yielded many errors.

I am simply trying to gather the setup you used for the 512x512 SD 1.5 images. You've successfully generated images on the RPi Zero 2: "...The third image was generated by my RPI Zero 2 in about 3 hours 1.5 hours (using the MAX_SPEED option when compiling)". I seem to be missing necessary files on extraction (attempted with 7z-full and archiver):

    ERROR: Unsupported Method : SD/vae_decoder_qu8/onnx_3A3A_Mul_5F_882.bin
    ERROR: Unsupported Method : SD/vae_decoder_qu8/onnx_3A3A_Mul_5F_890.bin
    ERROR: Unsupported Method : SD/vae_decoder_qu8/onnx_3A3A_Reshape_5F_251.bin
    ERROR: Unsupported Method : SD/vae_decoder_qu8/onnx_3A3A_Reshape_5F_825.bin
    ERROR: Unsupported Method : SD/vae_decoder_qu8/onnx_3A__3A_Reshape_5F_838.bin
    ERROR: Unsupported Method : SD/vae_decoder_qu8/range_data.txt
    Sub items Errors: 1053
    Archives with Errors: 1
    Sub items Errors: 1053

this is my attempt to run a 3-step stable diffusion example:

    wourldd@wourldd:~/OnnxStream/src/build $ ls
    CMakeCache.txt  CMakeFiles  cmake_install.cmake  Makefile  sd  text_encoder_fp32  tokenizer  unet_fp16  vae_decoder_fp16  vae_decoder_qu8
    wourldd@wourldd:~/OnnxStream/src/build $ ./sd --rpi --steps 3 --output result1.png
    ----------------[start]------------------
    positive_prompt: a photo of an astronaut riding a horse on mars
    negative_prompt: ugly, blurry
    output_png_path: result1.png
    steps: 3
    seed: 721920
    ----------------[prompt]------------------
    WARNING: The merges.txt file is missing from the tokenizer folder. Running without byte pair encoding results in subpar tokenization. The file can be downloaded here: https://huggingface.co/AeroX2/stable-diffusion-xl-turbo-1.0-onnxstream/blob/main/sdxl_tokenizer/merges.txt
    Token: "a" Warning token: "a" was ignored
    Token: "photo" Warning token: "photo" was ignored
    Token: "of" Warning token: "of" was ignored
    Token: "an" Warning token: "an" was ignored
    Token: "astronaut" Warning token: "astronaut" was ignored
    Token: "riding" Warning token: "riding" was ignored
    Token: "a" Warning token: "a" was ignored
    Token: "horse" Warning token: "horse" was ignored
    Token: "on" Warning token: "on" was ignored
    Token: "mars" Warning token: "mars" was ignored
    Token: "" Warning token: "" was ignored
    === ERROR === read_file: invalid size of file.

Am I missing a crucial step or set of files here? Even after adding merges.txt as instructed in the warning, it runs but generates no image:

    ...Warning token: "m" was ignored
    Token: "a" Warning token: "a" was ignored
    Token: "r" Warning token: "r" was ignored
    Token: "s" Warning token: "s" was ignored
    === ERROR === read_file: invalid size of file.

vitoplantamura commented 7 months ago

awesome, it seems to work!

The error you are getting now is because the archive was not extracted correctly.

Please note that this is a RAR file, not a ZIP file.

Vito
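Editorial sketch: one possible reason for "Unsupported Method" is that the installed 7z build lacks a RAR codec for the archive's format. This thread does not confirm that as the cause here. The hypothetical helper `rar_version` below reads the file signature to tell RAR4 from RAR5, which can help when choosing an extractor. The `unrar` command in the usage comment is an assumption about what is installed.

```shell
# Hypothetical helper: report the RAR container version from the file signature.
# RAR 1.5-4.x archives start with 52 61 72 21 1A 07 00,
# RAR 5.x archives start with    52 61 72 21 1A 07 01 00.
rar_version() {
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$sig" in
    526172211a0701*) echo rar5 ;;
    526172211a0700*) echo rar4 ;;
    *)               echo not-rar ;;
  esac
}

# Usage (assumes the archive was downloaded to the current directory
# and that an `unrar` binary is available):
#   rar_version StableDiffusion-OnnxStream-Windows-x64-with-weights.rar
#   unrar x StableDiffusion-OnnxStream-Windows-x64-with-weights.rar
```

Extracting on another machine and copying the files over, as done later in this thread, avoids the question entirely.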

BranchEnterprise commented 7 months ago

Thanks for getting back so soon. I extracted the weights on a Windows OS and transferred them to the Pi via FTP. I'm now running an example with --rpi-lowmem --steps 3, with swap just in case. For the record, regular --rpi hung on diffusion for 1hr+ with no swap. I cancelled the process before it finished.

I should know within a few hours if it has successfully generated an image. Let me know if I should close this thereafter. I also planned on a build AFTER updating via sudo apt update to see if it breaks. I would first flash a fresh 32-bit bookworm install. That would take a few hours to set up.

Follow-up on results for the 3-step generation: successful, taking 26.7668 minutes for a --steps 3 image with --rpi-lowmem (Pi Zero 2 W, bookworm 32-bit full, no full system library upgrade).
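Editorial sketch: since the plain --rpi run above hung without swap, it can be worth confirming that swap is actually configured before starting a long run. The helper name `swap_total_kb` is hypothetical. It reads the standard `SwapTotal:` field from `/proc/meminfo`, or from another file in the same format passed as an argument.

```shell
# Hypothetical helper: print SwapTotal in kB from a meminfo-style file.
swap_total_kb() {
  awk '$1 == "SwapTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

# Report the configured swap on this machine, if /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
  echo "swap configured: $(swap_total_kb) kB"
fi
```

A value of 0 means no swap is configured.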

BranchEnterprise commented 7 months ago

Now confirming that a fully updated, fresh install (sudo apt-get update, sudo apt-get upgrade) also compiles both XNNPACK and ONNXSTREAM. The OS is bookworm 32-bit, release date March 15th 2024.

vitoplantamura commented 7 months ago

Thanks for your feedback!

Vito