pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile
BSD 3-Clause "New" or "Revised" License

`scripts/build_native.sh et` errors out #985

Open sunshinesfbay opened 1 month ago

sunshinesfbay commented 1 month ago

🐛 Describe the bug

I am trying to build the llama runner natively on a Raspberry Pi, following the torchchat instructions and the post at https://dev-discuss.pytorch.org/t/run-llama3-8b-on-a-raspberry-pi-5-with-executorch/2048

I have been able to build executorch and torchchat so far (I can build a .pte and run it with the Python driver), but ran into an error with scripts/build_native.sh et:
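For reference, the export and run steps that do work look roughly like this (the export command is copied from my shell history below; the --pte-path flag on generate is from memory of the CLI, so treat it as approximate):

# export a .pte with the mobile quantization config -- this succeeded
python3 torchchat.py export --output-pte-path s110.pte --quantize config/data/mobile.json stories110m
# run the exported .pte with the Python driver -- this also succeeded
python3 torchchat.py generate stories110m --pte-path s110.pte --prompt "write me a story about a boy and his bear"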

[ 22%] Performing download step (git clone) for 'fxdiv'
Cloning into 'FXdiv-source'...
Already on 'master'
Your branch is up to date with 'origin/master'.
[ 33%] Performing update step for 'fxdiv'
-- Fetching latest from the remote origin
[ 44%] No patch step for 'fxdiv'
[ 55%] No configure step for 'fxdiv'
[ 66%] No build step for 'fxdiv'
[ 77%] No install step for 'fxdiv'
[ 88%] No test step for 'fxdiv'
[100%] Completed 'fxdiv'
[100%] Built target fxdiv
-- Using python executable '/home/sunshine/pt2/bin/python3'
-- Resolved buck2 as /home/sunshine/torchchat/et-build/src/executorch/pip-out/temp.linux-aarch64-cpython-311/cmake-out/buck2-bin/buck2-49670bee56a7d8a7696409ca6fbf7551d2469787.
-- Killing buck2 daemon
-- executorch: Generating source lists
-- executorch: Generating source file list /home/sunshine/torchchat/et-build/src/executorch/pip-out/temp.linux-aarch64-cpython-311/cmake-out/executorch_srcs.cmake
Error while generating /home/sunshine/torchchat/et-build/src/executorch/pip-out/temp.linux-aarch64-cpython-311/cmake-out/executorch_srcs.cmake.
Exit code: 1
Output:

Error: Traceback (most recent call last):
  File "/home/sunshine/torchchat/et-build/src/executorch/build/buck_util.py", line 26, in run
    cp: subprocess.CompletedProcess = subprocess.run(
                                      ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/sunshine/torchchat/et-build/src/executorch/pip-out/temp.linux-aarch64-cpython-311/cmake-out/buck2-bin/buck2-49670bee56a7d8a7696409ca6fbf7551d2469787', 'cquery', "inputs(deps('//runtime/executor:program'))"]' returned non-zero exit status 2.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sunshine/torchchat/et-build/src/executorch/build/extract_sources.py", line 218, in <module>
    main()
  File "/home/sunshine/torchchat/et-build/src/executorch/build/extract_sources.py", line 203, in main
    target_to_srcs[name] = sorted(target.get_sources(graph, runner))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshine/torchchat/et-build/src/executorch/build/extract_sources.py", line 116, in get_sources
    sources: set[str] = set(runner.run(["cquery", query]))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshine/torchchat/et-build/src/executorch/build/buck_util.py", line 31, in run
    raise RuntimeError(ex.stderr.decode("utf-8")) from ex
RuntimeError: Command failed: Error validating working directory

Caused by:
    0: Failed to stat /home/sunshine/torchchat/et-build/src/executorch/buck-out/v2
    1: ENOENT: No such file or directory

CMake Error at build/Utils.cmake:191 (message):
  executorch: source list generation failed
Call Stack (most recent call first):
  CMakeLists.txt:327 (extract_sources)

-- Configuring incomplete, errors occurred!
error: command '/home/sunshine/pt2/bin/cmake' failed with exit code 1
error: subprocess-exited-with-error

Building wheel for executorch (pyproject.toml) did not run successfully.
exit code: 1

See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /home/sunshine/pt2/bin/python3 /home/sunshine/pt2/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpvxjhev2t
cwd: /home/sunshine/torchchat/et-build/src/executorch
Building wheel for executorch (pyproject.toml) ... error
ERROR: Failed building wheel for executorch
Failed to build executorch
ERROR: Could not build wheels for executorch, which is required to install pyproject.toml-based projects
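
The proximate failure appears to be buck2 failing to stat buck-out/v2 (the ENOENT above). Purely as a guess, and assuming the path from the log, one workaround to try would be pre-creating that directory before re-running the script:

# speculative workaround: create the directory buck2 expects to find
mkdir -p /home/sunshine/torchchat/et-build/src/executorch/buck-out/v2
scripts/build_native.sh et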

Versions

Collecting environment information...
PyTorch version: 2.5.0.dev20240716
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux trixie/sid (aarch64)
GCC version: (Debian 13.3.0-2) 13.3.0
Clang version: 16.0.6 (27+b1)
CMake version: version 3.30.0
Libc version: glibc-2.38

Python version: 3.11.2 (main, May 2 2024, 11:59:08) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-6.6.31+rpt-rpi-v8-aarch64-with-glibc2.38
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:        aarch64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Vendor ID:           ARM
Model name:          Cortex-A76
Model:               1
Thread(s) per core:  1
Core(s) per cluster: 4
Socket(s):           -
Cluster(s):          1
Stepping:            r4p1
CPU(s) scaling MHz:  100%
CPU max MHz:         2400.0000
CPU min MHz:         1500.0000
BogoMIPS:            108.00
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache:           256 KiB (4 instances)
L1i cache:           256 KiB (4 instances)
L2 cache:            2 MiB (4 instances)
L3 cache:            2 MiB (1 instance)
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; CSV2, BHB
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] executorch==0.4.0a0+c757499
[pip3] numpy==1.26.3
[pip3] torch==2.5.0.dev20240716
[pip3] torchao==0.3.1
[pip3] torchaudio==2.4.0.dev20240716
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240716
[conda] Could not collect

Jack-Khuu commented 1 month ago

@sunshinesfbay Thanks for testing out the repo

To help us get a repro, can you share the commands you ran, from git clone ... up to scripts/build_native.sh et?

I can loop in specific ExecuTorch folk afterwards

WaelShaikh commented 1 month ago

I'm getting a similar error stack trace in #990. I followed the instructions in the README for Android and got this error when running ./scripts/install_et.sh.

sunshinesfbay commented 1 month ago

> @sunshinesfbay Thanks for testing out the repo
>
> To help us get a repro, can you share the commands you ran, from git clone ... up to scripts/build_native.sh et?
>
> I can loop in specific ExecuTorch folk afterwards

Here are the commands:

   92  git clone https://github.com/pytorch/torchchat.git
   93  /home/sunshine/pt2/bin/activate
   94  less /home/sunshine/pt2/bin/activate
   95  spurce /home/sunshine/pt2/bin/activate # ooof typo
   96  source /home/sunshine/pt2/bin/activate
   97  ./install_requirements.sh
   98  python3 torchchat.py --help
   99  python3 torchchat.py download stories110
  100  python3 torchchat.py list
  101  python3 torchchat.py download stories110m
  102  python3 torchchat.py generate stories110m --prompt "write me a story about a boy and his bear"
[...] unrelated
  116  streamlit run torchchat.py -- browser llama3 # did not work ?!
  117  streamlit run torchchat.py -- browser llama3 # did not work ?!
  118  python3 torchchat.py export --pte-path s110.pte --quantize config/data/mobile.json  stories110m 
  119  ls -al
  120  python3 torchchat.py export --output-pte-path s110.pte --quantize config/data/mobile.json  stories110m 
  121  ls
  122  scripts/build_native.sh et
  123  vi /home/sunshine/torchchat/tokenizer/base64.h
  124  scripts/build_native.sh et
  125  vi /home/sunshine/torchchat/et-build/src/executorch/build/extract_sources.py
  126  scripts/build_native.sh et

Jack-Khuu commented 1 month ago

The error isn't reproing on my side, so we'll need to keep digging

Two comments, though they may not be directly tied to the build_native.sh error:

1) Did this command succeed? You should have received an ET EXPORT EXCEPTION error since none of the listed commands installed ExecuTorch

python3 torchchat.py export --output-pte-path s110.pte --quantize config/data/mobile.json stories110m
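
A quick way to check whether ExecuTorch is importable in the active venv, which is what the export step needs (this is just a generic Python check; the package name is taken from the Versions output above):

# sanity check: confirm the executorch Python package resolves in this venv
python3 -c "import executorch; print(executorch.__file__)"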

2) If you're interested in debugging the browser, feel free to spin up another issue with the error message from this

streamlit run torchchat.py -- browser llama3

sunshinesfbay commented 1 month ago

> The error isn't reproing on my side, so we'll need to keep digging

Just to be sure, you did try on a Raspberry Pi 5 with Raspbian, not another system? This is trying to run everything natively on the Raspberry Pi -- PyTorch, ExecuTorch, torchchat.

Here's the Raspbian version I used:

$ uname -a
Linux raspberrypi 6.6.31+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux

> Two comments, though they may not be directly tied to the build_native.sh error:
>
> 1. Did this command succeed? You should have received an ET EXPORT EXCEPTION error since none of the listed commands installed ExecuTorch
>
>    python3 torchchat.py export --output-pte-path s110.pte --quantize config/data/mobile.json stories110m
Yes, it did -- I was able to build executorch separately, before attempting to install the runner app.

Here are the installation details:

sunshine@raspberrypi:~ $ ls -ald ~/executorch
drwxr-xr-x 28 sunshine sunshine 4096 Jul 31 12:45 /home/sunshine/executorch
sunshine@raspberrypi:~ $ cd executorch
sunshine@raspberrypi:~/executorch $ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index e780016f4..a0e6c42b3 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -143,13 +143,13 @@ endif()
 option(OPTIMIZE_SIZE "Build executorch runtime optimizing for binary size" OFF)
 if(OPTIMIZE_SIZE)
   # -Os: Optimize for size
-  set(CMAKE_CXX_FLAGS_RELEASE "-Os ${CMAKE_CXX_FLAGS_RELEASE}")
+  set(CMAKE_CXX_FLAGS_RELEASE "-v -Os ${CMAKE_CXX_FLAGS_RELEASE}")
 else()
   # -O2: Moderate opt.
-  set(CMAKE_CXX_FLAGS_RELEASE "-O2 ${CMAKE_CXX_FLAGS_RELEASE}")
+  set(CMAKE_CXX_FLAGS_RELEASE "-v -O2 ${CMAKE_CXX_FLAGS_RELEASE}")
 endif()

-set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g")
+set(CMAKE_CXX_FLAGS_DEBUG "-v -O0 -g")

 option(EXECUTORCH_BUILD_ANDROID_JNI "Build Android JNI" OFF)

diff --git a/runtime/core/portable_type/half.h b/runtime/core/portable_type/half.h
index 5aded6827..be5f87a4f 100644
--- a/runtime/core/portable_type/half.h
+++ b/runtime/core/portable_type/half.h
@@ -22,6 +22,8 @@
 #endif // __aarch64__
 #endif // GNUC or clang

+#define NATIVE_FP16 1
+
 #if defined(__GNUC__) || defined(__clang__)
 #if defined(__x86_64__) || defined(_M_X64) || defined(__i386) || \
     defined(_M_IX86)

This was a bit of fun, because apparently on the Raspberry Pi with Raspbian, _Float16 worked with GCC but not G++. I don't recall how I got over that, possibly by using clang++.
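
A minimal way to reproduce that toolchain difference, assuming nothing beyond the stock Raspbian compilers:

# hypothetical repro: the same one-line function through gcc, g++, and clang++
echo '_Float16 half_it(_Float16 x) { return x / (_Float16)2.0; }' > /tmp/fp16.c
gcc -c /tmp/fp16.c -o /tmp/fp16_c.o          # worked on this setup
cp /tmp/fp16.c /tmp/fp16.cpp
g++ -c /tmp/fp16.cpp -o /tmp/fp16_cxx.o      # reportedly failed here
clang++ -c /tmp/fp16.cpp -o /tmp/fp16_cxx.o  # the likely workaround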

In any event, I figured I should follow your install instructions, since the README explicitly calls for running this script, which did try to install a second copy of executorch.

> 2. If you're interested in debugging the browser, feel free to spin up another issue with the error message from this
>
>    streamlit run torchchat.py -- browser llama3

Done! #1001

sunshinesfbay commented 1 month ago

One more issue during build:

-- Configuring done (0.5s)
-- Generating done (1.0s)
-- Build files have been written to: /home/sunshine/torchchat/cmake-out
+ cmake --build ./cmake-out --target et_run
[9/208] Building CXX object tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o
FAILED: tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o 
/usr/bin/c++  -I/home/sunshine/torchchat/tokenizer -I/home/sunshine/torchchat/tokenizer/third-party/sentencepiece/src -I/home/sunshine/torchchat/tokenizer/third-party/re2 -I/home/sunshine/torchchat/tokenizer/third-party/abseil-cpp -D_GLIBCXX_USE_CXX11_ABI=1 -MD -MT tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o -MF tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o.d -o tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o -c /home/sunshine/torchchat/tokenizer/tiktoken.cpp
In file included from /home/sunshine/torchchat/tokenizer/tiktoken.cpp:18:
/home/sunshine/torchchat/tokenizer/base64.h:37:11: error: 'uint32_t' does not name a type
   37 | constexpr uint32_t DECODE_TABLE[] = {
      |           ^~~~~~~~
/home/sunshine/torchchat/tokenizer/base64.h:29:1: note: 'uint32_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
   28 | #include <string>
  +++ |+#include <cstdint>
   29 | #include <string_view>
/home/sunshine/torchchat/tokenizer/base64.h:57:13: error: variable or field 'validate' declared void
   57 | inline void validate(uint32_t v) {
      |             ^~~~~~~~
/home/sunshine/torchchat/tokenizer/base64.h:57:22: error: 'uint32_t' was not declared in this scope
   57 | inline void validate(uint32_t v) {
      |                      ^~~~~~~~
/home/sunshine/torchchat/tokenizer/base64.h:57:22: note: 'uint32_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
/home/sunshine/torchchat/tokenizer/base64.h: In function 'void base64::detail::decode(const std::string_view&, std::string&)':
/home/sunshine/torchchat/tokenizer/base64.h:70:3: error: 'uint32_t' was not declared in this scope
   70 |   uint32_t val = 0;
      |   ^~~~~~~~
/home/sunshine/torchchat/tokenizer/base64.h:70:3: note: 'uint32_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
/home/sunshine/torchchat/tokenizer/base64.h:72:3: error: 'uint8_t' was not declared in this scope
   72 |   uint8_t c = input[0];
      |   ^~~~~~~
/home/sunshine/torchchat/tokenizer/base64.h:72:3: note: 'uint8_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
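
The compiler note spells out the likely fix: tokenizer/base64.h uses the <cstdint> fixed-width types without including that header. A sketch of the obvious local patch, assuming the include block shown in the note above:

# add the missing include ahead of <string_view> in base64.h (GNU sed)
sed -i 's|#include <string_view>|#include <cstdint>\n#include <string_view>|' \
    /home/sunshine/torchchat/tokenizer/base64.h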

sunshinesfbay commented 1 month ago

Created a new user id and did the full install again. _Float16 now works with G++ (I had submitted a bug report against the G++ package for that), but I'm running into the uint32_t issue from above again. This is also a problem reported in the "AOTI/DSO" issue with llama3.1.

In case you're curious, I'm attaching an almost-full log. (I only thought of redirecting the output after I had started the build, so it may be missing a command or two at the beginning.)

build_native_et.log

Here's the full history from the creation of the account:

    1  vi ~/.profile 
    2  source ~/.profile 
    3  vi ~/.bashrc 
    4  # get the code
    5  git clone https://github.com/pytorch/torchchat.git
    6  cd torchchat
    7  # set up a virtual environment
    8  python3 -m venv .venv
    9  source .venv/bin/activate
   10  # install dependencies
   11  ./install_requirements.sh
   12  exit
   13  # get the code
   14  git clone https://github.com/pytorch/torchchat.git
   15  cd torchchat
   16  # set up a virtual environment
   17  python3 -m venv .venv
   18  source .venv/bin/activate
   19  # install dependencies
   20  ./install_requirements.sh
   21  python3 torchchat.py download stories110m
   22  streamlit run torchchat.py -- browser stories110m
   23  ./scripts/build_native.sh et
   24  ./scripts/build_native.sh et |& tee /tmp/build_native_et.log