pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Buck 2 Error on running ./install_requirements.sh #3502

Closed gochaudhari closed 2 weeks ago

gochaudhari commented 2 weeks ago

Could someone please help me with this error? I am seeing this with the v0.2.0 branch:

-- executorch: Generating source file list /temp/executorch/pip-out/temp.linux-x86_64-cpython-311/cmake-out/executorch_srcs.cmake
Error while generating /temp/executorch/pip-out/temp.linux-x86_64-cpython-311/cmake-out/executorch_srcs.cmake. Exit code: 1
Output:

Error: Traceback (most recent call last):
  File "temp/executorch/build/buck_util.py", line 26, in run
    cp: subprocess.CompletedProcess = subprocess.run(
                                      ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['temp/executorch/pip-out/temp.linux-x86_64-cpython-311/cmake-out/buck2-bin/buck2-071372cfde6e9936c62eb92823742392af4a945570df5c5b34d3eed1b03813c3', 'cquery', "inputs(deps('//runtime/executor:program'))"]' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/temp/executorch/build/extract_sources.py", line 218, in <module>
    main()
  File "/temp/executorch/build/extract_sources.py", line 203, in main
    target_to_srcs[name] = sorted(target.get_sources(graph, runner))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/temp/executorch/build/extract_sources.py", line 116, in get_sources
    sources: set[str] = set(runner.run(["cquery", query]))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/temp/executorch/build/buck_util.py", line 31, in run
    raise RuntimeError(ex.stderr.decode("utf-8")) from ex
RuntimeError: Command failed: Error creating cell resolver

Caused by: Expected a HOME directory to be available

CMake Error at build/Utils.cmake:161 (message):
  executorch: source list generation failed
Call Stack (most recent call first):
  CMakeLists.txt:261 (extract_sources)

-- Configuring incomplete, errors occurred!
error: command '/usr/home/.local/bin/cmake' failed with exit code 1
error: subprocess-exited-with-error

× Building wheel for executorch (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python3 /usr/home/.local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp6pi9kk4v
cwd: /local/mnt/workspace/qsdk_workspaces/executorch
Building wheel for executorch (pyproject.toml) ... error
ERROR: Failed building wheel for executorch
Failed to build executorch
ERROR: Could not build wheels for executorch, which is required to install pyproject.toml-based projects
gaurchau@hu-gaurchau-lv:~/temp/executorch$

cccclai commented 2 weeks ago

Thank you for trying it out! @dbort any idea for this error?

gochaudhari commented 2 weeks ago

@cccclai, I just checked the HOME environment variable from inside my Python script and it is empty. When I add os.putenv("HOME", "/usr/home") to the run function in build/buck_util.py, the problem goes away:

class Buck2Runner:
    def __init__(self, tool_path: str) -> None:
        self._path = tool_path

    def run(self, args: Sequence[str]) -> list[str]:
        """Runs buck2 with the given args and returns its stdout as a sequence of lines."""
        # Local workaround: force HOME for child processes before invoking buck2.
        os.putenv("HOME", "/usr/home")
        try:
            cp: subprocess.CompletedProcess = subprocess.run(
                [self._path] + args, capture_output=True, cwd=BUCK_CWD, check=True
            )
            return [line.strip().decode("utf-8") for line in cp.stdout.splitlines()]
        except subprocess.CalledProcessError as ex:
            raise RuntimeError(ex.stderr.decode("utf-8")) from ex

Do you have any suggestions on this?
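
For readers trying the same workaround, here is a minimal sketch of a narrower variant: instead of calling os.putenv with a hard-coded path, pass an explicit env to subprocess.run and fill in HOME only when it is missing or empty. The function name run_with_home and its fallback_home parameter are hypothetical and not part of executorch. It assumes a POSIX system, where os.path.expanduser("~") falls back to the password database when HOME is unset.

```python
import os
import subprocess
from typing import Optional, Sequence


def run_with_home(cmd: Sequence[str], fallback_home: Optional[str] = None) -> list[str]:
    """Run cmd with HOME guaranteed to be set in the child environment.

    Hypothetical helper for illustration only. The parent process's
    environment is left untouched.
    """
    env = dict(os.environ)
    if not env.get("HOME"):
        # With HOME unset, expanduser("~") consults the password database on
        # POSIX. If that lookup fails it returns "~" unchanged.
        env["HOME"] = fallback_home or os.path.expanduser("~")
    cp = subprocess.run(list(cmd), capture_output=True, check=True, env=env)
    return [line.strip() for line in cp.stdout.decode("utf-8").splitlines()]
```

Because the change is confined to the child's environment, it does not affect anything else that reads HOME in the calling process.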

dbort commented 2 weeks ago

Thanks for reporting this issue. Must've been caused by this logic that I added to make things work in the CI jobs: https://github.com/pytorch/executorch/blob/3a2b2e8a3d5325d99edddc08decfbe9eaae55292/setup.py#L494-L498

The question is why it's happening for you but not for other users, and not for the CI jobs. The buck2 check itself is at https://github.com/facebook/buck2/blob/ad891c4934458a461b5fcf375e2c17302df00c0d/app/buck2_common/src/invocation_roots.rs#L125, which was added in the buck2 2023-08-15 release, which should be part of the 2024-02-15 release that executorch uses (https://github.com/pytorch/executorch/blob/main/.ci/docker/ci_commit_pins/buck2.txt).

To unblock yourself quickly, you could make a local change that removes this logic:

--- a/setup.py
+++ b/setup.py
@@ -490,17 +490,8 @@ class CustomBuild(build):
         if not self.dry_run:
             # Dry run should log the command but not actually run it.
             (Path(cmake_cache_dir) / "CMakeCache.txt").unlink(missing_ok=True)
-        try:
-            # This script is sometimes run as root in docker containers. buck2
-            # doesn't allow running as root unless $HOME is owned by root or
-            # does not exist. So temporarily undefine it while configuring
-            # cmake, which runs buck2 to get some source lists.
-            old_home = os.environ.pop("HOME", None)
-            # Generate the build system files.
-            self.spawn(["cmake", "-S", repo_root, "-B", cmake_cache_dir, *cmake_args])
-        finally:
-            if old_home is not None:
-                os.environ["HOME"] = old_home
+        # Generate the build system files.
+        self.spawn(["cmake", "-S", repo_root, "-B", cmake_cache_dir, *cmake_args])

         # Build the system.
         self.spawn(["cmake", "--build", cmake_cache_dir, *build_args])

The real fix is to stop running the jobs as root (https://github.com/pytorch/test-infra/issues/5091). But I could at least reduce the blast radius of this hack by only removing the HOME definition when running as root. And ultimately we want to remove buck2 from this flow altogether.
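
As an illustration of "only removing the HOME definition when running as root", here is a minimal sketch. It is not necessarily what the eventual fix looks like. The context manager name home_removed_if_root and its is_root parameter are hypothetical, and the root check assumes a POSIX system where os.geteuid is available.

```python
import contextlib
import os
from typing import Iterator, Optional


@contextlib.contextmanager
def home_removed_if_root(is_root: Optional[bool] = None) -> Iterator[None]:
    """Temporarily undefine HOME, but only when running as root.

    Hypothetical helper: is_root can be passed explicitly (for example in
    tests); otherwise it is derived from the effective user id.
    """
    if is_root is None:
        is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    old_home = os.environ.pop("HOME", None) if is_root else None
    try:
        yield
    finally:
        # Restore HOME only if this context manager removed it.
        if old_home is not None:
            os.environ["HOME"] = old_home
```

In setup.py the cmake configure step would then run inside a with home_removed_if_root(): block, so non-root users keep their HOME while buck2 runs.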

dbort commented 2 weeks ago

@gochaudhari if you have a chance, could you try patching in https://github.com/pytorch/executorch/pull/3507 to see if it fixes the problem for you?

gochaudhari commented 2 weeks ago

@dbort I tried it just now. It is working for me. However, I am not running it as a root user.

dbort commented 2 weeks ago

@gochaudhari Great, thanks for checking! Don't worry about running as root, I'm testing that elsewhere. My main goal was to make sure that this fixes your problem.

dbort commented 2 weeks ago

I've merged the fix into main, and it should also be available in the upcoming v0.2.1 patch release. Thank you @gochaudhari for reporting this bug and for helping me validate the fix!

gochaudhari commented 1 week ago

Thanks @dbort and @cccclai for the fix.

leigao97 commented 1 week ago

I have the same issue when I submodule executorch and run cmake from the subpath. It works fine if I directly clone executorch and run cmake from there.

dbort commented 1 week ago

> I have the same issue when I submodule executorch and run cmake from the subpath. It works fine if I directly clone executorch and run cmake from there.

@leigao97 Does this happen to you in the main branch? Which executorch git hash are you synced to when this happens?

leigao97 commented 1 week ago

I am using the main branch. I think I found the reason: the path, including the parent folder, should not contain the character "-", or maybe others like ".". I saw the same problem in #3524. Once I renamed the parent folder, buck2 worked.
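
A quick way to check a checkout path for the characters suspected above is sketched below. This reflects only the observation in this comment, not documented buck2 behavior, and the function name suspect_components and the default character set are hypothetical.

```python
from pathlib import PurePosixPath


def suspect_components(path: str, chars: str = "-.") -> list[str]:
    """Return the path components that contain any of the suspect characters.

    Hypothetical diagnostic helper; the character set is a guess based on
    the report above, not a buck2 rule.
    """
    parts = PurePosixPath(path).parts
    return [
        part
        for part in parts
        if part not in ("/", ".", "..") and any(c in part for c in chars)
    ]
```

For example, suspect_components("/home/me/my-project/executorch") flags "my-project", which would be the folder to try renaming.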

dbort commented 1 week ago

Wow weird, thanks for tracking that down @leigao97 ! Good to know: this sounds like some unusual buck2 behavior.