llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
Other
1.29k stars 475 forks source link

Out of tree build in conda environment / overdependency problem #3104

Open ZzEeKkAa opened 5 months ago

ZzEeKkAa commented 5 months ago

Hello there I'm trying to build out-of-tree torch-mlir in conda environment, but failing to do so because of numerous problems. I had to build locally mlir-python-binding to have MLIRPythonExtension* and MLIRPythonSources* targets populated (https://github.com/conda-forge/mlir-python-bindings-feedstock/issues/19), but then I'm getting this kind of error:

CMake Error at /home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:674 (target_link_libraries):
  Cannot find source file:

    /home/yevhenii/.miniconda3/envs/mlir-dev/src/python/MLIRPythonExtension.Core/MainModule.cpp

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm
  .ccm .cxxm .c++m .h .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90
  .f95 .f03 .hip .ispc
Call Stack (most recent call first):
  /home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:228 (add_mlir_python_extension)
  /home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:248 (_process_target)
  python/CMakeLists.txt:112 (add_mlir_python_modules)

CMake Error at /home/yevhenii/.miniconda3/envs/mlir-dev/share/cmake-3.29/Modules/FindPython/Support.cmake:4078 (add_library):
  No SOURCES given to target: TorchMLIRPythonModules.extension._mlir.dso
Call Stack (most recent call first):
  /home/yevhenii/.miniconda3/envs/mlir-dev/share/cmake-3.29/Modules/FindPython3.cmake:549 (__Python3_add_library)
  /home/yevhenii/.miniconda3/envs/mlir-dev/lib/python3.12/site-packages/pybind11/share/cmake/pybind11/pybind11NewTools.cmake:239 (python3_add_library)
  /home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:642 (pybind11_add_module)
  /home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:228 (add_mlir_python_extension)
  /home/yevhenii/.miniconda3/envs/mlir-dev/lib/cmake/mlir/AddMLIRPython.cmake:248 (_process_target)
  python/CMakeLists.txt:112 (add_mlir_python_modules)

It looks pretty wired for me, cause for some reason out-of-tree build reference to the source (not header) files of mlir project. Is it problem with overdependency on torch-mlir side, or it is "by-design". If it is by design, could you provide detailed explanation what source file is for in the downstream project (I'm relatively new to llvm)?

stellaraccident commented 5 months ago

I have no idea what the mlir-python-bindings-feedstock in conda is. I wrote much of those Python bindings, and afaik, the MLIR project never converged on releasing them as a real package or anything ("MLIR" is not an entity that means much outside its role as a library to build certain kinds of compilers). Those bindings were written to be used by other projects wishing to create a Python API for their compilers: MLIR compilers are monolithic and generally don't interoperate across binary boundaries (not saying I care for this state, just reporting the news). So torch-mlir is building a standalone Python API under its namespace, using sources from upstream to do so.

The CMake standalone example includes a minimal, standalone Python API setup: https://github.com/llvm/llvm-project/tree/main/mlir/examples/standalone

iirc, there is CMake magic that gives the source sets names, but basically it is all of the files under mlir/lib/Bindings/Python. Those get built into various pybind extensions that form the API. They all depend on the mlir-c headers (only) and link against something that exports those symbols.

It's been a long time since I've looked at any of that, but I think the install rules here (https://github.com/llvm/llvm-project/blob/main/mlir/cmake/modules/AddMLIRPython.cmake#L174) are what is making sure that the sources and corresponding targets are exported at install time (which I would presume is what conda is doing, but I don't really know). If working from an installed LLVM, it would need to install/export those sources and targets. I don't know of anyone else trying to use torch-mlir standalone in this way (a lot of the developers use torch-mlir as a C++ library dep), but I am aware of other projects that consume MLIR's Python API sources in this way.

stellaraccident commented 5 months ago

(I should also say that all of this is open to be improved. The project contributors to these things have not invested here and largely use torch-mlir as a C++ library, as mentioned. But I am certainly not opposed to folks who care about other ways of using it investing to make it work)

ZzEeKkAa commented 5 months ago

Thank you @stellaraccident ! I've resolve issue with missing sources - it was because feedstock build script was removing /src folder. I was able to successfully generate cmake configuration inside conda environment with pre-build out of tree llvm-project. Currently experiencing some issues both with and without stablehlo build: Without stablehlo TorchConversionDialect.h.inc is not generated while Building CXX object lib/RefBackend/CMakeFiles/obj.TorchMLIRRefBackend.dir/RefBackend.cpp.o With stablehlo, there is a problem with build of stablehlo related to llvm install or whatever, but TorchConversionDialect.h.inc was generated.

My question is it possible that there are missing dependencies set between TorchMLIRRefBackend/RefBackend.cpp and TorchConversionDialect.h.inc in cmake config?

stellaraccident commented 5 months ago

My question is it possible that there are missing dependencies set between TorchMLIRRefBackend/RefBackend.cpp and TorchConversionDialect.h.inc in cmake config?

Quite possible. I believe ninja now has a mode where it can check for some things like this, but I don't think anyone has spent time sanitizing the different build modes of this project.

h-vetinari commented 5 months ago

This should be fixed now, but I'm waiting for you to test it before I catch up the mlir-python-bindings feedstock (that I wasn't aware of until a few days ago) to the most current version.