shader-slang / slang-torch

A Python package for calling Slang modules from PyTorch.

error: invalid use of incomplete type 'const class at::Tensor' #23

Closed min-hieu-netropy closed 18 hours ago

min-hieu-netropy commented 4 days ago

Hi! I'm trying to run the square example in slang-torch and got the following error:

p main.py
Loading slang module: square.slang
Using slangc location: /home/charlie/miniconda3/lib/python3.10/site-packages/slangtorch/bin/slangc
Using fast math (--use_fast_math)
Using line info (--generate-line-info)
Dry-run using latest build directory: .slangtorch_cache/square/b9c103f6b206b8e5/0
Version number missing. Needs recompile
Version number missing. Needs recompile
Build required. Creating unique build directory
Working folder: .slangtorch_cache/square/b9c103f6b206b8e5/0
Version number missing. Needs recompile
Building square.slang -> square.cpp:  /home/charlie/miniconda3/lib/python3.10/site-packages/slangtorch/bin/slangc square.slang -target torch-binding -o .slangtorch_cache/square/b9c103f6b206b8e5/square.cpp -depfile .slangtorch_cache/square/b9c103f6b206b8e5/square.cpp.d.out -ignore-capabilities
Version number missing. Needs recompile
Building square.slang -> square_cuda.cu:  /home/charlie/miniconda3/lib/python3.10/site-packages/slangtorch/bin/slangc square.slang -target cuda -o .slangtorch_cache/square/b9c103f6b206b8e5/square_cuda.cu -depfile .slangtorch_cache/square/b9c103f6b206b8e5/square_cuda.cu.d.out -ignore-capabilities
Skipping additional ninja check (WARNING: this may ignore changes to non-slang files)
Detected CUDA files, patching ldflags
Emitting ninja build file /home/charlie/work/playground/slangpy_playground/diff_test/.slangtorch_cache/square/b9c103f6b206b8e5/0/build.ninja...
Building extension module _slangtorch_square_b9c103f6b206b8e5...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF square.o.d -DTORCH_EXTENSION_NAME=_slangtorch_square_b9c103f6b206b8e5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/charlie/work/playground/slangpy_playground/diff_test -isystem /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include -isystem /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/TH -isystem /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-12.4/include -isystem /home/charlie/miniconda3/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -std=c++17 -c /home/charlie/work/playground/slangpy_playground/diff_test/.slangtorch_cache/square/b9c103f6b206b8e5/square.cpp -o square.o
FAILED: square.o
c++ -MMD -MF square.o.d -DTORCH_EXTENSION_NAME=_slangtorch_square_b9c103f6b206b8e5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/charlie/work/playground/slangpy_playground/diff_test -isystem /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include -isystem /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/TH -isystem /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-12.4/include -isystem /home/charlie/miniconda3/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -std=c++17 -c /home/charlie/work/playground/slangpy_playground/diff_test/.slangtorch_cache/square/b9c103f6b206b8e5/square.cpp -o square.o
/home/charlie/work/playground/slangpy_playground/diff_test/.slangtorch_cache/square/b9c103f6b206b8e5/square.cpp:434:9: warning: #pragma once in main file
  434 | #pragma once
      |         ^~~~
/home/charlie/work/playground/slangpy_playground/diff_test/.slangtorch_cache/square/b9c103f6b206b8e5/square.cpp:3686:9: warning: #pragma once in main file
 3686 | #pragma once
      |         ^~~~
In file included from /home/charlie/work/playground/slangpy_playground/diff_test/.slangtorch_cache/square/b9c103f6b206b8e5/square.cpp:4:
/home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAUtils.h: In function ‘bool at::cuda::check_device(c10::ArrayRef<at::Tensor>)’:
/home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAUtils.h:14:26: error: cannot increment a pointer to incomplete type ‘const at::Tensor’
   14 |   for (const Tensor& t : ts) {
      |                          ^~
/home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAUtils.h:15:9: error: invalid use of incomplete type ‘const class at::Tensor’
   15 |     if (t.device() != curDevice) return false;
      |         ^
In file included from /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/c10/core/GeneratorImpl.h:8,
                 from /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/ATen/core/Generator.h:18,
                 from /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/ATen/CPUGeneratorImpl.h:3,
                 from /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/ATen/Context.h:4,
                 from /home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6,
                 from /home/charlie/work/playground/slangpy_playground/diff_test/.slangtorch_cache/square/b9c103f6b206b8e5/square.cpp:3:
/home/charlie/miniconda3/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h:61:7: note: forward declaration of ‘class at::Tensor’
   61 | class Tensor;
      |       ^~~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/charlie/miniconda3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2104, in _run_ninja_build
    subprocess.run(
  File "/home/charlie/miniconda3/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/charlie/work/playground/slangpy_playground/diff_test/main.py", line 4, in <module>
    m = slangtorch.loadModule('square.slang', verbose=True)
  File "/home/charlie/miniconda3/lib/python3.10/site-packages/slangtorch/slangtorch.py", line 685, in loadModule
    rawModule = _loadModule(fileName, moduleName, buildDir, options, sourceDir=outputFolder, verbose=verbose, includePaths=includePaths, dryRun=False, skipNinjaCheck=skipNinjaCheck, extraCudaFlags=extraCudaFlags, extraSlangFlags=extraSlangFlags)
  File "/home/charlie/miniconda3/lib/python3.10/site-packages/slangtorch/slangtorch.py", line 577, in _loadModule
    slangLib, metadata = compileAndLoadModule(
  File "/home/charlie/miniconda3/lib/python3.10/site-packages/slangtorch/slangtorch.py", line 463, in compileAndLoadModule
    slangLib = _compileAndLoadModule(metadata, sources, moduleName, buildDir, slangSourceDir, extraCudaFlags, verbose)
  File "/home/charlie/miniconda3/lib/python3.10/site-packages/slangtorch/slangtorch.py", line 507, in _compileAndLoadModule
    return jit_compile(
  File "/home/charlie/miniconda3/lib/python3.10/site-packages/slangtorch/util/compile.py", line 71, in jit_compile
    _write_ninja_file_and_build_library(
  File "/home/charlie/miniconda3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1833, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/charlie/miniconda3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2120, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_slangtorch_square_b9c103f6b206b8e5'

However, I had no issues compiling Slang with slangpy before. I'm on torch 2.5.1+cu124, running on an RTX 4090.

LouisDeOliveira commented 4 days ago

Hi, I am getting the same error with torch 2.4.1+cu121 on an RTX 2080.

Jerry-Shen0527 commented 2 days ago

Same problem here: torch 2.5.1+cu124 on an RTX 4070. I manually added #include <ATen/core/Tensor.h> at the top of my local torch header "site-packages/torch/include/ATen/cuda/CUDAUtils.h", and it now runs successfully. My guess is that, with recent PyTorch updates, this include is now required at the top of slang-torch-prelude.h.
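A minimal sketch of that local workaround, assuming you are comfortable patching the text of a header yourself. The helper name `ensure_include` is hypothetical and not part of slangtorch or PyTorch. It works on the header contents as a string and adds the directive as the first line only when it is missing, so applying it twice does not duplicate the line.

```python
def ensure_include(header_text: str,
                   include: str = "#include <ATen/core/Tensor.h>") -> str:
    """Return header_text with `include` as its first line, unless already present."""
    lines = header_text.splitlines()
    if any(line.strip() == include for line in lines):
        return header_text  # already patched, leave untouched
    return include + "\n" + header_text


# Illustrative header contents only; this is NOT the real CUDAUtils.h.
sample = "#pragma once\n#include <ATen/cuda/CUDAContext.h>\n"
patched = ensure_include(sample)
print(patched)
```

To apply this to a real file you would read and write it with `pathlib.Path.read_text` / `write_text`. Note that edits to installed torch headers are overwritten whenever torch is reinstalled, so this is a stopgap at best.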

oliver-batchelor commented 1 day ago

I went to try this for the first time and hit the same issue. torch 2.5.1 and 2.4.0 both give the same error.

If I do as above and add Tensor.h to CUDAUtils.h, I get loads of this error:

/local/slang-torch/examples/inline-mlp-example/.slangtorch_cache/image-model/39a1e66c93673ceb/image-model.cpp:493:9: error: ‘SlangUInt’ does not name a type

LouisDeOliveira commented 1 day ago

I managed to make it work by adding the include directive #include <ATen/core/Tensor.h> as suggested by @Jerry-Shen0527. For me it works when added either to CUDAUtils.h or to slang-torch-prelude.h.

saipraveenb25 commented 1 day ago

This seems to have been triggered by a reordering of the #include directives in slang-torch-prelude.h when we turned on clang-format for the main slang codebase.
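As a rough illustration of why include order matters here, the sketch below scans a source string and checks that one header is included before another. The function names are hypothetical. The specific pair checked (ATen/core/Tensor.h before ATen/cuda/CUDAUtils.h) is taken from the workaround reported in this thread, not from the actual prelude, whose fix may order things differently.

```python
import re

# Matches both #include <...> and #include "..." at the start of a line.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]', re.MULTILINE)


def include_order(source: str) -> list[str]:
    """Return the included paths in the order they appear in `source`."""
    return INCLUDE_RE.findall(source)


def comes_before(source: str, first: str, second: str) -> bool:
    """True only if both headers are included and `first` precedes `second`."""
    order = include_order(source)
    if first not in order or second not in order:
        return False
    return order.index(first) < order.index(second)


# Illustrative snippets only; the real prelude contents are not shown in this thread.
broken = "#include <ATen/cuda/CUDAContext.h>\n#include <ATen/cuda/CUDAUtils.h>\n"
fixed = "#include <ATen/core/Tensor.h>\n" + broken

TENSOR = "ATen/core/Tensor.h"
CUDA_UTILS = "ATen/cuda/CUDAUtils.h"
print(comes_before(broken, TENSOR, CUDA_UTILS))  # False
print(comes_before(fixed, TENSOR, CUDA_UTILS))   # True
```

A check like this could run against the generated .cpp in the build cache to catch a formatter silently reordering includes again.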

We'll get a fix merged and a new PR created as soon as possible. For now, please use pip install slangtorch==1.3.0 to get the previous working release, or use the workaround suggested by @Jerry-Shen0527.

saipraveenb25 commented 21 hours ago

I've created a new release v1.3.2.

Can you check if updating slangtorch fixes the issue?
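For anyone unsure which release they have installed, here is a small sketch that reads the installed slangtorch version and compares it against the two releases named in this thread (1.3.0 as the previous working release, 1.3.2 as the fix). The helper names are hypothetical. Treating every release from 1.3.2 onward as fine is an assumption, and versions older than 1.3.0 are simply reported as unconfirmed.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple[int, int, int]:
    """Extract the leading MAJOR.MINOR.PATCH numbers from a version string."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)))


def reported_working(v: str) -> bool:
    """True for 1.3.0 (named as working) and for 1.3.2 or later (assumed to keep the fix)."""
    r = release_tuple(v)
    return r == (1, 3, 0) or r >= (1, 3, 2)


try:
    installed = version("slangtorch")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("slangtorch is not installed in this environment")
else:
    status = "reported working" if reported_working(installed) else "not confirmed in this thread"
    print(f"slangtorch {installed}: {status}")
```

If the result is "not confirmed in this thread", upgrading with pip install --upgrade slangtorch or pinning slangtorch==1.3.0 are the two options mentioned above.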

oliver-batchelor commented 21 hours ago

> I've created a new release v1.3.2.
>
> Can you check if updating slangtorch fixes the issue?

Works for me.

The other compiler error I had was related to installing in an environment (I've also found something workable for now): https://github.com/shader-slang/slang-torch/issues/10

min-hieu-netropy commented 18 hours ago

It worked perfectly! I will close this issue for now.