Hi,
I don't have access to a Windows machine so I cannot reproduce your problem. However, one suggestion would be to try building without local_product_cuda,
just to make sure that the error is specific to that module. If you get another linking error somewhere else, chances are that it is a problem with your setup. Otherwise, it might be something that slipped through our tests because it only presents itself on Windows.
Cheers, Angelos
Thanks for the quick response, @angeloskath! I can't believe I didn't think to do this myself.
If I comment out lines 167-173 of setup.py:
CUDAExtension(
    "fast_transformers.local_product.local_product_cuda",
    sources=[
        "fast_transformers/local_product/local_product_cuda.cu"
    ],
    extra_compile_args=["-arch=compute_50"]
)
and run the install from the local files downloaded from this repository (using the command python setup.py install from the fast-transformers-master folder), then the build completes with warnings but no errors. If I uncomment the lines, I again get the LNK2001 and subsequent LNK1120 errors. So it does seem like something specific to local_product_cuda. I'll keep digging around to see if I can figure it out.
What version are your CUDA drivers? It's referencing
c10.lib torch.lib torch_cpu.lib torch_python.lib cudart.lib c10_cuda.lib
but I know that on Windows you have to install a torch build specifically targeted at your CUDA driver version.
Is it possible that the lib is pulling in the wrong CUDA version?
I think I've narrowed it down to an issue with the long version of the packed tensor accessor in torch. I apologize for starting the issue here when it wasn't really an issue with your code!
I changed the long accessor to an int, and everything compiled and linked successfully. I'm guessing there's a chance this hurts me later on when I try to run, but for now I'm gonna close the issue.
For anyone else who has this issue, here's how I figured it out. I used the MSVC command prompt and
DUMPBIN /SYMBOLS path/to/fast-transformers/build/temp.win/release/fast_transformers/path-to-object.obj > outputfile.txt
to get a list of the symbols used in each of the compiled objects in this project (so, e.g., path/to/fast-transformers/build/temp.win/release/fast_transformers/local_product/local_product_cuda.obj, then path/to/fast-transformers/build/temp.win/release/fast_transformers/aggregate/aggregate_cuda.obj, and so on).
I then searched each output file for the undefined symbol (here, data_ptr seemed to be the unique part of the error) and figured out that all the other files in fast-transformers use either floats or ints, and that the long type was the problem.
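To make that connection explicit, here is a minimal sketch (my own illustration, not code from the fast-transformers sources; touch_lengths is a hypothetical name) of why a long accessor drags in the unresolved symbol: constructing the packed accessor calls at::Tensor::data_ptr<long>(), which appears to be exactly the instantiation named in the LNK2001 message (the @J in the mangled name is MSVC's encoding for long).

```cuda
// Minimal sketch, not from the fast-transformers sources; touch_lengths is a
// hypothetical function that only exists to show where the symbol comes from.
#include <torch/extension.h>

void touch_lengths(const torch::Tensor &lengths) {
    // Building a packed accessor with a `long` element type calls
    // at::Tensor::data_ptr<long>() under the hood -- the same symbol the
    // linker reports as unresolved (??$data_ptr@J@Tensor@at@@...).
    auto acc = lengths.packed_accessor32<long, 1, torch::RestrictPtrTraits>();
    (void)acc;  // silence the unused-variable warning
}
```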
I've run into this problem as well and this issue helped. I'm also getting the same error compiling local_product_cuda.cu. Removing the module in setup.py allowed it to compile, and re-adding it brought the linker error back.
Changing
typedef torch::PackedTensorAccessor32<long, 1, torch::RestrictPtrTraits> long_accessor;
to
typedef torch::PackedTensorAccessor32<int, 1, torch::RestrictPtrTraits> long_accessor;
and the subsequent references did resolve the linker error and allow the package to build, but that seems really bizarre.
Environment (no Anaconda, no VS command prompt): Windows 10, Visual Studio 2019, Python 3.8.1, PyTorch 1.7.1+cu110, CUDA 11.1
I encountered the same issue and solved it with the same fix, thank you! No idea why the build failed; I'll report back if I run into any bugs with this.
CUDA 11.4, Python 3.7.7, PyTorch 1.9.0+cu111
I have been struggling to install fast-transformers on Windows 10, and the solution from @kjerk worked for me. Here is a quick guide on how to apply the fix and then install fast-transformers with pip (a sketch of how the pieces fit together follows after these steps).
In fast_transformers/local_product/local_product_cuda.cu, change
typedef torch::PackedTensorAccessor32<long, 1, torch::RestrictPtrTraits> long_accessor;
to
typedef torch::PackedTensorAccessor32<int, 1, torch::RestrictPtrTraits> long_accessor;
and
key_lengths.packed_accessor32<long, 1, torch::RestrictPtrTraits>(),
to
key_lengths.packed_accessor32<int, 1, torch::RestrictPtrTraits>(),
Then install from the local folder:
pip install C:\Users\User\Downloads\fast-transformers-0.4.0
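For context, here is a minimal sketch of how the typedef and the packed_accessor32 call site have to agree after the change. This is not the actual fast-transformers kernel; the kernel name, the clamping logic, and the tensor shapes are illustrative assumptions, and reading key_lengths through an <int, ...> accessor assumes the tensor actually holds 32-bit integers at that point.

```cuda
// Illustrative sketch only -- not the fast-transformers kernel. It shows the
// workaround's moving parts: the typedef, the kernel parameter that uses it,
// and the host-side packed_accessor32<int, ...>() call that must match.
#include <torch/extension.h>

// Workaround typedef: int instead of long, as in the steps above.
typedef torch::PackedTensorAccessor32<int, 1, torch::RestrictPtrTraits> long_accessor;

__global__ void zero_past_length_kernel(
    torch::PackedTensorAccessor32<float, 3, torch::RestrictPtrTraits> output,
    long_accessor lengths  // "subsequent reference" #1: the kernel parameter
) {
    int n = blockIdx.x;
    int l = blockIdx.y * blockDim.x + threadIdx.x;
    if (l >= output.size(1)) {
        return;
    }
    if (l >= lengths[n]) {  // lengths are now read as 32-bit ints
        for (int e = 0; e < output.size(2); e++) {
            output[n][l][e] = 0.0f;
        }
    }
}

void zero_past_length(torch::Tensor output, torch::Tensor key_lengths) {
    const int threads = 256;
    dim3 blocks(output.size(0), (output.size(1) + threads - 1) / threads);
    zero_past_length_kernel<<<blocks, threads>>>(
        output.packed_accessor32<float, 3, torch::RestrictPtrTraits>(),
        // "subsequent reference" #2: the call site must also use <int, ...>,
        // which requires key_lengths to be an int32 tensor at runtime.
        key_lengths.packed_accessor32<int, 1, torch::RestrictPtrTraits>()
    );
}
```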
I've been trying to install on Windows using pip and it looks like I'm almost there. I get through compiling everything and then hit an error when trying to complete linking of local_product_cuda.
System: Windows 10, CUDA 10.2.89, PyTorch 1.6, Python 3.8
traceback: local_product_cuda.cu C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\user\Anaconda3\envs\testenv\lib\site-packages\torch\lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib/x64" /LIBPATH:C:\Users\user\Anaconda3\envs\testenv\libs /LIBPATH:C:\Users\user\Anaconda3\envs\testenv\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib cudart.lib c10_cuda.lib torch_cuda.lib /EXPORT:PyInit_local_product_cuda C:\Users\user\AppData\Local\Temp\pip-install-x6t631um\pytorch-fast-transformers\build\temp.win-amd64-3.8\Release\fast_transformers/local_product/local_product_cuda.obj /OUT:build\lib.win-amd64-3.8\fast_transformers\local_product\local_product_cuda.cp38-win_amd64.pyd /IMPLIB:C:\Users\user\AppData\Local\Temp\pip-install-x6t631um\pytorch-fast-transformers\build\temp.win-amd64-3.8\Release\fast_transformers/local_product\local_product_cuda.cp38-win_amd64.lib
Creating library C:\Users\user\AppData\Local\Temp\pip-install-x6t631um\pytorch-fast-transformers\build\temp.win-amd64-3.8\Release\fast_transformers/local_product\local_product_cuda.cp38-win_amd64.lib and object C:\Users\user\AppData\Local\Temp\pip-install-x6t631um\pytorch-fast-transformers\build\temp.win-amd64-3.8\Release\fast_transformers/local_product\local_product_cuda.cp38-win_amd64.exp
local_product_cuda.obj : error LNK2001: unresolved external symbol "public: long * __cdecl at::Tensor::data_ptr<long>(void)const " (??$data_ptr@J@Tensor@at@@QEBAPEAJXZ)
build\lib.win-amd64-3.8\fast_transformers\local_product\local_product_cuda.cp38-win_amd64.pyd : fatal error LNK1120: 1 unresolved externals
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe' failed with exit status 1120
I've been investigating for quite a few hours now, but I can't figure out why I'm getting a linking error. From searching the error, it seems to be some issue with the .lib or the function definition not being accessible to the .obj, but both the .lib and the .obj are being created, and I'm assuming all definitions are wrapped into the pip bundle if others are able to install. I wanted to post here in case it is an issue with the dependencies somewhere or something getting messed up on Windows. Is anyone else having this problem, or does anyone have an idea where to start in solving it?
Thanks!