withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
https://node-llama-cpp.withcat.ai
MIT License

Error building with Cuda #74

Closed TillWege closed 10 months ago

TillWege commented 10 months ago

Issue description

Can't build the binaries with CUDA support enabled.

Expected Behavior

Running npx node-llama-cpp download --cuda should compile the release with CUDA support.

Actual Behavior

When running version 2.5.0 of the package, the build process fails even though the NVIDIA CUDA Toolkit is found:

D:\Projekte\ai-test\node_modules\node-llama-cpp\llama\build\llama.cpp>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"  --use-local-env -ccbin "D:\Software\Microsoft Visual Studio\2022\VC\Tools\MSVC\14.37.32822\bin\HostX64\x64" -x cu
  -I"D:\Projekte\ai-test\node_modules\node-addon-api" -I"C:\Users\tillw\.cmake-js\node-x64\v18.16.0\include\node" -I"D:\Projekte\ai-test\node_modules\node-llama-cpp\llama\llama.cpp\." -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include" -I"C:\Pr
  ogram Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include"     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[comput
  e_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] /EHsc -Xcompiler="/EHsc -Ob2"   -D_WINDOWS -DNDEBUG -DNAPI_VERSION=7 -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEE
  R_MAX_BATCH_SIZE=128 -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DNAPI_VERSION=7 -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DG
  GML_CUDA_PEER_MAX_BATCH_SIZE=128 -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Xcompiler "/EHsc /W3 /nologo /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\ggml.pdb" -o ggml.dir\Release\ggml-cuda.obj "D:\Projekte\ai-test\node_
  modules\node-llama-cpp\llama\llama.cpp\ggml-cuda.cu"
  nvcc fatal   : A single input file is required for a non-link phase when an outputfile is specified

When trying to build with CUDA support on the current version, CMake seems unable to find the CUDA toolkit:

D:\Projekte\ai-test\node_modules\node-llama-cpp\llama\build\CMakeFiles\3.26.4\VCTargetsPath.vcxproj" (default target) (1) -
    (PrepareForBuild target) ->
      D:\Software\Microsoft Visual Studio\2022\MSBuild\Microsoft\VC\v170\Microsoft.CppBuild.targets(456,5): error MSB8020: The build tools for C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2 (Platform Toolset = 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2') cannot be found. To build using the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2 build tools, please install C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2 build tools.  Alternatively, you may upgrade to the current Visual Studio tools by selecting the Project menu or right-click the solution, and then selecting "Retarget solution". [D:\Projekte\ai-test\node_modules\node-llama-cpp\llama\build\CMakeFiles\3.26.4\VCTargetsPath.vcxproj]

Building llama.cpp itself with CUDA support works without any problem.

Steps to reproduce

My Environment

Dependency        Version
Operating System  Windows 11
CPU               AMD Ryzen 7 7800X3D
Node.js           v18.16.0
node-llama-cpp    b1378

Additional Context

No response

Relevant Features Used

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

giladgd commented 10 months ago

This issue was already solved as part of #66. Update to the latest version of node-llama-cpp and try again. If the issue persists, let me know and I'll reopen this issue.
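For anyone landing here later, the suggested fix amounts to updating the package and re-running the CUDA build. A minimal sketch, using the same commands shown in this issue (requires network access and an installed CUDA Toolkit; exact version numbers will vary):

```shell
# Update node-llama-cpp to the latest release in the project
npm install node-llama-cpp@latest

# Re-run the download/build step with CUDA support enabled,
# as in the original report
npx node-llama-cpp download --cuda
```

If the CUDA build still fails after updating, including the full build log in a new issue makes it much easier to diagnose.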