wangzhaode / mnn-llm

LLM deployment project based on MNN.
Apache License 2.0

GPU not used correctly at runtime #74

Closed. yuyi2439 closed this issue 9 months ago.

yuyi2439 commented 1 year ago

Environment: WSL2-Ubuntu

ChatGLM-MNN cmake options:

```
CUDA support enabled for cli demo
CUDA support enabled for web demo
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yuyi2439/ChatGLM-MNN/build
-- Cache values
// Build for android whith mini memory mode.
BUILD_FOR_ANDROID:BOOL=OFF
// Build whith mini memory mode.
BUILD_MINI_MEM_MODE:BOOL=on
// No help, variable specified on the command line.
CMAKE_BUILD_TYPE:STRING=Release
// Install path prefix, prepended onto install directories.
CMAKE_INSTALL_PREFIX:PATH=/usr/local
// Host side compiler used by NVCC
CUDA_HOST_COMPILER:FILEPATH=/usr/bin/gcc
// Path to a file.
CUDA_SDK_ROOT_DIR:PATH=CUDA_SDK_ROOT_DIR-NOTFOUND
// Toolkit location.
CUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda
// Use the static version of the CUDA runtime library if available
CUDA_USE_STATIC_CUDA_RUNTIME:BOOL=ON
// Path to a library.
CUDA_rt_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/librt.a
// Enable CUDA support
WITH_CUDA:BOOL=on
```
MNN cmake options:

```
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Linux
-- Processor: x86_64
-- Version: 2.5.1
-- Metal: OFF
-- OpenCL: OFF
-- OpenGL: OFF
-- Vulkan: OFF
-- ARM82: OFF
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: ON
-- OpenMP: OFF
-- BF16: OFF
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /home/yuyi2439/MNN/build
-- CUDA PROFILE: OFF
-- WIN_USE_ASM:
-- x86_64: Open SSE
-- MNN_AVX512:OFF
-- Autodetected CUDA architecture(s): 7.5
-- Enabling CUDA support (version: 12.1, archs: sm_75)
-- message -D_FORCE_INLINES -Wno-deprecated-gpu-targets -w -O3 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 !!!!!!!!!!! /usr/local/cuda/include
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yuyi2439/MNN/build
-- Cache values
// Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel ...
CMAKE_BUILD_TYPE:STRING=Release
// Install path prefix, prepended onto install directories.
CMAKE_INSTALL_PREFIX:PATH=/usr/local
// Host side compiler used by NVCC
CUDA_HOST_COMPILER:FILEPATH=/usr/bin/cc
// Path to a file.
CUDA_SDK_ROOT_DIR:PATH=CUDA_SDK_ROOT_DIR-NOTFOUND
// Toolkit location.
CUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda
// Use the static version of the CUDA runtime library if available
CUDA_USE_STATIC_CUDA_RUNTIME:BOOL=ON
// Path to a library.
CUDA_rt_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/librt.a
// Build MNN.framework instead of traditional .a/.dylib
MNN_AAPL_FMWK:BOOL=OFF
// Enable ARM82
MNN_ARM82:BOOL=OFF
// Enable AVX512
MNN_AVX512:BOOL=OFF
// Enable AVX512 VNNI
MNN_AVX512_VNNI:BOOL=ON
// Build benchmark or not
MNN_BUILD_BENCHMARK:BOOL=OFF
// Build with codegen
MNN_BUILD_CODEGEN:BOOL=OFF
// Build Converter
MNN_BUILD_CONVERTER:BOOL=OFF
// Build demo/exec or not
MNN_BUILD_DEMO:BOOL=OFF
// Build from command
MNN_BUILD_FOR_ANDROID_COMMAND:BOOL=OFF
// Build -mfloat-abi=hard or not
MNN_BUILD_HARD:BOOL=OFF
// Build MNN-MINI that just supports fixed shape models.
MNN_BUILD_MINI:BOOL=OFF
// Build OpenCV api in MNN.
MNN_BUILD_OPENCV:BOOL=OFF
// Build with protobuffer in MNN
MNN_BUILD_PROTOBUFFER:BOOL=ON
// Build Quantized Tools or not
MNN_BUILD_QUANTOOLS:BOOL=OFF
// MNN build shared or static lib
MNN_BUILD_SHARED_LIBS:BOOL=ON
// Build tests or not
MNN_BUILD_TEST:BOOL=OFF
// Build tools/cpp or not
MNN_BUILD_TOOLS:BOOL=ON
// Build MNN's training framework
MNN_BUILD_TRAIN:BOOL=OFF
// Enable CoreML
MNN_COREML:BOOL=OFF
// Enable CUDA
MNN_CUDA:BOOL=ON
// Enable CUDA profile
MNN_CUDA_PROFILE:BOOL=OFF
// Enable MNN CUDA Quant File
MNN_CUDA_QUANT:BOOL=OFF
// MNN Debug Memory Access
MNN_DEBUG_MEMORY:BOOL=OFF
// Enable Tensor Size
MNN_DEBUG_TENSOR_SIZE:BOOL=OFF
// Build with coverage enable
MNN_ENABLE_COVERAGE:BOOL=OFF
// Build Evaluation Tools or not
MNN_EVALUATION:BOOL=OFF
// Support profile Expr's op cost
MNN_EXPR_ENABLE_PROFILER:BOOL=OFF
// Force compute Expr's shape directly cost
MNN_EXPR_SHAPE_EAGER:BOOL=OFF
// Disable Multi Thread
MNN_FORBID_MULTI_THREAD:BOOL=OFF
// Enable MNN Gpu Debug
MNN_GPU_TRACE:BOOL=OFF
// Build with MNN internal features, such as model authentication, metrics logging
MNN_INTERNAL:BOOL=OFF
// Build MNN Jni for java to use
MNN_JNI:BOOL=OFF
// Enable Metal
MNN_METAL:BOOL=OFF
// Enable NNAPI
MNN_NNAPI:BOOL=OFF
// Enable oneDNN
MNN_ONEDNN:BOOL=OFF
// Enable OpenCL
MNN_OPENCL:BOOL=OFF
// Enable OpenGL
MNN_OPENGL:BOOL=OFF
// Use OpenMP's thread pool implementation. Does not work on iOS or Mac OS
MNN_OPENMP:BOOL=OFF
// Link the static version of third party libraries where possible to improve the portability of built executables
MNN_PORTABLE_BUILD:BOOL=OFF
// Build MNN Backends and expression separately. Only works with MNN_BUILD_SHARED_LIBS=ON
MNN_SEP_BUILD:BOOL=ON
// Use fp16 instead of bf16 for x86op
MNN_SSE_USE_FP16_INSTEAD:BOOL=OFF
// Enable MNN's bf16 op
MNN_SUPPORT_BF16:BOOL=OFF
// Enable MNN's tflite quantized op
MNN_SUPPORT_DEPRECATED_OP:BOOL=ON
// Enable TensorRT
MNN_TENSORRT:BOOL=OFF
// Enable MNN use c++11
MNN_USE_CPP11:BOOL=ON
// Use Logcat intead of print for info
MNN_USE_LOGCAT:BOOL=ON
// Use SSE optimization for x86 if possiable
MNN_USE_SSE:BOOL=ON
// For opencl and vulkan, use system lib or use dlopen
MNN_USE_SYSTEM_LIB:BOOL=OFF
// Use MNN's own thread pool implementation
MNN_USE_THREAD_POOL:BOOL=ON
// Enable Vulkan
MNN_VULKAN:BOOL=OFF
// MNN use /MT on Windows dll
MNN_WIN_RUNTIME_MT:BOOL=OFF
// Build with plugin op support.
MNN_WITH_PLUGIN:BOOL=OFF
// Native Include Path
NATIVE_INCLUDE_OUTPUT:BOOL=OFF
// Native Library Path
NATIVE_LIBRARY_OUTPUT:BOOL=OFF
```
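Both builds report `CUDA: ON`, so the CUDA backend is compiled in; whether it is actually used still depends on the forward type requested when the session is created, and as I understand it MNN silently falls back to CPU if the requested backend cannot be loaded. For reference, a minimal sketch of requesting the CUDA backend through MNN's public session API (the model path is a placeholder; whether mnn-llm's demos take this path is exactly what this issue is asking):

```cpp
// Minimal sketch: ask MNN for the CUDA backend when creating a session.
// "model.mnn" is a placeholder path; error handling is omitted for brevity.
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CUDA;  // falls back to CPU if the CUDA
                                     // backend is missing or fails to init
    config.numThread = 4;

    MNN::Session* session = net->createSession(config);
    // ... net->runSession(session); ...
    net->releaseSession(session);
    return 0;
}
```

If the fallback is taken, inference runs entirely on the CPU even though the binary links the CUDA backend, which would match the 0% GPU utilization seen below.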
nvidia-smi output while the program was running:

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50                 Driver Version: 531.79       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce MX450            On | 00000000:01:00.0 Off |                  N/A |
| N/A   56C    P8               N/A / N/A |     0MiB /  2048MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22      G   /Xwayland                                   N/A    |
|    0   N/A  N/A      4298      C   /web_demo                                   N/A    |
+---------------------------------------------------------------------------------------+
```

The current speed is about 30 s per word, which is far too slow.

I tried with PyTorch, and CUDA works fine there.

yuyi2439 commented 1 year ago

Just remembered that I never posted the log, and I don't have my computer at hand right now. From memory: the GPU supports fp16, its compute capability is 7.5, and it has 2 GB of VRAM, of which about 1.5 GB is usable.
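For reference, those numbers (compute capability 7.5, 2 GB of VRAM, roughly 1.5 GB free) can be confirmed with a small standalone CUDA runtime query. This sketch is not part of mnn-llm and must be built with nvcc:

```cpp
// Standalone check of compute capability and free/total VRAM (build with nvcc).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0, the MX450 here

    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("VRAM: %.2f GiB total, %.2f GiB free\n",
           totalBytes / double(1 << 30), freeBytes / double(1 << 30));
    return 0;
}
```

On this setup it should print compute capability 7.5 and about 2 GiB total, matching the recollection above; compute capability 7.5 does include native fp16 arithmetic.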

github-actions[bot] commented 9 months ago

Marking as stale. No activity in 30 days.