ChatGLM-MNN CMake options:
```
CUDA support enabled for cli demo
CUDA support enabled for web demo
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yuyi2439/ChatGLM-MNN/build
-- Cache values
// Build for android whith mini memory mode.
BUILD_FOR_ANDROID:BOOL=OFF
// Build whith mini memory mode.
BUILD_MINI_MEM_MODE:BOOL=on
// No help, variable specified on the command line.
CMAKE_BUILD_TYPE:STRING=Release
// Install path prefix, prepended onto install directories.
CMAKE_INSTALL_PREFIX:PATH=/usr/local
// Host side compiler used by NVCC
CUDA_HOST_COMPILER:FILEPATH=/usr/bin/gcc
// Path to a file.
CUDA_SDK_ROOT_DIR:PATH=CUDA_SDK_ROOT_DIR-NOTFOUND
// Toolkit location.
CUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda
// Use the static version of the CUDA runtime library if available
CUDA_USE_STATIC_CUDA_RUNTIME:BOOL=ON
// Path to a library.
CUDA_rt_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/librt.a
// Enable CUDA support
WITH_CUDA:BOOL=on
```
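For context, a configure invocation along these lines should reproduce the cache values above. This is a reconstruction rather than the exact command (the lowercase `on` values and the "variable specified on the command line" note suggest these were passed on the command line):

```bash
# Hypothetical reconstruction of the ChatGLM-MNN configure step;
# paths and option names are taken from the cache dump above.
cd ~/ChatGLM-MNN
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DWITH_CUDA=on \
         -DBUILD_MINI_MEM_MODE=on
```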
MNN CMake options:
```
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Linux
-- Processor: x86_64
-- Version: 2.5.1
-- Metal: OFF
-- OpenCL: OFF
-- OpenGL: OFF
-- Vulkan: OFF
-- ARM82: OFF
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: ON
-- OpenMP: OFF
-- BF16: OFF
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /home/yuyi2439/MNN/build
-- CUDA PROFILE: OFF
-- WIN_USE_ASM:
-- x86_64: Open SSE
-- MNN_AVX512:OFF
-- Autodetected CUDA architecture(s): 7.5
-- Enabling CUDA support (version: 12.1, archs: sm_75)
-- message -D_FORCE_INLINES -Wno-deprecated-gpu-targets -w -O3 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 !!!!!!!!!!! /usr/local/cuda/include
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yuyi2439/MNN/build
-- Cache values
// Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel ...
CMAKE_BUILD_TYPE:STRING=Release
// Install path prefix, prepended onto install directories.
CMAKE_INSTALL_PREFIX:PATH=/usr/local
// Host side compiler used by NVCC
CUDA_HOST_COMPILER:FILEPATH=/usr/bin/cc
// Path to a file.
CUDA_SDK_ROOT_DIR:PATH=CUDA_SDK_ROOT_DIR-NOTFOUND
// Toolkit location.
CUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda
// Use the static version of the CUDA runtime library if available
CUDA_USE_STATIC_CUDA_RUNTIME:BOOL=ON
// Path to a library.
CUDA_rt_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/librt.a
// Build MNN.framework instead of traditional .a/.dylib
MNN_AAPL_FMWK:BOOL=OFF
// Enable ARM82
MNN_ARM82:BOOL=OFF
// Enable AVX512
MNN_AVX512:BOOL=OFF
// Enable AVX512 VNNI
MNN_AVX512_VNNI:BOOL=ON
// Build benchmark or not
MNN_BUILD_BENCHMARK:BOOL=OFF
// Build with codegen
MNN_BUILD_CODEGEN:BOOL=OFF
// Build Converter
MNN_BUILD_CONVERTER:BOOL=OFF
// Build demo/exec or not
MNN_BUILD_DEMO:BOOL=OFF
// Build from command
MNN_BUILD_FOR_ANDROID_COMMAND:BOOL=OFF
// Build -mfloat-abi=hard or not
MNN_BUILD_HARD:BOOL=OFF
// Build MNN-MINI that just supports fixed shape models.
MNN_BUILD_MINI:BOOL=OFF
// Build OpenCV api in MNN.
MNN_BUILD_OPENCV:BOOL=OFF
// Build with protobuffer in MNN
MNN_BUILD_PROTOBUFFER:BOOL=ON
// Build Quantized Tools or not
MNN_BUILD_QUANTOOLS:BOOL=OFF
// MNN build shared or static lib
MNN_BUILD_SHARED_LIBS:BOOL=ON
// Build tests or not
MNN_BUILD_TEST:BOOL=OFF
// Build tools/cpp or not
MNN_BUILD_TOOLS:BOOL=ON
// Build MNN's training framework
MNN_BUILD_TRAIN:BOOL=OFF
// Enable CoreML
MNN_COREML:BOOL=OFF
// Enable CUDA
MNN_CUDA:BOOL=ON
// Enable CUDA profile
MNN_CUDA_PROFILE:BOOL=OFF
// Enable MNN CUDA Quant File
MNN_CUDA_QUANT:BOOL=OFF
// MNN Debug Memory Access
MNN_DEBUG_MEMORY:BOOL=OFF
// Enable Tensor Size
MNN_DEBUG_TENSOR_SIZE:BOOL=OFF
// Build with coverage enable
MNN_ENABLE_COVERAGE:BOOL=OFF
// Build Evaluation Tools or not
MNN_EVALUATION:BOOL=OFF
// Support profile Expr's op cost
MNN_EXPR_ENABLE_PROFILER:BOOL=OFF
// Force compute Expr's shape directly cost
MNN_EXPR_SHAPE_EAGER:BOOL=OFF
// Disable Multi Thread
MNN_FORBID_MULTI_THREAD:BOOL=OFF
// Enable MNN Gpu Debug
MNN_GPU_TRACE:BOOL=OFF
// Build with MNN internal features, such as model authentication, metrics logging
MNN_INTERNAL:BOOL=OFF
// Build MNN Jni for java to use
MNN_JNI:BOOL=OFF
// Enable Metal
MNN_METAL:BOOL=OFF
// Enable NNAPI
MNN_NNAPI:BOOL=OFF
// Enable oneDNN
MNN_ONEDNN:BOOL=OFF
// Enable OpenCL
MNN_OPENCL:BOOL=OFF
// Enable OpenGL
MNN_OPENGL:BOOL=OFF
// Use OpenMP's thread pool implementation. Does not work on iOS or Mac OS
MNN_OPENMP:BOOL=OFF
// Link the static version of third party libraries where possible to improve the portability of built executables
MNN_PORTABLE_BUILD:BOOL=OFF
// Build MNN Backends and expression separately. Only works with MNN_BUILD_SHARED_LIBS=ON
MNN_SEP_BUILD:BOOL=ON
// Use fp16 instead of bf16 for x86op
MNN_SSE_USE_FP16_INSTEAD:BOOL=OFF
// Enable MNN's bf16 op
MNN_SUPPORT_BF16:BOOL=OFF
// Enable MNN's tflite quantized op
MNN_SUPPORT_DEPRECATED_OP:BOOL=ON
// Enable TensorRT
MNN_TENSORRT:BOOL=OFF
// Enable MNN use c++11
MNN_USE_CPP11:BOOL=ON
// Use Logcat intead of print for info
MNN_USE_LOGCAT:BOOL=ON
// Use SSE optimization for x86 if possiable
MNN_USE_SSE:BOOL=ON
// For opencl and vulkan, use system lib or use dlopen
MNN_USE_SYSTEM_LIB:BOOL=OFF
// Use MNN's own thread pool implementation
MNN_USE_THREAD_POOL:BOOL=ON
// Enable Vulkan
MNN_VULKAN:BOOL=OFF
// MNN use /MT on Windows dll
MNN_WIN_RUNTIME_MT:BOOL=OFF
// Build with plugin op support.
MNN_WITH_PLUGIN:BOOL=OFF
// Native Include Path
NATIVE_INCLUDE_OUTPUT:BOOL=OFF
// Native Library Path
NATIVE_LIBRARY_OUTPUT:BOOL=OFF
```
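Likewise, the MNN library was configured with CUDA enabled. A minimal sketch consistent with the cache values above (assuming `MNN_CUDA` is the only option changed from its default):

```bash
# Hypothetical reconstruction of the MNN configure and build step;
# everything except MNN_CUDA matches the defaults shown in the cache dump.
cd ~/MNN
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DMNN_CUDA=ON
make -j$(nproc)
```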
nvidia-smi output while the program is running:
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50 Driver Version: 531.79 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce MX450 On | 00000000:01:00.0 Off | N/A |
| N/A 56C P8 N/A / N/A| 0MiB / 2048MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 22 G /Xwayland N/A |
| 0 N/A N/A 4298 C /web_demo N/A |
+---------------------------------------------------------------------------------------+
```
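Note that the table shows 0 MiB of memory in use and 0% utilization even though web_demo is listed as a compute process. One way to keep watching this during generation (a suggestion, not something from the original run; per-process memory may still read N/A under WSL2):

```bash
# Poll the compute processes once per second while the demo generates;
# under WSL2 the per-process used_memory column can legitimately show [N/A].
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv -l 1
```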
Environment: WSL2 (Ubuntu)
The current generation speed is roughly 30 s per word, which is far too slow.
I tried PyTorch in the same environment, and CUDA works there.
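For reference, the kind of sanity check I mean (a sketch; not necessarily the exact snippet I ran):

```bash
# Quick check that PyTorch can see and use the GPU under WSL2;
# on this machine it prints True and "NVIDIA GeForce MX450".
python3 -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
```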