microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table

OpenAI compatible chat completions endpoint #28

Open maxim-saplin opened 2 months ago

maxim-saplin commented 2 months ago

Would be great to have an easy way to run an OpenAI-compatible endpoint on localhost and interface with the model via HTTP API - e.g. to use it with any of the chat bot UI options.
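For illustration, this is the kind of request such an endpoint would accept (a sketch; the host, port, and model name are placeholders):

# Hypothetical request; host, port, and model name are placeholders.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "placeholder", "messages": [{"role": "user", "content": "Hello!"}]}'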

kaleid-liner commented 2 months ago

You can use the server provided by llama.cpp: https://github.com/kaleid-liner/llama.cpp/tree/master/examples/server
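Once built, launching it should look roughly like this (a sketch: the binary location and model path are assumptions; -m, --host, and --port are standard llama.cpp server flags):

# Sketch only: binary location and model path are assumptions.
cd 3rdparty/llama.cpp/build
./bin/server -m /path/to/model.gguf --host 127.0.0.1 --port 8080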

maxim-saplin commented 2 months ago

Spent 30 minutes yesterday and dropped my attempts after failing to build it... Would be great to have some instructions.

maxim-saplin commented 2 months ago

Btw, recent llama.cpp has a different way of interacting with the server; it's a first-class citizen now (not part of the examples) - llama-server:

Important [2024 Jun 12] Binaries have been renamed w/ a llama- prefix. main is now llama-cli, server is llama-server, etc (https://github.com/ggerganov/llama.cpp/pull/7809)

kaleid-liner commented 2 months ago

After successfully running run_pipeline.py, you can just run make server or cmake --build . --target server in 3rdparty/llama.cpp/build.
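I.e., something like the following (assuming run_pipeline.py has already configured the build directory):

# Assumes run_pipeline.py has already configured this build directory.
cd 3rdparty/llama.cpp/build
cmake --build . --target server   # or: make server
# The binary typically ends up in ./bin/server.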

maxim-saplin commented 2 months ago

I get some dependency error

user@PC:/mnt/f/src/T-MAC/3rdparty/llama.cpp/build$ cmake --build . --target server
-- TMAC found
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Error at /mnt/f/src/T-MAC/t-mac-env38/lib/python3.8/site-packages/cmake/data/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
  OPENSSL_INCLUDE_DIR)
Call Stack (most recent call first):
  /mnt/f/src/T-MAC/t-mac-env38/lib/python3.8/site-packages/cmake/data/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
  /mnt/f/src/T-MAC/t-mac-env38/lib/python3.8/site-packages/cmake/data/share/cmake-3.30/Modules/FindOpenSSL.cmake:689 (find_package_handle_standard_args)
  examples/server/CMakeLists.txt:33 (find_package)

-- Configuring incomplete, errors occurred!
gmake: *** [Makefile:1728: cmake_check_build_system] Error 1

kaleid-liner commented 2 months ago

Btw, recent llama.cpp has a different way of interacting with the server; it's a first-class citizen now (not part of the examples) - llama-server:

Important [2024 Jun 12] Binaries have been renamed w/ a llama- prefix. main is now llama-cli, server is llama-server, etc (https://github.com/ggerganov/llama.cpp/pull/7809)

Only after we have merged the latest llama.cpp, see #24
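Once that merge lands, the equivalent commands would presumably use the new names (a sketch based on upstream llama.cpp; the model path is a placeholder):

# After the rename, the target/binary is llama-server instead of server.
cmake --build . --target llama-server
./bin/llama-server -m /path/to/model.gguf --port 8080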

kaleid-liner commented 2 months ago

I get some dependency error

  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
  OPENSSL_INCLUDE_DIR)

I think this is not an issue of T-MAC, and the problem is described straightforwardly: you need to provide OpenSSL. You can try to solve it by searching in the llama.cpp repo.
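For example (an untested sketch for this fork): on Debian/Ubuntu, including WSL, installing the OpenSSL development package usually resolves this, or you can point CMake at an existing install via the variable named in the error:

# Assumption: Debian/Ubuntu (incl. WSL). Install OpenSSL headers/libraries:
sudo apt-get install libssl-dev
# Or point CMake at an existing OpenSSL install (path is a placeholder):
cmake .. -DOPENSSL_ROOT_DIR=/path/to/openssl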

maxim-saplin commented 2 months ago

That was like problem number 10 after I spent the whole evening building and running T-MAC :) Got exhausted at this point and had no courage left to proceed with tinkering... I am no C dev; I'd be happy to return to T-MAC when someone experienced fixes the server feature and provides clear instructions.

Thx for an interesting project, BTW

kaleid-liner commented 2 months ago

Sorry for the inconvenience. Currently we have no good solution to simplify the build process.