[BUG] Issues with Numpy v2

Disclaimer: Github Issues are only for code related bugs. If you do not understand how to startup or use TabbyAPI, please ask in the Discord Server

Describe the bug When starting TabbyAPI from the SillyTavern-Launcher with python start.py there appears to be a problem compiling a module the main problem it gives is something with numpy? which turns into more problems when trying to use the model see logs

This is the current script i am using in the launcher to install and run TabbyAPI... i wonder what is going wrong here

TabbyAPI is installed inside the conda env called tabbyapi with python 3.11 it uses the gpu_lib.txt with cu121 since the user is using a NVIDIA card.

To Reproduce Steps to reproduce the behavior:

Run the following script: https://github.com/SillyTavern/SillyTavern-Launcher/blob/c29c6499f88352df3f15ff4be4c5cab23df4b01e/Launcher.bat#L1866

Expected behavior TabbyAPI should install normally without problems.

Logs

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "E:\ai-projects\DEV-SillyTavern\SillyTavern-Launcher\text-completion\tabbyAPI\start.py", line 160, in <module>
    from main import entrypoint
  File "E:\ai-projects\DEV-SillyTavern\SillyTavern-Launcher\text-completion\tabbyAPI\main.py", line 11, in <module>
    from common import config, gen_logging, sampling, model
  File "E:\ai-projects\DEV-SillyTavern\SillyTavern-Launcher\text-completion\tabbyAPI\common\model.py", line 11, in <module>
    from backends.exllamav2.model import ExllamaV2Container
  File "E:\ai-projects\DEV-SillyTavern\SillyTavern-Launcher\text-completion\tabbyAPI\backends\exllamav2\model.py", line 11, in <module>
    from exllamav2 import (
  File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\model.py", line 31, in <module>
    from exllamav2.config import ExLlamaV2Config
  File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\config.py", line 5, in <module>
    from exllamav2.fasttensors import STFile
  File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\fasttensors.py", line 6, in <module>
    from exllamav2.ext import exllamav2_ext as ext_c
  File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\ext.py", line 286, in <module>
    none_tensor = torch.empty((1, 1), device = "meta")
C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\ext.py:286: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
  none_tensor = torch.empty((1, 1), device = "meta")

Loading the model was no problem but when trying to talk to a character then the following error appears:

INFO:     Disabling GPU split because one GPU is in use.
WARNING:  The given cache_size (32768) is less than 2 * max_seq_len and may be too small for requests using CFG.
WARNING:  Ignore this warning if you do not plan on using CFG.
INFO:     Attempting to load a prompt template if present.
INFO:     Using template "from_tokenizer_config" for chat completions.
INFO:     Loading model:
E:\ai-projects\DEV-SillyTavern\SillyTavern-Launcher\text-completion\tabbyAPI\models\Hathor-L3-8B-v.01-exl2
INFO:     Loading with a manual GPU split (or a one GPU setup)
Loading model modules ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 67/67 0:00:00
INFO:     Model successfully loaded.
INFO:     127.0.0.1:57832 - "POST /v1/token/encode HTTP/1.1" 200
INFO:     127.0.0.1:57833 - "POST /v1/completions HTTP/1.1" 200
ERROR:    Traceback (most recent call last):
ERROR:      File
"E:\ai-projects\DEV-SillyTavern\SillyTavern-Launcher\text-completion\tabbyAPI\endpoints\OAI\utils\completion.py", line
135, in stream_generate_completion
ERROR:        raise generation
ERROR:      File
"E:\ai-projects\DEV-SillyTavern\SillyTavern-Launcher\text-completion\tabbyAPI\endpoints\OAI\utils\completion.py", line
87, in _stream_collector
ERROR:        async for generation in new_generation:
ERROR:      File
"E:\ai-projects\DEV-SillyTavern\SillyTavern-Launcher\text-completion\tabbyAPI\backends\exllamav2\model.py", line 1100,
in generate_gen
ERROR:        job = ExLlamaV2DynamicJobAsync(
ERROR:              ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\generator\dynamic_async.py", line
75, in __init__
ERROR:        self.generator.enqueue(self)
ERROR:      File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\generator\dynamic_async.py", line
42, in enqueue
ERROR:        self.generator.enqueue(job.job)
ERROR:      File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\generator\dynamic.py", line 791,
in enqueue
ERROR:        job.prepare_for_queue(self, self.job_serial)
ERROR:      File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\generator\dynamic.py", line 2052,
in prepare_for_queue
ERROR:        r_hash = _tensor_hash_checksum(page_ids, r_hash)
ERROR:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "C:\Users\admin\miniconda3\envs\tabbyapi\Lib\site-packages\exllamav2\generator\dynamic.py", line 35,
in _tensor_blake2b_checksum
ERROR:        hasher.update(tensor.numpy().tobytes())
ERROR:                      ^^^^^^^^^^^^^^
ERROR:    RuntimeError: Numpy is not available
ERROR:    Sent to request: Completion aborted. Please check the server console.

System info (Bugs without this information will go lower on our priority list!)

OS: Windows 11 Version 23H2 (OS Build 22631.3737)
Python version: conda python version 3.11
CUDA version: 12.5

Additional context Add any other context about the problem here.

P.S the discord invite is expired that displays in this issue when creating one... also when trying to turn .com into .gg it just says its expired

theroyallab / tabbyAPI

[BUG] Issues with Numpy v2 #140