I tried to start the serving on my system, but it failed with the error below:
```
(emon_analyzer) [root@SPR-1 emon_data_analyzer]# neuralchat_server start --config_file ./config/neuralchat.yaml
2024-03-19 11:38:57,005 - numexpr.utils - INFO - Note: detected 224 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-03-19 11:38:57,005 - numexpr.utils - INFO - Note: NumExpr detected 224 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-03-19 11:38:57,005 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-19 11:38:57,348 - datasets - INFO - PyTorch version 2.2.0+cpu available.
[2024-03-19 11:38:57,430] [ ERROR] - Failed to start server.
[2024-03-19 11:38:57,430] [ ERROR] - partially initialized module 'intel_extension_for_pytorch' has no attribute '_C' (most likely due to a circular import)
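This `'_C'` error usually means `intel_extension_for_pytorch` (IPEX) could not load its compiled extension; a common cause is an IPEX build whose version does not match the installed torch (here `2.2.0+cpu`), or running Python from inside a source checkout that shadows the package. A minimal, stdlib-only sketch to compare the two installed versions without importing IPEX (the `major.minor`-must-match rule is an assumption based on how IPEX releases track torch releases, not output from this log):

```python
from importlib import metadata

def installed_version(pkg):
    """Return the installed version string of pkg, or None if it is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

torch_ver = installed_version("torch")
ipex_ver = installed_version("intel-extension-for-pytorch")
print("torch:", torch_ver)
print("ipex :", ipex_ver)

# torch and IPEX should normally share the same major.minor
# (e.g. torch 2.2.x pairs with IPEX 2.2.x); "2.2.0+cpu" and "2.2.0"
# both reduce to ["2", "2"] here, so the local-version suffix is ignored.
if torch_ver and ipex_ver and torch_ver.split(".")[:2] != ipex_ver.split(".")[:2]:
    print("version mismatch: reinstall an IPEX build matching your torch")
```

If the versions do match, it may also be worth running the import from a different working directory to rule out a local `intel_extension_for_pytorch/` folder shadowing the installed package.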
yaml config file:

```yaml
host: 0.0.0.0
port: 8000

model_name_or_path: "Intel/neural-chat-7b-v3-1"
model_name_or_path: "/home/zluo2/TableLlama-model"
tokenizer_name_or_path: ""
peft_model_path: "./models/emon_llama"
device: "cpu"

asr:
    enable: false
    args:
        device: "cpu"  # support cpu, hpu, xpu, cuda
        # support openai/whisper series
        model_name_or_path: "openai/whisper-small"
        # only can be set to true when the device is set to "cpu"
        bf16: false

tts:
    enable: false
    args:
        device: "cpu"
        voice: "default"
        stream_mode: false
        output_audio_path: "./output_audio.wav"

asr_chinese:
    enable: false

tts_chinese:
    enable: false
    args:
        device: "cpu"
        spk_id: 0
        stream_mode: false
        output_audio_path: "./output_audio.wav"

retrieval:
    enable: true
    args:
        input_path: "./rag_data/emon-sample"
        vector_database: "Qdrant"

safety_checker:
    enable: false

ner:
    enable: false
    args:
        spacy_model: "en_core_web_lg"

tasks_list: ['textchat', 'retrieval']
```
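One thing worth verifying once the server starts is that every task in `tasks_list` has its corresponding plugin section enabled. A stdlib-only sketch of that check (the dict below hand-mirrors the yaml above rather than parsing the file, and this consistency rule is my assumption, not NeuralChat's actual validation):

```python
# Hand-copied mirror of the relevant parts of the yaml config above.
config = {
    "host": "0.0.0.0",
    "port": 8000,
    "device": "cpu",
    "asr": {"enable": False},
    "tts": {"enable": False},
    "retrieval": {"enable": True},
    "safety_checker": {"enable": False},
    "ner": {"enable": False},
    "tasks_list": ["textchat", "retrieval"],
}

def check_tasks(cfg):
    """Return tasks from tasks_list whose plugin section exists but is disabled.

    Tasks with no plugin section of their own (e.g. "textchat") are skipped.
    """
    problems = []
    for task in cfg["tasks_list"]:
        plugin = cfg.get(task)
        if isinstance(plugin, dict) and not plugin.get("enable", False):
            problems.append(task)
    return problems

print(check_tasks(config))  # → [] ("retrieval" is enabled in this config)
```

Here `retrieval` is in `tasks_list` and has `enable: true`, so the config is at least internally consistent on that point; the failure above happens earlier, at the IPEX import.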