Open prashanthcheemala opened 3 months ago
LLaMA says: Chat

🦙 I'm an open-source chatbot. I can explain concepts, write poems and code, solve logic puzzles, or even name your pets. What do you want to chat about?

🥸 Hi any idea how to fix:

Root Cause (first observed failure):
[0]:
  time : 2024-07-25_12:48:53
  host : ip-119-181-1-31.ec2.internal
  rank : 0 (local_rank: 0)
  exitcode : -9 (pid: 6774)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 6774

🦙 The error message you're seeing typically indicates that your process was terminated by the operating system, usually due to resource constraints such as running out of memory. Here are some steps you can take to try and resolve the issue:
- Check the system logs (/var/log/syslog or /var/log/messages) around the time of the error (2024-07-25_12:48:53) to see if there are any messages indicating why the process was killed.
- Use top, htop, or atop to monitor your system's resource usage (CPU, memory, disk, etc.) while running your application. This can help you identify if you're running low on any resources.
- If you're running on an EC2 instance (ip-119-181-1-31.ec2.internal), you might need to upgrade to a larger instance type with more memory or adjust your application's configuration to use fewer resources.
- Use a profiler such as mprof or line_profiler to identify memory leaks.
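
The log check and the memory check suggested above can be sketched in Python. This is only an illustration, assuming a Linux host. The helper names (`find_oom_kills`, `parse_meminfo`, `describe_exit_code`) are made up for this example, and the regular expression assumes the kernel's usual "Killed process <pid> (<name>)" wording, which may differ between kernel versions.

```python
import re
import signal

# Assumption: the kernel OOM killer logs "Killed process <pid> (<name>)".
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return (pid, process_name) pairs for OOM-killer entries in log lines."""
    kills = []
    for line in log_lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def describe_exit_code(exitcode):
    """Translate a negative exit code (killed by signal N) to a signal name."""
    if exitcode < 0:
        return signal.Signals(-exitcode).name
    return f"exit status {exitcode}"
```

To use it, pass the lines of /var/log/syslog (or /var/log/messages) to `find_oom_kills` and look for PID 6774, and pass the contents of /proc/meminfo to `parse_meminfo` to read `MemAvailable` before starting the server. `describe_exit_code(-9)` gives the signal name for the exit code shown in the report.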
llama inference start
/opt/LLama_Agentic_System/llama3_1venv/lib/python3.11/site-packages/llama_toolchain/utils.py:43: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
  initialize(config_path=relative_path)
Loading config from : /root/.llama/configs/inference.yaml
Yaml config:
inference_config:
  impl_config:
    impl_type: inline
    checkpoint_config:
      checkpoint:
        checkpoint_type: pytorch
        checkpoint_dir: /root/.llama/checkpoints/Meta-Llama-3.1-8B-Instruct/original/
        tokenizer_path: /root/.llama/checkpoints/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model
        model_parallel_size: 1
        quantization_format: bf16
    quantization: null
    torch_seed: null
    max_seq_len: 16384
    max_batch_size: 1
Listening on :::5000
INFO: Started server process [6765]
INFO: Waiting for application startup.
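
For a sense of scale, here is a back-of-the-envelope estimate of the memory the configuration above asks for. It is a rough sketch, not a measurement. The parameter count (about 8 billion) and the architecture numbers (32 layers, 8 key/value heads, head dimension 128) are assumptions based on the published Llama 3.1 8B model description, not values from this log, and the function names are hypothetical. Only bf16, max_seq_len 16384 and max_batch_size 1 come from the config.

```python
BYTES_PER_BF16 = 2  # bf16 stores each value in 2 bytes
GIB = 1024 ** 3


def weight_bytes(n_params, bytes_per_param=BYTES_PER_BF16):
    """Memory needed to hold the model weights."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, max_seq_len,
                   max_batch_size, bytes_per_value=BYTES_PER_BF16):
    """Memory for a preallocated key/value cache (factor 2: keys and values)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * max_seq_len * max_batch_size


# Assumed model shape (not from the log): ~8e9 parameters, 32 layers,
# 8 KV heads, head dimension 128. Sequence and batch sizes are from the config.
weights = weight_bytes(8_000_000_000)
cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                       max_seq_len=16384, max_batch_size=1)

print(f"weights  ~ {weights / GIB:.1f} GiB")
print(f"KV cache ~ {cache / GIB:.1f} GiB")
```

Under these assumptions the weights alone come to roughly 15 GiB and the cache to about 2 GiB, so it may be worth comparing those figures against the memory available on the instance. If the process is killed while the checkpoint is still loading, lowering max_seq_len would probably not help much, since the weights dominate.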