tests/anyscale/json_constrained_decoding/test_e2e.py::test_json_mode[False-v1] INFO 06-17 03:59:58 llm_engine.py:162] Initializing an LLM engine (v0.5.0) with config: model='mistralai/Mistral-7B-Instruct-v0.1', speculative_config=None, tokenizer='mistralai/Mistral-7B-Instruct-v0.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=mistralai/Mistral-7B-Instruct-v0.1)
INFO 06-17 03:59:59 selector.py:138] Cannot use FlashAttention-2 backend due to sliding window.
INFO 06-17 03:59:59 selector.py:50] Using XFormers backend.
INFO 06-17 04:00:01 selector.py:138] Cannot use FlashAttention-2 backend due to sliding window.
INFO 06-17 04:00:01 selector.py:50] Using XFormers backend.
INFO 06-17 04:00:02 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 06-17 04:00:05 model_runner.py:160] Loading model weights took 13.4966 GB
INFO 06-17 04:00:05 json_mode_manager.py:138] Use json mode v2: False
2024-06-17 04:00:05,240 INFO worker.py:1585 -- Connecting to existing Ray cluster at address: 10.0.8.107:6379...
2024-06-17 04:00:05,247 INFO worker.py:1761 -- Connected to Ray cluster. View the dashboard at https://session-796bgh5cg3axxvt8zd4veclfsy.i.anyscaleuserdata.com/
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffa28fe23cef530f3eb571e2d404000000 Worker ID: 4376d2b0a8ca2e1b1be881c7db31d65891f164cad02bd87f9aa1411b Node ID: 09b15f6514ebb7fba19d9abf4453806b0c4b83005414e44cae05c6a6 Worker IP address: 10.0.8.107 Worker port: 10058 Worker PID: 94430 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) *** SIGSEGV received at time=1718622005 on cpu 36 ***
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) PC: @ 0x7459b2b7c5c4 (unknown) boost::fibers::algo::round_robin::pick_next()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) @ 0x7459b4445420 1472 (unknown)
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) @ 0x7459b2b7c478 48 boost::fibers::wait_queue::suspend_and_wait()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) @ 0x7459b2b7b865 64 boost::fibers::mutex::lock()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) @ 0x7459b2b084e0 96 std::_Function_handler<>::_M_invoke()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) @ 0x7459b2b00b35 96 boost::fibers::worker_context<>::run_()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) @ 0x7459b2b008b0 80 boost::context::detail::fiber_entry<>()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) @ 0x7459b2b7c99f (unknown) make_fcontext
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: *** SIGSEGV received at time=1718622005 on cpu 36 ***
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: PC: @ 0x7459b2b7c5c4 (unknown) boost::fibers::algo::round_robin::pick_next()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: @ 0x7459b4445420 1472 (unknown)
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: @ 0x7459b2b7c478 48 boost::fibers::wait_queue::suspend_and_wait()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: @ 0x7459b2b7b865 64 boost::fibers::mutex::lock()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: @ 0x7459b2b084e0 96 std::_Function_handler<>::_M_invoke()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: @ 0x7459b2b00b35 96 boost::fibers::worker_context<>::run_()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: @ 0x7459b2b008b0 80 boost::context::detail::fiber_entry<>()
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) [2024-06-17 04:00:05,829 E 94430 94485] logging.cc:440: @ 0x7459b2b7c99f (unknown) make_fcontext
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430) Fatal Python error: Segmentation fault
(<asyncio.locks.Event object at 0x7459afe48fa0 [unset]> pid=94430)
The problem is that the fiber stack size is too small. As we upgrade dependencies, especially Python packages, the fiber thread consumes the entire stack and overflows it, which produces the SIGSEGV inside boost::fibers shown above.
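The same failure mode can be sketched in plain Python: a worker thread whose stack is too small for the frames it accumulates will die, and the fix is to give the thread a larger stack before it is created. This is only an illustrative stand-in (plain `threading` threads instead of boost fibers); the 8 MiB figure is an assumption, not the actual value used in the engine.

```python
import threading

# Hypothetical fix sketch: allocate a larger stack for worker threads so
# deeply nested frames no longer overflow it. Plain Python threads stand
# in for the boost fiber scheduler thread here.
STACK_BYTES = 8 * 1024 * 1024  # assumed size; larger than the small default

def deep(n):
    # Each call adds a frame; with a too-small stack this kind of
    # nesting is what overflows and crashes the worker.
    return 0 if n == 0 else 1 + deep(n - 1)

result = {}

def worker():
    result["depth"] = deep(500)  # stays under the interpreter recursion limit

# threading.stack_size() must be called before the thread is created;
# it applies to threads started afterwards.
threading.stack_size(STACK_BYTES)
t = threading.Thread(target=worker)
t.start()
t.join()
```

With the larger stack the worker completes normally; the analogous change for the crash above would be enlarging the stack handed to the boost fiber context.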
What happened + What you expected to happen

See the log and stack trace above: the engine worker crashes with SIGSEGV inside boost::fibers shortly after initialization, instead of serving the test request.

Versions / Dependencies

Commit a11312b8a9b7a95be5e01cf8dee5cc50022acc6d

Reproduction script

N/A

Issue Severity

None