microsoft / vidur

A large-scale simulation framework for LLM inference
MIT License

Error: can't register atexit after shutdown #38

Open rajeshitshoulders opened 5 days ago

rajeshitshoulders commented 5 days ago

Hi, could you please help resolve the issue below with the IPython.core.display module?

Setup: mamba virtual env at /home/idps/vidur/vidur-venv. I configured wandb and set the WANDB_BASE_URL variable to a local web server, with an API key.

Please let me know if you need any additional information.

(/home/idps/vidur/vidur-venv) idps@smc-gpu-03:~/vidur$ python -m vidur.main \
  --replica_config_device a100 \
  --replica_config_model_name meta-llama/Llama-2-7b-hf \
  --cluster_config_num_replicas 1 \
  --replica_config_tensor_parallel_size 1 \
  --replica_config_num_pipeline_stages 1 \
  --request_generator_config_type synthetic \
  --length_generator_config_type trace \
  --interval_generator_config_type static \
  --trace_request_length_generator_config_max_tokens 4096 \
  --trace_request_length_generator_config_trace_file ./data/processed_traces/arxiv_summarization_stats_llama2_tokenizer_filtered_v2.csv \
  --synthetic_request_generator_config_num_requests 128 \
  --replica_scheduler_config_type vllm \
  --vllm_scheduler_config_batch_size_cap 256 \
  --vllm_scheduler_config_max_tokens_in_batch 4096 \
  --metrics_config_wandb_project idps-wandb
INFO 09-25 12:47:13 trace_request_length_generator.py:78] Loaded request length trace file ./data/processed_traces/arxiv_summarization_stats_llama2_tokenizer_filtered_v2.csv with 28257 requests
INFO 09-25 12:47:15 simulator.py:60] Starting simulation with cluster: Cluster({'id': 0, 'num_replicas': 1}) and 128 requests
INFO 09-25 12:47:15 simulator.py:80] Simulation ended at: 92.29293720617318s
INFO 09-25 12:47:15 simulator.py:83] Writing output
Error importing optional module IPython.core.display
Traceback (most recent call last):
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/_plotly_utils/optional_imports.py", line 28, in get_module
    return import_module(name)
  File "/home/idps/vidur/vidur-venv/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/IPython/__init__.py", line 55, in <module>
    from .terminal.embed import embed
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/IPython/terminal/embed.py", line 16, in <module>
    from IPython.terminal.interactiveshell import TerminalInteractiveShell
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/IPython/terminal/interactiveshell.py", line 48, in <module>
    from .debugger import TerminalPdb, Pdb
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/IPython/terminal/debugger.py", line 18, in <module>
    from concurrent.futures import ThreadPoolExecutor
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/idps/vidur/vidur-venv/lib/python3.10/concurrent/futures/__init__.py", line 49, in __getattr__
    from .thread import ThreadPoolExecutor as te
  File "/home/idps/vidur/vidur-venv/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/home/idps/vidur/vidur-venv/lib/python3.10/threading.py", line 1504, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
INFO 09-25 12:47:18 simulator.py:86] Metrics written
INFO 09-25 12:47:18 simulator.py:95] Chrome event trace written

ozcanmiraay commented 1 day ago

I run into the exact same issue!
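
A possible workaround, going only by the traceback in the original report and not confirmed by the maintainers: the error is raised when `concurrent.futures.thread` is imported for the first time (via plotly's optional import of IPython) after interpreter shutdown has begun, because that module calls `threading._register_atexit` at import time. Importing it early, before the simulation starts, should make the later import a plain `sys.modules` lookup. The sketch below assumes this diagnosis is right; `preload` and `PRELOAD_MODULES` are hypothetical names, not part of vidur.

```python
import importlib
import sys

# Assumption: the failing step in the traceback is the first (lazy) import of
# concurrent.futures.thread, which registers a threading atexit hook at import time.
PRELOAD_MODULES = ("concurrent.futures.thread",)


def preload(names=PRELOAD_MODULES):
    """Import each module now so later imports are served from sys.modules."""
    loaded = []
    for name in names:
        importlib.import_module(name)  # runs the module's import-time code once
        loaded.append(name)
    return loaded


# Call this near the top of the entry script, before the simulation runs.
loaded = preload()
print(loaded)
```

If this helps, the final log lines should still appear, just without the traceback in between. Note that even in the report above, "Metrics written" and "Chrome event trace written" are logged after the error, so the failed optional import may not have prevented the output from being written.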