heurist-network / miner-release

Stable Diffusion and LLM miner for Heurist

/metrics route can't be reached #53

Closed BugaoxingXXX closed 1 month ago

BugaoxingXXX commented 1 month ago

Hi, while running the miner on a 4090 with the dolphin-2.9-llama3-8b model, an error occurred:

Traceback (most recent call last):
  File "/root/miner-release/llm-miner-v1.1.1.py", line 182, in worker
    if get_metric_value("num_requests_running", base_config) >= base_config.concurrency_soft_limit:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>=' not supported between instances of 'NoneType' and 'int'

Line 182 of /root/miner-release/llm-miner-v1.1.1.py is:

            if get_metric_value("num_requests_running", base_config) >= base_config.concurrency_soft_limit:
                # Pass silently if too many requests are running
                # print("Too many requests running, waiting for a while")
                time.sleep(base_config.sleep_duration)
                pass
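
The comparison fails because the left-hand side can be None. A minimal sketch of a None-safe check follows; the helper name over_soft_limit is hypothetical and not part of the miner code, and treating a missing metric as "not over the limit" is an assumption about the desired policy, not necessarily what the maintainers chose.

```python
from typing import Optional


def over_soft_limit(running: Optional[float], soft_limit: int) -> bool:
    """Return True only when a metric value exists and reaches the limit.

    Hypothetical helper: `running` stands in for the result of
    get_metric_value(), which may be None when /metrics is unavailable.
    """
    if running is None:
        # No metric available: assume the miner is not over the limit.
        return False
    return running >= soft_limit
```

With such a helper, the worker condition could read `if over_soft_limit(get_metric_value("num_requests_running", base_config), base_config.concurrency_soft_limit):`. Backing off instead when the metric is missing would be an equally valid policy.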

and the get_metric_value function in llm_mining_core/utils/requests_utils.py is:

def get_metric_value(metric_name, base_config):
    """
    Fetches the value of a specific metric from the llm endpoint.

    Args:
        metric_name (str): The name of the metric to fetch.
        base_config (BaseConfig): The base configuration object.

    Returns:
        float or None: The value of the metric if found, None otherwise.
    """
    try:
        url = f"{base_config.llm_url}:{base_config.port}/metrics"
        #Call the metrics endpoint to get the metric value
        response = requests.get(url)
        response_text = response.text
        lines = response_text.split('\n')
        for line in lines:
            if line.startswith(f"vllm:{metric_name}"):
                parts = line.split(' ')
                if len(parts) >= 2:
                    value = float(parts[1])
                    return value
    except Exception as e:
        # fail silently
        logging.error(f"Error occurred while finding metric value: {str(e)}")
        return None
    return None
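
To see why this function returns None against a server without /metrics, the parsing step can be isolated from the HTTP call. The sketch below is not the repository's code: parse_metric is a hypothetical name, and the sample lines in the usage note are invented for illustration, assuming the endpoint serves Prometheus-style text where a metric line may carry a label block in braces.

```python
from typing import Optional


def parse_metric(text: str, metric_name: str, prefix: str = "vllm:") -> Optional[float]:
    """Return the first sample value for prefix + metric_name, or None."""
    target = prefix + metric_name
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name is the text before any label block or whitespace.
        name_part = line.split("{", 1)[0].split()
        if not name_part or name_part[0] != target:
            continue
        # The value follows the label block if there is one, else the name.
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(target):]
        fields = rest.split()
        if not fields:
            continue
        try:
            return float(fields[0])
        except ValueError:
            continue
    return None
```

For example, parse_metric('vllm:num_requests_running{model_name="demo"} 2.0', "num_requests_running") yields 2.0, while the 404 body {"detail":"Not Found"} yields None, which is the value that later breaks the >= comparison.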

I think the cause is that the /metrics route is not registered when the model loads, so get_metric_value finds no matching line and returns None. The startup log shows only these routes:


INFO 07-20 15:27:37 serving_chat.py:91] ' }}{% endif %}
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 07-20 15:27:38 serving_embedding.py:141] embedding_mode is False. Embedding API will not work.
INFO 07-20 15:27:38 api_server.py:257] Available routes are:
INFO 07-20 15:27:38 api_server.py:262] Route: /openapi.json, Methods: HEAD, GET
INFO 07-20 15:27:38 api_server.py:262] Route: /docs, Methods: HEAD, GET
INFO 07-20 15:27:38 api_server.py:262] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 07-20 15:27:38 api_server.py:262] Route: /redoc, Methods: HEAD, GET
INFO 07-20 15:27:38 api_server.py:262] Route: /health, Methods: GET
INFO 07-20 15:27:38 api_server.py:262] Route: /tokenize, Methods: POST
INFO 07-20 15:27:38 api_server.py:262] Route: /detokenize, Methods: POST
INFO 07-20 15:27:38 api_server.py:262] Route: /v1/models, Methods: GET
INFO 07-20 15:27:38 api_server.py:262] Route: /version, Methods: GET
INFO 07-20 15:27:38 api_server.py:262] Route: /v1/chat/completions, Methods: POST
INFO 07-20 15:27:38 api_server.py:262] Route: /v1/completions, Methods: POST
INFO 07-20 15:27:38 api_server.py:262] Route: /v1/embeddings, Methods: POST
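
Checking a long startup log for a missing route by eye is error-prone. A small sketch that extracts the routes from log lines of the form shown above; logged_routes is a hypothetical helper, and the regular expression assumes the "Route: <path>, Methods: ..." layout seen in this log.

```python
import re
from typing import Set

# Matches the path in lines such as "... Route: /health, Methods: GET".
ROUTE_RE = re.compile(r"Route: (\S+), Methods: ")


def logged_routes(log_text: str) -> Set[str]:
    """Collect every route path announced in the startup log."""
    return {match.group(1) for match in ROUTE_RE.finditer(log_text)}
```

Running `"/metrics" in logged_routes(log_text)` on the log above would give False, which matches the 404 seen from curl.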

My start command is ./llm-miner-starter.sh dolphin-2.9-llama3-8b --port 8000 --gpu-ids 0. After the miner starts, curl localhost:8000/metrics/ shows:

# curl localhost:8000/metrics/
# {"detail":"Not Found"}

Only these routes are available at startup. Please check whether something goes wrong while the miner starts.

manishbyatroy commented 1 month ago

@BugaoxingXXX please remember to chmod +x the llm-miner-starter.sh script. Also use this: ./llm-miner-starter.sh dolphin-2.9-llama3-8b --miner-id-index 0 --port 8000 --gpu-ids 0. This starts the LLM miner with the model dolphin-2.9-llama3-8b, using eth-address-0 from .env, on port 8000, on GPU 0.

Please try this out and let me know if it works!

BugaoxingXXX commented 1 month ago

[image] As the screenshot shows, the script has execute permission.

The command ./llm-miner-starter.sh dolphin-2.9-llama3-8b --miner-id-index 0 --port 8000 --gpu-ids 0 also has the same problem: [image]

manishbyatroy commented 1 month ago

Hi @BugaoxingXXX, https://github.com/heurist-network/miner-release/pull/55 fixes this issue. Closing this.