mlabonne / llm-autoeval

Automatically evaluate your LLMs in Google Colab
MIT License

Runpod says 'no longer any instance available' #3

Closed ncs-gobubble closed 5 months ago

ncs-gobubble commented 5 months ago

I was trying to use the notebook to run an eval on one NVIDIA A40 GPU with a 100 GB disk and got the following error:

---------------------------------------------------------------------------
QueryError                                Traceback (most recent call last)
<ipython-input-5-05063fbf8ce4> in <cell line: 41>()
     39 
     40 # Create a pod
---> 41 pod = runpod.create_pod(
     42     name=f"Eval {MODEL.split('/')[-1]} on {BENCHMARK.capitalize()}",
     43     image_name="runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04",

1 frames
/usr/local/lib/python3.10/dist-packages/runpod/api/graphql.py in run_graphql_query(query)
     28 
     29     if "errors" in response.json():
---> 30         raise error.QueryError(
     31             response.json()["errors"][0]["message"],
     32             query

QueryError: There are no longer any instances available with the requested specifications. Please refresh and try again.

I have funds in my RunPod account, and a pod with an A40 can be created on runpod.io. Any idea how to go about debugging this?

ncs-gobubble commented 5 months ago

It would also be useful if I could pass an existing pod. I am able to create a pod with the required specifications directly on RunPod.

ncs-gobubble commented 5 months ago

Ah, I just found out it was caused by the cloud_type argument. I suggest exposing it in the Colab notebook itself, because choosing the RunPod cloud type is an important step for privacy reasons.

CLOUD_TYPE = 'COMMUNITY' # @param ["COMMUNITY", "SECURE"]

mlabonne commented 5 months ago

Hello, I'll add your problem to the README.md, but I try to minimize the number of inputs so the notebook doesn't become too complex to use. Thanks!

mlabonne commented 5 months ago

... Never mind, I changed it. You're right, this is important because it's tied to RunPod and not to the model's parameters.
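
For anyone hitting the same error, here is a minimal sketch of how the cloud type could be validated and forwarded to `runpod.create_pod`, with an optional fallback across cloud types. It is not the notebook's actual code. The helper names `build_pod_kwargs` and `create_pod_with_fallback`, the model and benchmark placeholders, and the `"NVIDIA A40"` GPU id are assumptions made for illustration. The `name` and `image_name` values are taken from the traceback above.

```python
VALID_CLOUD_TYPES = ("ALL", "COMMUNITY", "SECURE")


def build_pod_kwargs(model, benchmark, gpu_type_id, cloud_type="COMMUNITY",
                     container_disk_in_gb=100):
    """Assemble keyword arguments for runpod.create_pod (hypothetical helper)."""
    cloud_type = cloud_type.upper()
    if cloud_type not in VALID_CLOUD_TYPES:
        raise ValueError(
            f"cloud_type must be one of {VALID_CLOUD_TYPES}, got {cloud_type!r}"
        )
    return {
        "name": f"Eval {model.split('/')[-1]} on {benchmark.capitalize()}",
        "image_name": "runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04",
        "gpu_type_id": gpu_type_id,
        "cloud_type": cloud_type,
        "container_disk_in_gb": container_disk_in_gb,
    }


def create_pod_with_fallback(create_fn, kwargs, cloud_types, error_type=Exception):
    """Try each cloud type in order; return (cloud_type, pod) on first success."""
    last_error = None
    for cloud_type in cloud_types:
        try:
            # Override only the cloud type; all other arguments stay the same.
            return cloud_type, create_fn(**{**kwargs, "cloud_type": cloud_type})
        except error_type as exc:
            last_error = exc
    raise RuntimeError(
        f"No instance available for cloud types {list(cloud_types)}"
    ) from last_error


# Usage sketch (needs the runpod package and an API key, so it is left commented out):
# import runpod
# runpod.api_key = "..."  # your own key
# kwargs = build_pod_kwargs("org/my-model", "mybench", "NVIDIA A40", CLOUD_TYPE)
# used_type, pod = create_pod_with_fallback(
#     runpod.create_pod, kwargs, ["COMMUNITY", "SECURE"], runpod.error.QueryError
# )
```

Passing the create function in as an argument keeps the helper testable without a RunPod account. Automatic fallback to another cloud type may not be what you want if the choice matters for privacy, so consider passing a single cloud type in that case.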