meta-llama / llama-stack

Composable building blocks to build Llama Apps
https://llama-stack.readthedocs.io
MIT License

[nvidia] NVIDIA_API_KEY optional for non-catalogue model deployments #1956

Closed: raspawar closed this issue 1 day ago

raspawar commented 3 months ago

System Info

llama-stack NVIDIA distribution

🐛 Describe the bug

The NVIDIA LLM connector expects NVIDIA_API_KEY, even though the key is not required for the other NVIDIA adapter and can be omitted. The error is raised from: https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/nvidia/nvidia.py#L78

The NVIDIA LLM connector's API-key requirements vary by use case.

Expected behavior

  1. Skip API Key Check for Custom Models: if NVIDIA_BASE_URL is set without NVIDIA_API_KEY, assume a non-catalogue model and proceed.
  2. Warn and Continue: if NVIDIA_API_KEY is missing, warn the user and let the API surface any auth errors (sketched below).

Which approach would you prefer to implement?
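
For illustration, a minimal sketch of option 2 (warn and continue); the function name, environment-variable handling, and logger setup are assumptions for this sketch, not the actual code at nvidia.py#L78:

```python
import logging
import os

logger = logging.getLogger(__name__)


def _warn_if_api_key_missing(base_url: str, api_key: str | None) -> None:
    """Hypothetical replacement for the hard failure on a missing NVIDIA_API_KEY.

    Instead of raising, log a warning and let the downstream API return an
    authentication error if the key really was required.
    """
    if not api_key:
        logger.warning(
            "NVIDIA_API_KEY is not set; requests to %s may fail with an "
            "authentication error if the endpoint requires a key.",
            base_url,
        )


# Example: no key exported, custom base URL pointing at a local deployment.
_warn_if_api_key_missing(
    base_url=os.environ.get("NVIDIA_BASE_URL", "https://integrate.api.nvidia.com"),
    api_key=os.environ.get("NVIDIA_API_KEY"),
)
```
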
raspawar commented 3 months ago

@mattf @dglogo @JashG

mattf commented 3 months ago

it should only error if the base url is integrate.api.nvidia.com and the api key is missing.
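
A hedged sketch of that rule (the helper names and use of urlparse are assumptions, not the provider's actual implementation):

```python
from urllib.parse import urlparse

HOSTED_HOST = "integrate.api.nvidia.com"


def _is_hosted_endpoint(base_url: str) -> bool:
    """Return True only for the hosted catalogue endpoint."""
    return urlparse(base_url).hostname == HOSTED_HOST


def _check_api_key(base_url: str, api_key: str | None) -> None:
    # Error only when the hosted endpoint is targeted without a key; any other
    # base URL (e.g. a local NIM) proceeds without NVIDIA_API_KEY.
    if _is_hosted_endpoint(base_url) and not api_key:
        raise ValueError(
            "NVIDIA_API_KEY is required when base_url points at "
            "https://integrate.api.nvidia.com"
        )


_check_api_key("http://nim.test", api_key=None)   # OK: local NIM, no key needed
# _check_api_key("https://integrate.api.nvidia.com", api_key=None)  # would raise
```
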

raspawar commented 3 months ago

> it should only error if the base url is integrate.api.nvidia.com and the api key is missing.

By default, base_url is set to integrate.api.nvidia.com, so when the NVIDIA distribution comes up it expects an API key. For NeMo microservices the API key is not necessary, and base_url is only updated once the new post-training model is ready for inference (in that case base_url is set to something like http://nim.test, with requests going to http://nim.test/v1/completions).
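
In environment-variable terms (a hedged reading of the two deployment modes; the exact defaulting below is an assumption, not the literal config code):

```python
import os

# Hosted catalogue: NVIDIA_BASE_URL left unset, so the provider falls back to
# https://integrate.api.nvidia.com and the hosted service requires a key.
# NeMo microservices: NVIDIA_BASE_URL is pointed at the fine-tuned NIM
# (e.g. http://nim.test, serving http://nim.test/v1/completions) once the
# post-trained model is ready, and no API key is needed.
base_url = os.environ.get("NVIDIA_BASE_URL", "https://integrate.api.nvidia.com")
api_key = os.environ.get("NVIDIA_API_KEY")  # legitimately absent in the NIM case
```
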

mattf commented 3 months ago

> By default, base_url is set to integrate.api.nvidia.com, so when the NVIDIA distribution comes up it expects an API key. For NeMo microservices the API key is not necessary, and base_url is only updated once the new post-training model is ready for inference (in that case base_url is set to something like http://nim.test, with requests going to http://nim.test/v1/completions).

is the issue that a single inference provider is being used for both (a) hosted inference w/ integrate.api.nvidia.com as well as (b) local inference w/ a fine-tuned nvidia nim?

if that's the case, what about setting up multiple inference providers?

if that's not the case, will you provide an example distro and use case?
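
A hypothetical sketch of the two-provider idea, using a stand-in dataclass rather than the real llama-stack provider config schema (field names are illustrative only):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NvidiaInferenceProvider:
    """Stand-in for a per-provider config entry; not the actual schema."""
    provider_id: str
    url: str
    api_key: Optional[str] = None


providers = [
    # (a) hosted inference against the NVIDIA catalogue: key required
    NvidiaInferenceProvider(
        provider_id="nvidia-hosted",
        url="https://integrate.api.nvidia.com",
        api_key="nvapi-...",  # supplied via NVIDIA_API_KEY in practice
    ),
    # (b) local inference against a fine-tuned NIM: no key needed
    NvidiaInferenceProvider(
        provider_id="nvidia-nim",
        url="http://nim.test",
    ),
]
```
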

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.

github-actions[bot] commented 1 day ago

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant!