meta-llama / llama-stack

Composable building blocks to build Llama Apps
MIT License

Configuration API #993

Open cdoern opened 1 month ago

cdoern commented 1 month ago

🚀 Describe the new functionality needed

Configuration API

Adding providers outside of the current scope will likely require the following:

  1. Bespoke configuration based on hardware (GPU, CPU, etc.) that applies across multiple providers so that they work properly.
  2. Hyperparameters for specific providers that can be both auto-detected and selected using a CLI.
  3. A way to check available configurations, currently assigned configurations, etc.

I imagine this functionality working similarly to Models or Inspect, which are high-level APIs. Other providers should also be able to "register" one of these objects. Like Models, Configurations should operate as an "overarching" API through which one can register, list, get, and unregister a configuration.
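To make the register/list/get/unregister lifecycle concrete, here is a minimal in-memory sketch. It is not the proposed implementation: the class name `InMemoryConfigurations`, the `config_id` key, and the method signatures are all assumptions made for illustration.

```python
from typing import Any, Dict, List


class InMemoryConfigurations:
    """Hypothetical in-memory stand-in for the proposed Configurations API."""

    def __init__(self) -> None:
        # Maps a configuration identifier (assumed) to its provider settings.
        self._configs: Dict[str, Dict[str, Any]] = {}

    def register(self, config_id: str, config: Dict[str, Any]) -> Dict[str, Any]:
        # Registering under an existing id overwrites the previous entry.
        self._configs[config_id] = config
        return config

    def list(self) -> List[str]:
        # Return the registered identifiers in a stable order.
        return sorted(self._configs)

    def get(self, config_id: str) -> Dict[str, Any]:
        # Raises KeyError if the configuration was never registered.
        return self._configs[config_id]

    def unregister(self, config_id: str) -> None:
        # Removing an unknown id is treated as a no-op.
        self._configs.pop(config_id, None)


registry = InMemoryConfigurations()
ollama_config = {
    "inference": [
        {
            "provider_id": "ollama",
            "provider_type": "remote::ollama",
            "config": {"url": "http://localhost:12345"},
        }
    ]
}
registry.register("ollama-local", ollama_config)
print(registry.list())
registry.unregister("ollama-local")
```

A real implementation would presumably persist configurations and validate them against the provider config classes; this sketch only shows the shape of the four operations.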

usage pattern:

llama stack build && llama stack run (administrator starts stack)

a user could run:

llama-stack-client configurations inspect

 providers:
  agents:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio: []
  eval: []
  inference:
  - config:
      url: http://localhost:12345
    provider_id: ollama
    provider_type: remote::ollama
  safety: []
  scoring:
  - config: {}
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config: {}
    provider_id: brave-search
    provider_type: remote::brave-search
  - config: {}
    provider_id: tavily-search
    provider_type: remote::tavily-search
  vector_io:
  - config: {}
    provider_id: faiss
    provider_type: inline::faiss
  - config: {}
    provider_id: sqlite_vec
    provider_type: inline::sqlite_vec

llama-stack-client configurations register --config <file_path>

or using the SDK:

import json

# `client` is assumed to be an already-initialized llama-stack-client instance;
# `configurations` is the proposed (not yet existing) resource.
current_config = client.configurations.inspect()
print(current_config)

config = {"inference": [{"provider_id": "ollama", "provider_type": "remote::ollama", "config": {"url": "http://localhost:12345"}}]}
registered = client.configurations.register(config=json.dumps(config))
print(registered)

the configuration API would look something like:

@json_schema_type
class Configuration(BaseModel):
    type: Literal[ResourceType.configuration.value] = ResourceType.configuration.value
    config: StackRunConfig

class ConfigListResponse(BaseModel):
    data: List[dict[str, Any]]

@runtime_checkable
@trace_protocol
class Configurations(Protocol):
    """Llama Stack Configuration API for storing and applying hyperparameters for given tasks."""

    @webmethod(route="/configurations/register", method="POST")
    async def register_config(
        self,
        config: str,  # assumed: JSON-encoded configuration, matching the SDK example above
    ) -> dict[str, Any]: ...

With the inspect API expanded to have a /configurations endpoint:

@runtime_checkable
class Inspect(Protocol):

    @webmethod(route="/inspect/configurations", method="GET")
    async def inspect_config(
        self,
    ) -> InspectConfigResponse: ...

UserConfig vs StackRunConfig

A key part of this API is the set of fields exposed in both inspection and registration. A Configuration object contains a StackRunConfig, but the data within it is a UserConfig: a StackRunConfig with only specific fields exposed to the user. Since each provider has its own config class that feeds into the StackRunConfig, the following can be used to label certain fields as "User Configurable":

url: str = Field(DEFAULT_OLLAMA_URL, json_schema_extra={"user_field": True})

The pydantic json_schema_extra field can then be used when creating a Configuration object to build an intermediary UserConfig. The UserConfig contains only fields labeled as user_field. If a user tries to register a configuration that specifies non-user fields, those fields are dropped, and an inspected configuration likewise shows only user fields. In the example above, url is the only field marked with user_field, which is why it is one of the few things showing up.
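The filtering step can be sketched as follows. To keep the example dependency-free, it uses standard-library dataclasses and their `metadata` mapping as a stand-in for pydantic's `json_schema_extra`. The class `OllamaConfig`, the `request_timeout_s` field, and the helper names are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Set


@dataclass
class OllamaConfig:
    """Hypothetical provider config; only `url` is marked user-configurable."""

    url: str = field(default="http://localhost:12345", metadata={"user_field": True})
    # Hypothetical internal setting that users should neither see nor set.
    request_timeout_s: int = field(default=30)


def user_field_names(config_cls: type) -> Set[str]:
    # Collect the names of fields whose metadata marks them as user fields.
    return {f.name for f in fields(config_cls) if f.metadata.get("user_field")}


def to_user_config(config_cls: type, submitted: Dict[str, Any]) -> Dict[str, Any]:
    # Keep only submitted keys that are user fields; drop everything else.
    allowed = user_field_names(config_cls)
    return {key: value for key, value in submitted.items() if key in allowed}


submitted = {"url": "http://localhost:12345", "request_timeout_s": 5}
print(to_user_config(OllamaConfig, submitted))
```

With pydantic v2 models, the same marker would be read from `Model.model_fields[name].json_schema_extra` instead of `metadata`; the filtering logic itself would stay the same.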

Server Side Device Discovery for Initial Configuration

Before a user can inspect or register a config of their own, it would make sense to allow providers to utilize a centralized hardware discovery service built into llama-stack. Providers could then act on this information inside of their configuration initialization methods to apply certain defaults depending on the hardware discovered as opposed to a blanket set of defaults.
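A rough sketch of how discovery could feed provider defaults is shown below. Everything here is an assumption for illustration: the `HardwareInfo` shape, the `nvidia-smi` heuristic, and the `device` / `num_workers` defaults are not part of llama-stack.

```python
import os
import shutil
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class HardwareInfo:
    """Hypothetical summary that a central discovery service could return."""

    cpu_count: int
    has_nvidia_gpu: bool


def discover_hardware() -> HardwareInfo:
    # Heuristic only: treats `nvidia-smi` being on PATH as a sign of an NVIDIA GPU.
    return HardwareInfo(
        cpu_count=os.cpu_count() or 1,
        has_nvidia_gpu=shutil.which("nvidia-smi") is not None,
    )


def default_provider_config(hw: HardwareInfo) -> Dict[str, Any]:
    # Hypothetical defaults a provider might choose during config initialization.
    if hw.has_nvidia_gpu:
        return {"device": "cuda", "num_workers": min(hw.cpu_count, 8)}
    return {"device": "cpu", "num_workers": max(1, hw.cpu_count // 2)}


print(default_provider_config(discover_hardware()))
```

Keeping discovery separate from the defaults function means each provider can be tested against a fixed `HardwareInfo` without needing the actual hardware.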

💡 Why is this needed? What if we don't build it?

Without a system like the above, it will be difficult to orchestrate a sequence of providers intended to "work together", or even to make a single complex provider easily accessible to users. Additionally, the more complex the APIs and providers that are introduced, the greater the odds that runtime manipulation of key configuration fields will be necessary.

For example, say someone provides data generation, training, and evaluation methodologies as separate providers. Each of these depends on specific hardware requirements, hyperparameters, etc. to interact with the others, and these parameters change per hardware (H100 vs. A100 vs. L40).

Exposing the current provider configuration to a user will help them understand what they will be running for various providers as functionality gets more complex (SDG, Evals, Training, etc.). Additionally, allowing a user to apply parts of a config on top of a running stack, as opposed to taking the stack down and having the admin apply a full run config again, seems like a more sustainable workflow.
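One way applying part of a config on top of a running stack could work is a merge keyed on `provider_id`, sketched below. The function name `apply_partial_config` and the merge rules (update matching providers, append unknown ones) are assumptions, not behavior defined by llama-stack.

```python
import copy
from typing import Any, Dict, List

ProviderList = List[Dict[str, Any]]


def apply_partial_config(
    running: Dict[str, ProviderList], partial: Dict[str, ProviderList]
) -> Dict[str, ProviderList]:
    """Return a new config with `partial` layered over `running`."""
    merged = copy.deepcopy(running)  # never mutate the running config in place
    for api, providers in partial.items():
        existing = {p["provider_id"]: p for p in merged.setdefault(api, [])}
        for provider in providers:
            current = existing.get(provider["provider_id"])
            if current is None:
                # Unknown provider: add it as a new entry for this API.
                merged[api].append(copy.deepcopy(provider))
            else:
                # Known provider: overwrite only the config keys supplied.
                current.setdefault("config", {}).update(provider.get("config", {}))
    return merged


running = {
    "inference": [
        {
            "provider_id": "ollama",
            "provider_type": "remote::ollama",
            "config": {"url": "http://localhost:12345"},
        }
    ],
    "safety": [],
}
# Hypothetical partial update that changes only the Ollama URL.
partial = {"inference": [{"provider_id": "ollama", "config": {"url": "http://localhost:54321"}}]}
print(apply_partial_config(running, partial))
```

Returning a new dictionary instead of mutating the running one leaves room to validate the merged result before swapping it in.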

Other thoughts

I would like to work on this in collaboration with anyone if possible!

cdoern commented 1 month ago

As noted in the issue description, I would like to work on some version of this. I already have some code locally that I might turn into a draft PR. If possible, please add me as the assignee!

dmartinol commented 1 month ago

In addition to the get, list, register and unregister services, how about adding a (device) discovery service to guess the best configuration?