vercel / ai

Build AI-powered applications with React, Svelte, Vue, and Solid
https://sdk.vercel.ai/docs

image generation #2524

Open lgrammel opened 1 month ago

lgrammel commented 1 month ago

Discussed in https://github.com/vercel/ai/discussions/2520

Originally posted by **Und3rf10w** August 1, 2024

What are the thoughts on adding an `imageGenerationModel` as a property to a `Provider`? I came up with these types as an example, but they are tied directly to DALL·E.

```typescript
import type { LanguageModelV1Prompt } from '@ai-sdk/provider';

type ImageGenerationModelV1CallSettings = {
  size: string;
  n: number;
  // Note: in the OpenAI Images API, 'natural' | 'vivid' are values of `style`;
  // `quality` takes 'standard' | 'hd'.
  quality?: 'natural' | 'vivid';
  /** Response format. The output can either be url or b64json. Default is url. */
  responseFormat?: { type: 'url' } | { type: 'b64json' };
  /** Abort signal for cancelling the operation. */
  abortSignal?: AbortSignal;
  /**
   * Additional HTTP headers to be sent with the request.
   * Only applicable for HTTP-based providers.
   */
  headers?: Record<string, string | undefined>;
};

type ImageModelV1CallOptions = ImageGenerationModelV1CallSettings & {
  inputFormat: 'prompt';
  prompt: LanguageModelV1Prompt;
};

type ImageGenerationModelV1FinishReason =
  | 'completed'
  | 'content-filter'
  | 'error'
  | 'other'
  | 'unknown';

type DallePromiseLike = PromiseLike<{
  created: Date;
  expires?: Date;
  id?: string;
  result?: { data: Array<{ url: string }> };
  status?: string;
  data?: Array<{ url: string; revised_prompt: string }>;
  error?: { code: string; message: string };
  lastActionDateTime: Date;
}>;

export interface ImageGenerationModelV1 {
  /**
   * The language model must specify which language model interface version it implements.
   * This will allow us to evolve the language model interface and retain backwards
   * compatibility. The different implementation versions can be handled as a
   * discriminated union on our side.
   */
  readonly specificationVersion: 'v1';
  /** Name of the provider for logging purposes. */
  readonly provider: string;
  /** Provider-specific model ID for logging purposes. */
  readonly modelId: string;
  /**
   * Default object generation mode that should be used with this model when no mode
   * is specified. Should be the mode with the best results for this model. `undefined`
   * can be returned if object generation is not supported.
   *
   * This is needed to generate the best objects possible without requiring the user
   * to explicitly specify the object generation mode.
   */
  readonly defaultObjectGenerationMode: 'json' | 'tool' | undefined;

  doGenerate: (options: ImageModelV1CallOptions) => DallePromiseLike & {
    /** Finish reason. */
    finishReason: ImageGenerationModelV1FinishReason;
    /** Raw prompt and setting information for observability provider integration. */
    rawCall: {
      /**
       * Raw prompt after expansion and conversion to the format that the
       * provider uses to send the information to their API.
       */
      rawPrompt: unknown;
      /** Raw settings that are used for the API call. Includes provider-specific settings. */
      rawSettings: Record<string, unknown>;
    };
    /** Optional raw response information for debugging purposes. */
    rawResponse?: {
      /** Response headers. */
      headers?: Record<string, string>;
    };
  };
}
```

Maybe the call settings and `DallePromiseLike` should instead belong to an `OpenAIImageGenerationModel` that implements `ImageModelV1`? I don't actually have any idea how to properly handle `doStream` cases, though.

Ideally, since most of the providers available in the `ProviderRegistry` also expose image generation models (e.g. DALL·E for OpenAI, Stable Diffusion for Vertex/Bedrock), we would also be able to add those and manage them through the registry, so we only have to track models in one place and they can all be self-contained in the registry.
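The post's two structural ideas, keeping DALL·E-specific settings on a provider class and looking up every image model through one registry, can be sketched as follows. This is a minimal offline sketch under stated assumptions, not the AI SDK's API: `ImageModelSketch`, `FakeImageModel`, and `createImageRegistry` are hypothetical names, the `example.invalid` URLs and the `bedrock:stable-diffusion-xl` id are placeholders, and the `provider:model` id format is assumed by analogy with the ids the existing provider registry uses for language models.

```typescript
// Hypothetical, provider-agnostic image model contract (not the AI SDK's API).
type ImageFinishReason = 'completed' | 'content-filter' | 'error' | 'other' | 'unknown';

interface ImageGenerateOptions {
  prompt: string;
  n: number;
  size: string; // e.g. "1024x1024"; valid sizes depend on the provider
}

interface ImageGenerateResult {
  images: Array<{ url: string }>;
  finishReason: ImageFinishReason;
}

interface ImageModelSketch {
  readonly specificationVersion: 'v1';
  readonly provider: string;
  readonly modelId: string;
  doGenerate(options: ImageGenerateOptions): PromiseLike<ImageGenerateResult>;
}

// Hypothetical stand-in model so the sketch runs offline. A real OpenAI adapter
// would map these options onto the Images API request and keep DALL·E-only
// settings (quality, style, response format) on its own class.
class FakeImageModel implements ImageModelSketch {
  readonly specificationVersion = 'v1' as const;
  readonly provider: string;
  readonly modelId: string;

  constructor(provider: string, modelId: string) {
    this.provider = provider;
    this.modelId = modelId;
  }

  async doGenerate(options: ImageGenerateOptions): Promise<ImageGenerateResult> {
    // Return one placeholder URL per requested image.
    const images: Array<{ url: string }> = [];
    for (let i = 0; i < options.n; i++) {
      images.push({ url: `https://example.invalid/${this.modelId}/${i}.png` });
    }
    return { images, finishReason: 'completed' };
  }
}

// Hypothetical registry keyed by "provider:model" ids, so every image model
// is tracked in one place.
function createImageRegistry(models: ImageModelSketch[]) {
  const byId = new Map<string, ImageModelSketch>();
  for (const model of models) {
    byId.set(`${model.provider}:${model.modelId}`, model);
  }
  return {
    imageModel(id: string): ImageModelSketch {
      const model = byId.get(id);
      if (model === undefined) {
        throw new Error(`No image model registered under "${id}"`);
      }
      return model;
    },
  };
}

// Usage: register models once, then look one up by id.
const registry = createImageRegistry([
  new FakeImageModel('openai', 'dall-e-3'),
  new FakeImageModel('bedrock', 'stable-diffusion-xl'),
]);
const dalle = registry.imageModel('openai:dall-e-3');
```

Keeping `doGenerate` free of provider-specific options lets the registry hand back any model behind a single type; a streaming method is deliberately left out, since the post leaves `doStream` open.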
lgrammel commented 1 month ago

We plan to implement it, but it's lower priority. The implementation will most likely be a mix of what we do for LLMs and what I implemented in ModelFusion for images.

Please upvote with 👍🏻 if this feature is important to you.
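The comment does not say what "a mix" of the LLM approach and ModelFusion's image support would look like. Purely as an illustration, assuming a high-level helper that wraps a low-level `doGenerate` the way `generateText` wraps a language model, the helper could own cross-provider concerns such as splitting a request for `n` images into several calls when a model caps images per request (DALL·E 3, for instance, accepts only `n = 1`). `generateImage`, `ImageModelLike`, and `maxImagesPerCall` are hypothetical names chosen for this sketch; it does not reproduce the SDK's or ModelFusion's actual implementation.

```typescript
// Hypothetical low-level contract, analogous to a language model's doGenerate.
interface ImageModelLike {
  readonly provider: string;
  readonly modelId: string;
  // Largest `n` the provider accepts in a single request (hypothetical field).
  readonly maxImagesPerCall: number;
  doGenerate(options: { prompt: string; n: number }): PromiseLike<{ images: string[] }>;
}

interface GenerateImageResult {
  images: string[];
  calls: number; // number of doGenerate calls that were made
}

// Hypothetical high-level helper: validates input, splits a request for `n`
// images into batches the model accepts, and flattens the results in order.
async function generateImage(args: {
  model: ImageModelLike;
  prompt: string;
  n?: number;
}): Promise<GenerateImageResult> {
  const n = args.n ?? 1;
  const perCall = args.model.maxImagesPerCall;
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(`n must be a positive integer, got ${n}`);
  }
  if (!Number.isInteger(perCall) || perCall < 1) {
    throw new Error(`maxImagesPerCall must be a positive integer, got ${perCall}`);
  }

  // e.g. n = 5, perCall = 2 -> batches of 2, 2, 1
  const batchSizes: number[] = [];
  for (let remaining = n; remaining > 0; remaining -= perCall) {
    batchSizes.push(Math.min(perCall, remaining));
  }

  // Run the batches in parallel; Promise.all keeps results in batch order.
  const results = await Promise.all(
    batchSizes.map((size) => args.model.doGenerate({ prompt: args.prompt, n: size })),
  );

  const images: string[] = [];
  for (const result of results) {
    images.push(...result.images);
  }
  return { images, calls: results.length };
}
```

Putting batching in the shared helper rather than in each provider keeps providers thin; whether the SDK would draw the line there is not stated in the issue.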