samchon / typia

Super-fast/easy runtime validators and serializers via transformation
https://typia.io/
MIT License
4.63k stars 159 forks

OpenAI structured outputs support #1307

Open antoniomdk opened 1 month ago

antoniomdk commented 1 month ago

Feature Request

I've been working with typia.llm.schema for a while and it has been extremely helpful in generating JSON schemas to call LLMs from TS types. However, the new structured outputs API of OpenAI has some limitations in the type of schemas it can take.

In particular, nullable is not taken into account. It would be great if we could map types X | null to anyOf, perhaps by introducing a new flag to the typia.llm.schema function.

Also, for types that don't extend Record, we should set [additionalProperties to false](https://platform.openai.com/docs/guides/structured-outputs/additionalproperties-false-must-always-be-set-in-objects).

I can contribute to this feature, but I may need some pointers for code references to start.
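As a rough illustration of the two requested changes, here is a minimal sketch that rewrites a nullable schema as anyOf and closes fixed-key objects. The `Schema` type and the `toStrictSchema` function are hypothetical names made up for this example; they are not part of typia or @samchon/openapi, and the sketch assumes the input marks nullability with an OpenAPI v3.0 style `nullable: true` flag.

```typescript
// Hypothetical minimal schema shape for illustration; not typia's ILlmSchema.
type Schema = {
  type?: string;
  nullable?: boolean;
  properties?: Record<string, Schema>;
  required?: string[];
  items?: Schema;
  anyOf?: Schema[];
  additionalProperties?: boolean | Schema;
};

// Rewrites `nullable: true` as an `anyOf` with a `null` branch, and sets
// `additionalProperties: false` on objects that do not declare it already
// (dynamic-key objects that carry a schema there are left untouched).
function toStrictSchema(schema: Schema): Schema {
  const { nullable, ...rest } = schema;
  const next: Schema = { ...rest };

  const props = rest.properties;
  if (props !== undefined) {
    const converted: Record<string, Schema> = {};
    for (const key of Object.keys(props))
      converted[key] = toStrictSchema(props[key]);
    next.properties = converted;
  }
  if (rest.items !== undefined) next.items = toStrictSchema(rest.items);
  if (next.type === "object" && next.additionalProperties === undefined)
    next.additionalProperties = false;

  return nullable === true ? { anyOf: [next, { type: "null" }] } : next;
}

const strictExample = toStrictSchema({
  type: "object",
  properties: { thumbnail: { type: "string", nullable: true } },
  required: ["thumbnail"],
});
console.log(JSON.stringify(strictExample, null, 2));
```

Whether such a conversion should live behind a flag or in a provider-specific schema type is a separate design question.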

samchon commented 1 month ago

A T | null type cannot be written as a oneOf type under the JSON schema specification of OpenAPI v3.0, which OpenAI has adopted. Writing T | null as a oneOf type has been allowed only since the JSON schema 2020-12 draft (of OpenAPI v3.1).

By the way, does OpenAI understand only the anyOf type? Currently, @samchon/openapi and typia use the oneOf type for TypeScript union types, because oneOf has a clearer meaning than anyOf.

samchon commented 1 month ago

Also, we should be a little careful about setting additionalProperties to false.

additionalProperties := false means that no superfluous properties of any type are allowed. Under the validation rule, if a value has any extra property that is not defined in properties, it must be considered invalid. That is what additionalProperties := false means.

Therefore, if you want to contribute to typia.llm.application<App>() and typia.llm.schema<T>() functions, you have to be careful about the rule.

Here is the code filling the ILlmSchema.IObject.additionalProperties property, and you can accomplish what you want just by changing the return type of the join() function from ILlmSchema | undefined to ILlmSchema | false.

https://github.com/samchon/typia/blob/8edeef57da6a121e52412da556074335dd93ef3a/src/programmers/internal/llm_schema_object.ts#L98-L121
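To make the validation rule for additionalProperties := false concrete, here is a minimal sketch of the check. `ObjectSchema` and `findSuperfluousKeys` are hypothetical names for this example only; this is not how typia implements validation.

```typescript
// Hypothetical, simplified object schema for illustration only.
interface ObjectSchema {
  properties: Record<string, unknown>;
  additionalProperties?: false;
}

// Returns the keys of `value` that are not declared in `schema.properties`.
// When additionalProperties is not false, extra keys are tolerated,
// so nothing is reported.
function findSuperfluousKeys(
  schema: ObjectSchema,
  value: Record<string, unknown>,
): string[] {
  if (schema.additionalProperties !== false) return [];
  return Object.keys(value).filter(
    (key) => !Object.prototype.hasOwnProperty.call(schema.properties, key),
  );
}

const articleSchema: ObjectSchema = {
  properties: { title: {}, body: {} },
  additionalProperties: false,
};
// "extra" is not declared in properties, so the value would be invalid.
console.log(findSuperfluousKeys(articleSchema, { title: "a", body: "b", extra: 1 }));
```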

antoniomdk commented 1 month ago

I haven't found any info about whether OpenAI supports oneOf. They do mention that they support anyOf, but I agree that oneOf should be the right type (it doesn't make any sense for a type to be null and not null at the same time). That's why I was suggesting putting these behavior changes under a flag, or making the user explicitly ask for them, because they deviate from the OpenAPI and JSON schema standards.
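The difference between the two keywords can be sketched with plain predicates: anyOf accepts a value that matches at least one branch, while oneOf requires exactly one match. The helper names below are hypothetical and only illustrate the semantics; they are not taken from any library.

```typescript
// A branch is modeled as a predicate over a value (illustration only).
type Branch = (value: unknown) => boolean;

const isNull: Branch = (v) => v === null;
const isString: Branch = (v) => typeof v === "string";
const isNumber: Branch = (v) => typeof v === "number";
const isInteger: Branch = (v) => typeof v === "number" && Number.isInteger(v);

const countMatches = (branches: Branch[], value: unknown): number =>
  branches.filter((branch) => branch(value)).length;

// anyOf: at least one branch matches
const anyOf = (branches: Branch[], value: unknown): boolean =>
  countMatches(branches, value) >= 1;

// oneOf: exactly one branch matches
const oneOf = (branches: Branch[], value: unknown): boolean =>
  countMatches(branches, value) === 1;

// string | null has disjoint branches, so both keywords accept the same values.
console.log(anyOf([isString, isNull], null), oneOf([isString, isNull], null));

// Overlapping branches are where they differ: 3 is both a number and an integer.
console.log(anyOf([isNumber, isInteger], 3), oneOf([isNumber, isInteger], 3));
```

For a union like X | null, where no value can match both branches, swapping oneOf for anyOf does not change which values are accepted.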

antoniomdk commented 1 month ago

For example, the OpenAI Python SDK has the following unit tests: https://github.com/openai/openai-python/blob/aeaed488352274a9ca86c834eeb618d732989518/tests/lib/test_pydantic.py#L65

No references to oneOf, unfortunately.

More info: https://community.openai.com/t/official-documentation-for-supported-schemas-for-response-format-parameter-in-calls-to-client-beta-chats-completions-parse/932422/4

samchon commented 1 month ago

How about the other models?

Google Gemini uses the JSON schema specified by OpenAPI v3.0.3, but does not support oneOf.

OpenAI sometimes looks like it is using OpenAPI v3.1, and sometimes v3.0. It supports mixed-in types expressed as type: ["string", "null"], but does not support the tuple type expressed as { type: "array", prefixItems: [A, B, C] }. I need to study and test OpenAI deeply next weekend.
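For reference, the two schema shapes mentioned above look roughly like this, together with a small helper that detects tuple usage before a schema is sent to a provider. The `Schema` type and `usesTuple` are hypothetical names for this sketch, not part of typia or @samchon/openapi.

```typescript
// Hypothetical minimal schema shape for illustration only.
type Schema = {
  type?: string | string[];
  items?: Schema;
  prefixItems?: Schema[];
  properties?: Record<string, Schema>;
};

// string | null, written as a mixed-in type array
const nullableString: Schema = { type: ["string", "null"] };

// [string, number, boolean], written as a tuple with prefixItems
const tuple: Schema = {
  type: "array",
  prefixItems: [{ type: "string" }, { type: "number" }, { type: "boolean" }],
};

// Reports whether a schema, or anything nested in it, uses prefixItems.
function usesTuple(schema: Schema): boolean {
  if (schema.prefixItems !== undefined) return true;
  if (schema.items !== undefined && usesTuple(schema.items)) return true;
  const props = schema.properties;
  if (props === undefined) return false;
  return Object.keys(props).some((key) => {
    const child = props[key];
    return child !== undefined && usesTuple(child);
  });
}

console.log(usesTuple(nullableString), usesTuple(tuple));
```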

samchon commented 1 month ago

To support LLM function calling feature exactly, I should separate the providers like below.

samchon commented 1 month ago

@antoniomdk If you send a PR about additionalProperties, I'll accept it.

Also, for handling each specific LLM provider's schema, I'll prepare a major update.

It will be @samchon/openapi@2.0.0 and typia@7.0.0.

antoniomdk commented 1 month ago

@samchon That sounds great! I think the LLM-specific separation makes a lot of sense. I'll send a PR for additionalProperties by EOW (probably during the weekend).

bradleat commented 3 weeks ago

Related to LLM structured outputs, I find that when prompting I often want to use the jsdoc comment for a type in the prompt. Can typia add a misc method for returning the jsdoc string of a particular type?

Using typia.reflect.metadata can get you this information, but it'd be nice to just get the jsdoc comment.

samchon commented 1 week ago

@antoniomdk, @bradleat https://github.com/samchon/openapi/blob/v2.0/src/structures/IChatGptSchema.ts

I'm preparing an OpenAI-dedicated schema type, IChatGptSchema, for the next version of @samchon/openapi and typia.

Here is the type. I'll test it using the ChatGPT API, considering the things below.

If you want to try it earlier, install the typia@next version and call typia.llm.application<App, "chatgpt">().

npm install typia@next

samchon commented 1 week ago

Here is an example use case of the IChatGptSchema currently under consideration.

Source Code

import {
  ChatGptTypeChecker,
  IChatGptSchema,
  ILlmApplication,
} from "@samchon/openapi";
import typia, { tags } from "typia";

const app: ILlmApplication<"chatgpt"> = typia.llm.application<
  BbsArticleController,
  "chatgpt"
>({
  separate: (schema: IChatGptSchema) =>
    ChatGptTypeChecker.isString(schema) &&
    schema.contentMediaType !== undefined,
});
console.log(app);

interface BbsArticleController {
  /**
   * Create a new article.
   *
   * Writes a new article and archives it into the DB.
   *
   * @param input Information of the article to create
   * @returns Newly created article
   */
  create(input: IBbsArticle.ICreate): Promise<IBbsArticle>;

  /**
   * Update an article.
   *
   * Updates an article with new content.
   *
   * @param id Target article's {@link IBbsArticle.id}
   * @param input New content to update
   */
  update(
    id: string & tags.Format<"uuid">,
    input: IBbsArticle.IUpdate,
  ): Promise<void>;

  /**
   * Erase an article.
   *
   * Erases an article from the DB.
   *
   * @param id Target article's {@link IBbsArticle.id}
   */
  erase(id: string & tags.Format<"uuid">): Promise<void>;
}

/**
 * Article entity.
 *
 * `IBbsArticle` is an entity representing an article in the BBS (Bulletin Board System).
 */
interface IBbsArticle extends IBbsArticle.ICreate {
  /**
   * Primary Key.
   */
  id: string & tags.Format<"uuid">;

  /**
   * Creation time of the article.
   */
  created_at: string & tags.Format<"date-time">;

  /**
   * Last updated time of the article.
   */
  updated_at: string & tags.Format<"date-time">;
}
namespace IBbsArticle {
  /**
   * Information of the article to create.
   */
  export interface ICreate {
    /**
     * Title of the article.
     *
     * Representative title of the article.
     */
    title: string;

    /**
     * Content body.
     *
     * Content body of the article writtn in the markdown format.
     */
    body: string;

    /**
     * Thumbnail image URI.
     *
     * Thumbnail image URI which can represent the article.
     *
     * If configured as `null`, it means that no thumbnail image in the article.
     */
    thumbnail:
      | null
      | (string & tags.Format<"uri"> & tags.ContentMediaType<"image/*">);
  }

  /**
   * Information of the article to update.
   *
   * Only the filled properties will be updated.
   */
  export type IUpdate = Partial<ICreate>;
}

Compiled Code

import * as __typia_transform__llmApplicationFinalize from "typia/lib/internal/_llmApplicationFinalize.js";
import { ChatGptTypeChecker } from "@samchon/openapi";
import typia from "typia";
const app = (() => {
  const app = {
    model: "chatgpt",
    functions: [
      {
        name: "create",
        parameters: [
          {
            $ref: "#/$defs/IBbsArticle.ICreate",
            description: "Information of the article to create",
            $defs: {
              "IBbsArticle.ICreate": {
                type: "object",
                properties: {
                  title: {
                    type: "string",
                    title: "Title of the article",
                    description:
                      "Title of the article.\n\nRepresentative title of the article.",
                  },
                  body: {
                    type: "string",
                    title: "Content body",
                    description:
                      "Content body.\n\nContent body of the article writtn in the markdown format.",
                  },
                  thumbnail: {
                    oneOf: [
                      {
                        type: "null",
                      },
                      {
                        type: "string",
                        format: "uri",
                        contentMediaType: "image/*",
                      },
                    ],
                    title: "Thumbnail image URI",
                    description:
                      "Thumbnail image URI.\n\nThumbnail image URI which can represent the article.\n\nIf configured as `null`, it means that no thumbnail image in the article.",
                  },
                },
                required: ["title", "body", "thumbnail"],
                description: "Information of the article to create.",
                additionalProperties: false,
              },
            },
          },
        ],
        output: {
          $ref: "#/$defs/IBbsArticle",
          description: "Newly created article",
          $defs: {
            IBbsArticle: {
              type: "object",
              properties: {
                id: {
                  type: "string",
                  format: "uuid",
                  title: "Primary Key",
                  description: "Primary Key.",
                },
                created_at: {
                  type: "string",
                  format: "date-time",
                  title: "Creation time of the article",
                  description: "Creation time of the article.",
                },
                updated_at: {
                  type: "string",
                  format: "date-time",
                  title: "Last updated time of the article",
                  description: "Last updated time of the article.",
                },
                title: {
                  type: "string",
                  title: "Title of the article",
                  description:
                    "Title of the article.\n\nRepresentative title of the article.",
                },
                body: {
                  type: "string",
                  title: "Content body",
                  description:
                    "Content body.\n\nContent body of the article writtn in the markdown format.",
                },
                thumbnail: {
                  oneOf: [
                    {
                      type: "null",
                    },
                    {
                      type: "string",
                      format: "uri",
                      contentMediaType: "image/*",
                    },
                  ],
                  title: "Thumbnail image URI",
                  description:
                    "Thumbnail image URI.\n\nThumbnail image URI which can represent the article.\n\nIf configured as `null`, it means that no thumbnail image in the article.",
                },
              },
              required: [
                "id",
                "created_at",
                "updated_at",
                "title",
                "body",
                "thumbnail",
              ],
              description:
                "Article entity.\n\n`IBbsArticle` is an entity representing an article in the BBS (Bulletin Board System).",
              additionalProperties: false,
            },
          },
        },
        description:
          "Create a new article.\n\nWrites a new article and archives it into the DB.",
      },
      {
        name: "update",
        parameters: [
          {
            type: "string",
            format: "uuid",
            description: "Target article's ",
          },
          {
            $ref: "#/$defs/PartialIBbsArticle.ICreate",
            description: "New content to update",
            $defs: {
              "PartialIBbsArticle.ICreate": {
                type: "object",
                properties: {
                  title: {
                    type: "string",
                    title: "Title of the article",
                    description:
                      "Title of the article.\n\nRepresentative title of the article.",
                  },
                  body: {
                    type: "string",
                    title: "Content body",
                    description:
                      "Content body.\n\nContent body of the article writtn in the markdown format.",
                  },
                  thumbnail: {
                    oneOf: [
                      {
                        type: "null",
                      },
                      {
                        type: "string",
                        format: "uri",
                        contentMediaType: "image/*",
                      },
                    ],
                    title: "Thumbnail image URI",
                    description:
                      "Thumbnail image URI.\n\nThumbnail image URI which can represent the article.\n\nIf configured as `null`, it means that no thumbnail image in the article.",
                  },
                },
                description: "Make all properties in T optional",
                additionalProperties: false,
              },
            },
          },
        ],
        description:
          "Update an article.\n\nUpdates an article with new content.",
      },
      {
        name: "erase",
        parameters: [
          {
            type: "string",
            format: "uuid",
            description: "Target article's ",
          },
        ],
        description: "Erase an article.\n\nErases an article from the DB.",
      },
    ],
    options: {
      separate: null,
    },
  };
  __typia_transform__llmApplicationFinalize._llmApplicationFinalize(app, {
    separate: (schema) =>
      ChatGptTypeChecker.isString(schema) &&
      schema.contentMediaType !== undefined,
  });
  return app;
})();
console.log(app);