RFC: Gen AI Framework Gem (First version)

Summary:

With the recent rise of generative AI models such as GPT-4, Claude 3 and Mistral, tools are emerging to plug them into existing workflows and to create new workflows that these models enable. This proposal introduces the new Gen AI Framework Gem, which is meant to help O3DE developers use modern AI in games and simulations.

This RFC has been developed as a follow-up to https://github.com/o3de/sig-simulation/issues/86 and incorporates the feedback gathered there. It also dives deeper into technical solutions and focuses on immediate first steps (except for the clearly separated Vision chapter).

The Gen AI Framework Gem differs substantially from the Machine Learning Gem in that it focuses on generative AI rather than on multi-layer perceptrons. However, given the rapidly changing nomenclature in this space and the versatility of the Gem, there are arguments against including "Generative" in the name.

Vision

Before diving into the specifics of the first step, it is useful to outline the role and scope of the Gen AI Framework Gem.

It is a framework Gem, intended for developers to build on. As such, it does not, by itself, do anything useful that is user-facing. Other Gems will use it to build features powered by Generative AI.
Nothing in this Gem is specific to a hosting provider or vendor. Separate Gems will use this Gem to connect to specific AI services or models.
The scope of the Gen AI Framework Gem can be summarized as anything common across Gen AI use cases and across vendor-specific AI service and model APIs, with details outlined in this RFC.

GenAIFramework

The Gen AI Framework Gem is meant for O3DE Gem developers, and it is expected to be a dependency of future Gems such as AI characters, assistants, and scene generators. Unlike these future Gems, its value does not strictly depend on the current capabilities of AI models: it is a tool for exploring their limits, and it is designed to be flexible so that it benefits from improvements in these capabilities over the years.

Generative AI can be used through 3rd-party hosted services, such as Amazon Bedrock or OpenAI's GPT platform. These services typically charge per token, with prices depending on model type and modality. There are both proprietary and open-source models. It is also possible to host models locally, for example with tools such as Ollama or vLLM. AI services increasingly offer additional modalities (such as image prompting) as well as more complex services, such as Assistants. The Gen AI Framework Gem is meant to be agnostic to, and supportive of, all these common options.

The Gen AI Framework Gem is vendor-agnostic, but it includes tests and enables testing of the future Gems that will use it. A new feature Gem can also be delivered with a mocked AI vendor Gem. Such a mocked vendor Gem produces predictable responses, which makes tests stable, cost-effective and fast.
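To make the mocked-vendor idea concrete, here is a minimal C++ sketch. The IAIVendor interface, the MockAIVendor class and its methods are hypothetical (this RFC does not define the vendor interface), and std types stand in for AZStd so the example compiles on its own. The mock returns canned responses keyed by prompt and counts queries, so a feature Gem test can assert on both without network access or token cost.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical vendor-facing interface; the real interface is not defined in this RFC.
class IAIVendor
{
public:
    virtual ~IAIVendor() = default;
    // Sends a text prompt and returns the model's text response.
    virtual std::string Query(const std::string& prompt) = 0;
};

// Mock vendor returning canned responses, so tests are deterministic, free and fast.
class MockAIVendor : public IAIVendor
{
public:
    void SetResponse(const std::string& prompt, const std::string& response)
    {
        m_responses[prompt] = response;
    }

    std::string Query(const std::string& prompt) override
    {
        ++m_queryCount;
        auto it = m_responses.find(prompt);
        return it != m_responses.end() ? it->second : m_defaultResponse;
    }

    int GetQueryCount() const { return m_queryCount; }

private:
    std::map<std::string, std::string> m_responses;
    std::string m_defaultResponse = "I cannot help with that.";
    int m_queryCount = 0;
};
```

A feature Gem test would inject MockAIVendor wherever a real vendor Gem would otherwise be used.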

In the long run, from the perspective of game development, this Gem is a step towards building smart characters that interact uniquely with the player, and towards assisting in writing dialogue as well as creating 3D worlds.

From the perspective of simulation, it is a first step towards creating simulated robots, humans (in roles such as pedestrians and warehouse workers) and animals that behave in desired ways with less scripting, building smartly randomized simulation scenes, and assisting users in running validation scenarios and summarizing their results.

Long-term scope of the Gen AI Framework Gem

Given the new possibilities coming from recent advances in AI, game and simulation developers are looking to apply these capabilities in their creations. Many of the steps these developers need to take are common regardless of the type of application, for example:

Connecting with AI services that are remote or local, 3rd-party or self-hosted.
Communicating with multi-model, multi-modal (text, image, video, audio, ...) AI services (to be implemented in vendor Gems).
User experience and interface for prompting, handling errors, iterating on tasks, and basic human-AI collaboration workflows.
Global configuration for common parameters such as URI, usage limits, default models, etc. Available vendor Gen AI Gems will extend this configuration and populate it with options.
Simple visualization of AI outputs and logs.
An extendable Remote Procedure Call (RPC)-like API for AI to call, enabling a whitelist approach to AI control of O3DE.
Default support for one or two selected open-source models that perform best by some chosen metric.
Modular design with good developer interfaces, making it easy to add new modalities and features.
Support/abstraction for awareness of token usage, pricing and quota (to be implemented in vendor Gems).

The Gen AI Framework Gem is intended to supply useful abstractions, interfaces and tooling for these steps.
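The token, pricing and quota awareness item could be backed by a small framework-side abstraction that vendor Gems feed with counts. The sketch below is an assumption about how that might look (UsageTracker, TokenPricing and the per-1k-token pricing model are illustrative, not a defined API); it simply accumulates tokens and cost and reports whether the quota is used up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-token pricing reported by a vendor Gem (values are illustrative).
struct TokenPricing
{
    double inputCostPer1kTokens = 0.0;
    double outputCostPer1kTokens = 0.0;
};

// Framework-side bookkeeping: vendor Gems report token counts after each request,
// and features can check the remaining quota before sending the next one.
class UsageTracker
{
public:
    UsageTracker(TokenPricing pricing, std::uint64_t tokenQuota)
        : m_pricing(pricing)
        , m_tokenQuota(tokenQuota)
    {
    }

    void Record(std::uint64_t inputTokens, std::uint64_t outputTokens)
    {
        m_usedTokens += inputTokens + outputTokens;
        m_cost += inputTokens / 1000.0 * m_pricing.inputCostPer1kTokens
            + outputTokens / 1000.0 * m_pricing.outputCostPer1kTokens;
    }

    bool HasQuotaLeft() const { return m_usedTokens < m_tokenQuota; }
    std::uint64_t GetUsedTokens() const { return m_usedTokens; }
    double GetCost() const { return m_cost; }

private:
    TokenPricing m_pricing;
    std::uint64_t m_tokenQuota;
    std::uint64_t m_usedTokens = 0;
    double m_cost = 0.0;
};
```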

What is the scope and relevance of this feature?

This feature captures the first version of the Gen AI Framework Gem as a step towards the features described in the Vision. The first version, described in this RFC, focuses on the following features:

Support implemented for Editor use only, but designed with both Editor and Launcher in mind.
A simple set of global settings, some of which are populated with choices depending on the available vendor Gems.
A simple, text-only prompting interface for users. Note that prompts will not go anywhere if there is no vendor Gem in the project. In that case, the user will be warned that no AI service is available to handle the prompt.
An output window that shows the model's response to the prompt. The code portion of this response is also executed through the reflected Python API.
Generation and communication of definitions of all AI-exposed interfaces in Python, much like the Python symbols that are currently exported, but including additional documentation.
Utility functions to query the available engine interfaces.
Passing execution results, including errors, back to the AI service.
Abstraction of an AI context, which captures an instance of an AI model, its "conversation" context, its scope (Editor, Launcher, or both) and the behavior context of the available interfaces.
Routing of the code in an AI response to a specific executor, chosen based on the AI context. For example, the executor will likely be different for the Launcher, or when another scripting language is exposed. These other options will not be implemented, but abstractions will be put in place to allow such extensions. Note: for now, we focus only on executing generated code directly (doing work in O3DE), as opposed to generating code for the human user to run, which is also a valid use case for generative AI.

With this first version complete, development of feature Gems and vendor-specific Gems can already begin.

Feature design description:

AI Context

AI Context is a new structure that captures the identifying and necessary information about a specific AI service instance. This includes an identifying key (which can be invalid, meaning the context is not valid), a context entity ID, and an operational scope. An AI Context is supplied when making API calls. Examples of intended future use:

For scene generation, a global context with an invalid entity ID and an operational scope of Editor will be used.
For AI-driven characters, local contexts with the relevant entity IDs and an operational scope of Launcher will be used.

AI Context might be serialized in the future, so that it is easy to keep track of a used AI service context and reconnect to it, which might be useful to retain knowledge from previous sessions. The first version does not include serialization of AI Contexts.
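A minimal sketch of the AI Context described above, assuming plain std types in place of AZStd and AZ::EntityId. The field names, the OperationalScope enum and the sentinel InvalidEntityId value are all hypothetical; the point is that an empty key marks an invalid context and an invalid entity ID marks a global one.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical operational scope flags for an AI Context.
enum class OperationalScope : std::uint8_t
{
    Editor = 1 << 0,
    Launcher = 1 << 1,
    Both = Editor | Launcher,
};

// Stand-in for AZ::EntityId; the sentinel value is chosen for this sketch only.
using EntityId = std::uint64_t;
constexpr EntityId InvalidEntityId = ~EntityId{0};

struct AIContext
{
    std::string key;                        // identifying key; empty means the context is not valid
    EntityId entityId = InvalidEntityId;    // invalid for global contexts
    OperationalScope scope = OperationalScope::Editor;

    bool IsValid() const { return !key.empty(); }
    bool IsGlobal() const { return entityId == InvalidEntityId; }
};
```

With this layout, the scene-generation example above would be AIContext{"scene-generation", InvalidEntityId, OperationalScope::Editor}.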

AI calling O3DE interfaces

O3DE uses the Behavior Context to reflect classes, buses, methods, and properties. It captures the essential information required to call these behaviors, and Python and Lua scripting code use it to expose and call O3DE interfaces. Since this is a well-traveled path for script execution, the Gen AI Framework Gem will build on the Behavior Context in its implementation.

For the first version of the Gem, the focus is on Editor functionality and on using the existing Automation-scope API exposed to Python. In the future, we might want to restrict the API available to the AI service beyond what the Scope flag allows. This could be implemented by creating a dedicated Behavior Context based on the AI Context and some configuration.
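One way to narrow the API below the Automation scope, sketched under assumptions: MethodInfo is a simplified stand-in for a reflected method (not the real AZ::BehaviorMethod), and the per-context allowlist is a hypothetical configuration. Only methods that carry the Automation scope and, if an allowlist is given, appear on it are exposed to the AI service.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for a Behavior Context method entry.
struct MethodInfo
{
    std::string name;
    bool automationScope = false;  // mirrors the Automation scope flag
};

// Returns the methods exposed to an AI context: Automation-scoped methods,
// further narrowed by an optional per-context allowlist (empty = no extra restriction).
std::vector<MethodInfo> FilterForAI(const std::vector<MethodInfo>& all,
                                    const std::set<std::string>& allowlist)
{
    std::vector<MethodInfo> result;
    for (const MethodInfo& method : all)
    {
        const bool allowed = allowlist.empty() || allowlist.count(method.name) > 0;
        if (method.automationScope && allowed)
        {
            result.push_back(method);
        }
    }
    return result;
}
```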

This also means that the first version of the Gen AI Framework Gem will depend on the Editor Python Bindings Gem. This dependency will likely be removed in the future, once the runtime use case for the Gem is implemented.

The O3DE API available for calling must be communicated to the AI service, which requires producing text definitions of methods and buses. Code that does exactly this already exists in the Editor Python Bindings Gem, in the PythonLogSymbolsComponent.cpp file. Since there was previously no use for it outside the bindings Gem, the code that creates method and EBus definition strings is currently combined with the code that writes these strings to files. This RFC proposes three changes to Editor Python Bindings:

Separate the API-to-text functionality from the text-to-file functionality.
Expose API-to-text through a set of separate interfaces, allowing queries by method or EBus name, or for all exposed interfaces.
Use additional documentation when generating the definitions, namely the ToolTip attributes for methods, EBuses, and parameters.

The last change is needed because AI services need information about the purpose and expected outcome of API calls. Much of the Automation-scope API is currently reflected without documentation. A separate PR will add such documentation to the Behavior Context reflection of some APIs.
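The proposed split between API-to-text and text-to-file might look roughly like the following. MethodDoc, ParameterDoc, MethodToText and WriteToFile are hypothetical names, not the current PythonLogSymbolsComponent code; the sketch only shows that once the string is built independently of file I/O, the same text can be written to disk or sent to an AI service, and that ToolTip data can be carried into the generated docstring.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical documentation records, filled from reflected ToolTip attributes.
struct ParameterDoc { std::string name; std::string type; std::string toolTip; };
struct MethodDoc
{
    std::string name;
    std::string returnType;
    std::string toolTip;
    std::vector<ParameterDoc> parameters;
};

// API-to-text: builds a Python-style stub with a docstring, without touching files.
std::string MethodToText(const MethodDoc& method)
{
    std::ostringstream out;
    out << "def " << method.name << "(";
    for (std::size_t i = 0; i < method.parameters.size(); ++i)
    {
        const ParameterDoc& p = method.parameters[i];
        out << (i ? ", " : "") << p.name << ": " << p.type;
    }
    out << ") -> " << method.returnType << ":\n";
    out << "    \"\"\"" << method.toolTip << "\n";
    for (const ParameterDoc& p : method.parameters)
    {
        out << "    :param " << p.name << ": " << p.toolTip << "\n";
    }
    out << "    \"\"\"\n";
    return out.str();
}

// Text-to-file: kept separate so the same text can also be sent to an AI service.
bool WriteToFile(const std::string& path, const std::string& text)
{
    std::ofstream file(path);
    file << text;
    return static_cast<bool>(file);
}
```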

To execute AI-generated code automatically, the code block will be extracted from the response and passed to the ScriptCall interface:

virtual bool ScriptCall(const AZStd::string& script, AZStd::string& response, const AIContext& aiContext) = 0;
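The extraction step could be as simple as taking the first fenced code block from the response. This sketch assumes the model wraps code in Markdown-style triple-backtick fences, which this RFC does not specify; ExtractFirstCodeBlock is a hypothetical helper, and a response without a complete block yields an empty string so nothing is executed.

```cpp
#include <cassert>
#include <string>

// Returns the body of the first fenced code block in an AI response, or "" if none.
std::string ExtractFirstCodeBlock(const std::string& response)
{
    const std::string fence(3, '`');  // three backticks
    const std::string::size_type open = response.find(fence);
    if (open == std::string::npos)
    {
        return {};
    }
    // Skip the opening fence and the optional language tag on the same line.
    const std::string::size_type bodyStart = response.find('\n', open);
    if (bodyStart == std::string::npos)
    {
        return {};
    }
    const std::string::size_type close = response.find(fence, bodyStart + 1);
    if (close == std::string::npos)
    {
        return {};
    }
    return response.substr(bodyStart + 1, close - bodyStart - 1);
}
```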

Depending on the AI Context, an appropriate Script Executor will be used. These executors handle script execution as well as extracting and communicating back results, errors and logs. For the first version of the Gem, an executor for the Editor and Python will be implemented, and it will call the Python Runner bus, i.e.:

AzToolsFramework::EditorPythonRunnerRequestBus::Broadcast(
            &AzToolsFramework::EditorPythonRunnerRequestBus::Events::ExecuteByString, script, true);
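The executor selection could be modelled as a registry keyed by scope and language, as in this sketch. ExecutorRegistry, the Scope enum and the ScriptExecutor signature are assumptions that mirror the shape of ScriptCall; in the Editor, the registered Python executor would be the one calling EditorPythonRunnerRequestBus, which is replaced here by a plain callback so the code runs standalone.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

enum class Scope { Editor, Launcher };

// Runs a script and writes collected output or errors into response.
using ScriptExecutor = std::function<bool(const std::string& script, std::string& response)>;

class ExecutorRegistry
{
public:
    void Register(Scope scope, const std::string& language, ScriptExecutor executor)
    {
        m_executors[std::make_pair(scope, language)] = std::move(executor);
    }

    // Mirrors the ScriptCall contract: returns false if no executor fits the context.
    bool ScriptCall(Scope scope, const std::string& language,
                    const std::string& script, std::string& response) const
    {
        auto it = m_executors.find(std::make_pair(scope, language));
        if (it == m_executors.end())
        {
            response = "No script executor for this AI context.";
            return false;
        }
        return it->second(script, response);
    }

private:
    std::map<std::pair<Scope, std::string>, ScriptExecutor> m_executors;
};
```

For example, registering a Python executor for Scope::Editor and then calling ScriptCall with Scope::Launcher returns false with an explanatory response, which is what a Launcher-scoped context would see until a runtime executor exists.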

Features built on top of this Gem will most likely define and implement new APIs for the AI service to call. Note that exposing a new API requires nothing more than reflecting it to the Behavior Context in the Reflect function, setting its Scope attribute to Automation, and making sure ToolTip attributes are set, for example:

    void CustomEditorTest::Reflect(AZ::ReflectContext* context)
    {
        if (auto behaviorContext = azrtti_cast<AZ::BehaviorContext*>(context))
        {
            behaviorContext->EBus<CustomEditorRequestBus>("CustomEditorRequestBus")
                ->Attribute(AZ::Script::Attributes::ToolTip, "Custom request bus documentation")
                ->Attribute(AZ::Script::Attributes::Scope, AZ::Script::Attributes::ScopeFlags::Automation)
                ->Attribute(AZ::Script::Attributes::Category, "AI")
                ->Attribute(AZ::Script::Attributes::Module, "test")
                ->Event(
                    "DoTheTestThing",
                    &CustomEditorRequests::DoTheTestThing,
                    { AZ::BehaviorParameterOverrides("testParameter", "documentation for test parameter") })
                ->Attribute(AZ::Script::Attributes::ToolTip, "Custom event documentation. Logs and returns its own parameter");
        }
    }

O3DE calling AI service

In this first version, the simplest interface will be available through a global bus:

void FetchGenAIGlobalEditorContext(AIContext& context);
AZStd::string GenAITextQuery(const AIContext& context, const AZStd::string& query);

In the first version, this interface will not yet be used outside of tests.
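A standalone sketch of a handler for the two global-bus functions, assuming std types instead of AZStd and a single registered vendor callback. GenAIGlobalHandler, RegisterVendor and the "global-editor" key are hypothetical; the warning string illustrates the planned behavior of telling the user that no AI service is available when no vendor Gem is present.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Minimal stand-in for the AI Context; an empty key means invalid.
struct AIContext { std::string key; };

class GenAIGlobalHandler
{
public:
    using VendorQuery = std::function<std::string(const std::string&)>;

    // A vendor Gem would register its query function here.
    void RegisterVendor(VendorQuery query) { m_vendorQuery = std::move(query); }

    void FetchGenAIGlobalEditorContext(AIContext& context) const { context.key = "global-editor"; }

    std::string GenAITextQuery(const AIContext& context, const std::string& query) const
    {
        if (context.key.empty())
        {
            return "Warning: invalid AI context.";
        }
        if (!m_vendorQuery)
        {
            return "Warning: no AI service available to handle the prompt.";
        }
        return m_vendorQuery(query);
    }

private:
    VendorQuery m_vendorQuery;
};
```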

O3DE sharing data with AI

The initial prompt communicates the definitions and documentation the AI service needs to perform work in O3DE. This includes:

The entire Automation-scope Behavior Context. Note that this might be both narrowed down (to reduce token cost and noise, and to limit error exposure) and expanded with new interfaces added by feature Gems.
The role and the work expected from the service, which will be added by feature Gems.

The AI service can learn more about the project by calling behavior buses and methods that return information about, for example, the scene, character poses and the viewport.
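Assembling the initial prompt from these two parts might look like the sketch below. BuildInitialPrompt and the instruction sentence it inserts are illustrative assumptions; the role string would come from a feature Gem and the definitions from the API-to-text step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical prompt assembly: role first, then the exposed API definitions.
std::string BuildInitialPrompt(const std::string& role,
                               const std::vector<std::string>& apiDefinitions)
{
    std::string prompt = role + "\n\n";
    prompt += "You can call only the following O3DE Python API. "
              "Reply with a single Python code block.\n\n";
    for (const std::string& definition : apiDefinitions)
    {
        prompt += definition;
        prompt += '\n';
    }
    return prompt;
}
```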

Communication and configuration of Generative AI services

The Gen AI Framework Gem supports developers by providing abstractions to implement vendor-specific components for communication. These components are divided into two categories:

Communication / Requests: AI services have different APIs and hosting options; these components handle the differences.
Configuring: Specific configurations for vendor services and models, including system messages, temperature, token limits, URIs, and models.
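The configuration side could start from a plain settings struct that each vendor Gem fills in and validates. Field names, defaults and the [0, 2] temperature range below are assumptions (acceptable ranges differ between vendors), not a defined O3DE type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical vendor/model configuration.
struct VendorConfiguration
{
    std::string uri;            // service endpoint, remote or local
    std::string model;          // model identifier understood by the vendor
    std::string systemMessage;  // instructions prepended to every conversation
    float temperature = 0.7f;   // sampling temperature
    int maxTokens = 1024;       // response token limit
};

// Returns a list of problems; an empty list means the configuration looks usable.
std::vector<std::string> Validate(const VendorConfiguration& config)
{
    std::vector<std::string> problems;
    if (config.uri.empty())
    {
        problems.push_back("URI is empty");
    }
    if (config.model.empty())
    {
        problems.push_back("model is empty");
    }
    if (config.temperature < 0.0f || config.temperature > 2.0f)
    {
        problems.push_back("temperature out of range [0, 2]");
    }
    if (config.maxTokens <= 0)
    {
        problems.push_back("token limit must be positive");
    }
    return problems;
}
```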

Overview diagram

WIP_diagram

User Experience and Interaction

The first version does not focus on a common UX; it provides a simple input widget for prompting and an output widget for monitoring responses and errors.

Another UX element will be the ability to roll back changes if the results of AI-executed code are invalid (using an undo batch).

Scripting language considerations

Python and Lua scripting are already supported in O3DE.

Python has the following advantages:

AI models are likely better trained for generating Python.
Better known by the target user group.
More powerful as a general-purpose language, with a large ecosystem of libraries.

Lua has the following advantages:

Simple, clean syntax and easy to learn.
Lightweight scripting language.
Easy to embed directly into a C++ application (like O3DE).
O3DE has an integrated Lua IDE that supports debugging.

It would be best to unify the scripting interface for AI between the Editor and the Launcher, as this would simplify the design and better support Gen AI models working in both scopes (for example, when developing new scenarios, changing scenes and testing them by running simulations). However, there are challenges in doing so.

Editor support for Python is possible now because Python is included with the engine and the Editor environment as a whole. Anything needed by the game launcher, however, must be exportable and must work outside of the Editor environment. Since Lua is part of the core libraries, it is available to any game launcher. To make Python available to game launchers, a strategy needs to be designed for providing the Python runtime to the launcher (most likely as some form of packaged Python virtual environment).

If work is done to enable Python at runtime (at least as an option), the EditorPythonBindings Gem would be renamed to PythonBindings.

Security

We would like to limit the possibility of erroneous code negatively affecting users, so we do not want to execute scripts without any limitations. For example:

import os
import shutil
shutil.rmtree(os.path.expanduser("~"))

would delete the user's home directory.

In the first version, a simple proposed approach is to whitelist the allowed modules (imports) and strip disallowed imports from the generated code. Another approach to try is RestrictedPython: https://restrictedpython.readthedocs.io/en/latest/#. There might be limitations on which platforms can support it.
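A first approximation of the module whitelist, sketched in C++ to match the rest of the framework code: FindDisallowedImports is a hypothetical helper that scans simple `import x` and `from x import y` lines and reports top-level modules not on the allowlist, so the executor can reject the script or strip those lines. It is deliberately simple and easy to bypass (for example with `__import__` or `exec`, or a second module in `import a, b`), which is one reason a more robust approach such as RestrictedPython is worth evaluating.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns top-level modules imported by the script that are not on the allowlist.
// Only inspects lines that start with "import x" or "from x import ...".
std::vector<std::string> FindDisallowedImports(const std::string& script,
                                               const std::set<std::string>& allowedModules)
{
    static const std::regex importLine(R"(^\s*(?:from\s+([A-Za-z_]\w*)|import\s+([A-Za-z_]\w*)))");
    std::vector<std::string> disallowed;
    std::istringstream lines(script);
    std::string line;
    while (std::getline(lines, line))
    {
        std::smatch match;
        if (std::regex_search(line, match, importLine))
        {
            // Group 1 holds the "from" module, group 2 the "import" module.
            const std::string module = match[1].matched ? match[1].str() : match[2].str();
            if (allowedModules.count(module) == 0)
            {
                disallowed.push_back(module);
            }
        }
    }
    return disallowed;
}
```

For the Security example above, `shutil` would be reported unless it is explicitly on the allowlist.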

Comments are welcome on both the first version solution and how to best solve the issue in the long run.

What are the advantages of the feature?

It brings modern AI to O3DE, making the engine attractive for developing games and simulations with generative AI.

What are the disadvantages of the feature?

It requires some changes to the existing Behavior Context reflection, such as adding documentation, and, in the future, further-reaching changes to the code that handles scripting, which might have a broad overall impact.

How will users learn this feature?

The Gem will be part of the canonical set, documented and cross-referenced in the O3DE documentation. Publicity for the Gem is also planned, and a showcase demo will be released in 2024. The Gem will likely be presented alongside other AI Gem(s), as it focuses on core functionality rather than user-facing features.

Are there any open questions?

What is your opinion on unification of edit-time and runtime scripting?
How to best handle security?
How to best handle long wait times for some of AI work / response?

Co-authors

@spham-amzn, @arturkamieniecki

o3de / sig-simulation