o3de / sig-simulation

Special Interest Group for Simulation
Apache License 2.0

RFC: Gen AI Framework Gem (First version) #87

Closed: adamdbrw closed this issue 3 months ago

adamdbrw commented 4 months ago

Summary:

With the recent rise of generative AI models such as GPT 4, Claude 3 or Mistral, tools are emerging that plug these models into existing workflows and enable new ones. This proposal introduces the new Gen AI Framework Gem, which is meant to help O3DE developers use modern AI in games and simulations.

This RFC has been developed as a follow-up to https://github.com/o3de/sig-simulation/issues/86 and incorporates the feedback gathered there. It also dives deeper into technical solutions and focuses on the immediate first steps (except for the clearly separated Vision chapter).

The Gen AI Framework Gem is quite different from the Machine Learning Gem in that it focuses on generative AI instead of multi-layer perceptrons. However, given the rapidly changing nomenclature in this space and the versatility of the Gem, there are arguments against including "Generative" in the name.

Vision

Before diving into the specifics of the first step, it is useful to underline the role and scope of the Gen AI Framework Gem.

  1. It is a framework Gem, intended for developers to build on. As such, it does not, by itself, provide any user-facing functionality. Other Gems will use it to build features powered by generative AI.
  2. Nothing in this Gem is specific to a hosting provider or vendor. Separate Gems will use this Gem to connect to specific AI services or models.
  3. The scope of the Gen AI Framework Gem can be summarized as anything common to Gen AI use cases and to vendor-specific AI service / model APIs, with details outlined in this RFC.

GenAIFramework

The Gen AI Framework Gem is meant for O3DE Gem developers and is expected to be a dependency of future Gems such as AI characters, assistants, and scene generators. Unlike these future Gems, its value does not strictly depend on the current capabilities of AI models: it is meant as a tool for exploring their limits, and it is designed to be flexible so that it benefits from improvements in these capabilities over the years.

Generative AI models can be used through third-party hosted services, such as Amazon Bedrock or OpenAI's GPT platform. These services are typically priced per token, with rates depending on the model type and modality. There are both proprietary and open-source models. Models can also be hosted locally, for example with tools such as Ollama or vLLM. AI services increasingly offer additional modalities (such as image prompting) as well as more complex services, such as Assistants. The Gen AI Framework Gem is meant to be agnostic to, and supportive of, all of these common options.

The Gen AI Framework Gem is vendor-agnostic, but it includes tests and enables testing of future Gems that will use it. A new feature Gem can also be delivered with a mocked AI vendor Gem. Such a mocked vendor Gem can produce predictable responses, which makes tests stable, cost-effective and fast.
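To illustrate the testing idea, here is a minimal sketch of a mocked vendor that returns canned responses. The `TextProvider` interface and `MockTextProvider` class are hypothetical names used only for illustration; they use standard-library types rather than AZStd and are not part of the Gem's API.

```cpp
#include <map>
#include <string>

// Hypothetical vendor-agnostic interface that a vendor Gem would implement.
class TextProvider
{
public:
    virtual ~TextProvider() = default;
    virtual std::string Query(const std::string& prompt) = 0;
};

// Hypothetical mocked vendor: returns predictable, pre-recorded responses,
// so tests are deterministic, free of charge and need no network access.
class MockTextProvider : public TextProvider
{
public:
    void SetResponse(const std::string& prompt, const std::string& response)
    {
        m_responses[prompt] = response;
    }

    std::string Query(const std::string& prompt) override
    {
        ++m_callCount;
        auto it = m_responses.find(prompt);
        return it != m_responses.end() ? it->second : m_fallback;
    }

    // Lets tests verify how many times the feature Gem contacted the "service".
    int CallCount() const { return m_callCount; }

private:
    std::map<std::string, std::string> m_responses;
    std::string m_fallback = "MOCK: no canned response";
    int m_callCount = 0;
};
```

A feature Gem's tests could inject such a mock wherever a real vendor implementation would otherwise be used.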

In the long run, from the perspective of game development, this Gem is a step towards building smart characters that interact uniquely with the player, and towards assisting in writing dialogue and creating 3D worlds.

From the perspective of simulation, it is a first step towards creating simulated robots, humans (in roles such as pedestrians or warehouse workers) and animals that behave in specific ways with less scripting, towards building smartly randomized simulation scenes, and towards assisting users in running validation scenarios and summarizing their results.

Long-term scope of the Gen AI Framework Gem

Given the new possibilities arising from recent advances in AI, game and simulation developers are looking to apply these capabilities in their creations. Many of the steps these developers need to take are common regardless of the type of application, for example:

The Gen AI Framework Gem is intended to supply useful abstractions, interfaces and tooling for these steps.

What is the scope and relevance of this feature?

This feature captures the first version of the Gen AI Framework Gem as a step towards the features described in the Vision. The first version, covered by this RFC, focuses on the following features:

Once this first version is complete, development of feature Gems and vendor-specific Gems can begin.

Feature design description:

AI Context

AI Context is a new structure that captures the identifying and necessary information about a specific instance of an AI service. This includes an identifying key (which can be invalid, meaning the context is not valid), a context entity ID, and an operational scope. An AI Context is supplied when making API calls. Examples of intended future use:
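The structure itself could be sketched as follows, assuming an integer key where a reserved value marks an invalid context, a 64-bit entity ID, and bit-flag scopes. The field names, types, the `AIScope` values and the `InvalidKey` sentinel are all hypothetical; the actual Gem would use O3DE types such as `AZ::EntityId`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scope flags; the real set of scopes is a design decision of the Gem.
enum class AIScope : std::uint32_t
{
    None = 0,
    Editor = 1u << 0,
    Launcher = 1u << 1,
};

// Hypothetical stand-in for the AI Context structure described above.
struct AIContext
{
    static constexpr std::uint64_t InvalidKey = std::numeric_limits<std::uint64_t>::max();

    std::uint64_t m_key = InvalidKey; // identifies the AI service instance
    std::uint64_t m_entityId = 0;     // context entity (stand-in for AZ::EntityId)
    AIScope m_scope = AIScope::None;  // operational scope

    // A context whose key is invalid is itself not valid.
    bool IsValid() const { return m_key != InvalidKey; }

    // True if the given scope flag is set on this context.
    bool HasScope(AIScope scope) const
    {
        return (static_cast<std::uint32_t>(m_scope) & static_cast<std::uint32_t>(scope)) != 0;
    }
};
```

Default-constructing the structure yields an invalid context, so forgetting to fetch a real one fails safely.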

AI calling O3DE interfaces

O3DE uses the Behavior Context to reflect classes, buses, methods, and properties. It captures the essential information required to call these behaviors, and it is used by Python and Lua scripting code to expose and call O3DE interfaces. Since this is a well-traveled path for script execution, the Gen AI Framework Gem will fully utilize the Behavior Context in its implementation.

For the first version of the Gem, the focus is on Editor functionality and on using the existing Python-exposed API in the Automation scope. In the future, we might want to restrict the API available to an AI service more finely than the Scope flag allows. This could be implemented by creating a dedicated Behavior Context based on the AI Context and some configuration.
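The finer-grained restriction mentioned above could work roughly like this sketch: start from everything in the Automation scope, then narrow it with a per-context allow-list. `ReflectedMethod`, `AIApiConfiguration` and `AvailableMethods` are hypothetical; real method data would come from the Behavior Context rather than a plain vector.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified description of a reflected method.
struct ReflectedMethod
{
    std::string m_name;
    bool m_automationScope = false; // stand-in for the Automation scope flag
};

// Hypothetical per-AI-context configuration narrowing the Automation scope.
struct AIApiConfiguration
{
    std::set<std::string> m_allowedMethods; // empty means "everything in Automation scope"
};

// Returns the methods an AI service may call: Automation-scoped first, then filtered by the allow-list.
std::vector<std::string> AvailableMethods(
    const std::vector<ReflectedMethod>& methods, const AIApiConfiguration& config)
{
    std::vector<std::string> result;
    for (const ReflectedMethod& method : methods)
    {
        if (!method.m_automationScope)
        {
            continue; // never exposed to scripts, so never exposed to AI
        }
        if (!config.m_allowedMethods.empty() && config.m_allowedMethods.count(method.m_name) == 0)
        {
            continue; // in Automation scope, but not allowed for this AI context
        }
        result.push_back(method.m_name);
    }
    return result;
}
```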

This also means that the first version of the Gen AI Framework Gem will depend on the Editor Python Bindings Gem. This dependency will likely be removed in the future, once a runtime use case for the Gem is implemented.

The O3DE API available for calling needs to be communicated to the AI service, which requires producing text definitions of methods and buses. Code that does exactly this already exists in the Editor Python Bindings Gem, in the PythonLogSymbolsComponent.cpp file. Since there was previously no use for it outside of the bindings Gem, the code that creates the method and EBus definition strings is currently entangled with the code that writes these strings to files. Three changes to Editor Python Bindings are proposed within this RFC:

The last change is needed because AI services need information about the purpose and expected outcome of API calls. Much of the Automation scope API currently lacks documentation in its reflection. A separate PR will add such documentation to the Behavior Context reflection of some APIs.
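Separating definition generation from file writing could look like the following sketch: one function only builds the definition text (including the ToolTip documentation an AI needs), and the caller decides whether to write it to a file or put it into a prompt. `MethodSignature` and `FormatMethodDefinition` are hypothetical and do not mirror the current code in PythonLogSymbolsComponent.cpp; the Python-stub output format is only illustrative.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified description of a reflected method.
struct MethodSignature
{
    std::string m_name;
    std::vector<std::pair<std::string, std::string>> m_parameters; // {name, type}
    std::string m_returnType;
    std::string m_toolTip; // documentation taken from the ToolTip attribute
};

// Builds a Python-stub-like definition string without touching the file system.
std::string FormatMethodDefinition(const MethodSignature& method)
{
    std::ostringstream out;
    out << "def " << method.m_name << "(";
    for (std::size_t i = 0; i < method.m_parameters.size(); ++i)
    {
        if (i > 0)
        {
            out << ", ";
        }
        out << method.m_parameters[i].first << ": " << method.m_parameters[i].second;
    }
    out << ") -> " << method.m_returnType << ":\n";
    if (!method.m_toolTip.empty())
    {
        // Documentation is what tells the AI service the purpose of the call.
        out << "    \"\"\"" << method.m_toolTip << "\"\"\"\n";
    }
    out << "    pass\n";
    return out.str();
}
```

With this split, the existing stub files and an AI prompt can be produced from the same strings.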

To execute AI-generated code automatically, the code block will be extracted from the response and passed to the ScriptCall interface:

virtual bool ScriptCall(const AZStd::string& script, AZStd::string& response, const AIContext& aiContext) = 0;
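How the code block is extracted is not specified here. A minimal sketch, assuming the AI service wraps code in Markdown-style triple-backtick fences (optionally followed by a language tag such as `python`) and that only the first block is executed; `ExtractFirstCodeBlock` is a hypothetical helper.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Returns the contents of the first fenced code block in an AI response, if any.
std::optional<std::string> ExtractFirstCodeBlock(const std::string& response)
{
    const std::string fence(3, '`'); // triple backtick
    const std::size_t open = response.find(fence);
    if (open == std::string::npos)
    {
        return std::nullopt;
    }
    // Skip the rest of the opening line (e.g. an optional language tag).
    const std::size_t bodyStart = response.find('\n', open + fence.size());
    if (bodyStart == std::string::npos)
    {
        return std::nullopt;
    }
    const std::size_t close = response.find(fence, bodyStart + 1);
    if (close == std::string::npos)
    {
        return std::nullopt; // unterminated block: do not execute partial code
    }
    return response.substr(bodyStart + 1, close - (bodyStart + 1));
}
```

The extracted string would then be passed as the `script` argument of `ScriptCall`; a response without a complete block is reported instead of executed.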

Depending on the AI Context, an appropriate Script Executor will be used. These executors handle script execution as well as extracting and reporting back results, errors and logs. For the first version of the Gem, an executor for Python in the Editor will be implemented; it will call the Editor Python Runner bus:

AzToolsFramework::EditorPythonRunnerRequestBus::Broadcast(
            &AzToolsFramework::EditorPythonRunnerRequestBus::Events::ExecuteByString, script, true);
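Selecting an executor by AI Context could be sketched as a registry keyed by execution scope and scripting language. `ScriptExecutor`, `ExecutorRegistry`, the two enums and `StubEditorPythonExecutor` are hypothetical; a real Editor Python executor would forward the script to `EditorPythonRunnerRequestBus` as above and collect logs and errors, which the stub here replaces with a fixed message.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

enum class ExecutionScope { Editor, Launcher };
enum class ScriptLanguage { Python, Lua };

// Hypothetical executor interface: runs a script and reports results, errors and logs as text.
class ScriptExecutor
{
public:
    virtual ~ScriptExecutor() = default;
    virtual bool Execute(const std::string& script, std::string& response) = 0;
};

// Stub standing in for an Editor Python executor.
class StubEditorPythonExecutor : public ScriptExecutor
{
public:
    bool Execute(const std::string& script, std::string& response) override
    {
        response = "executed " + std::to_string(script.size()) + " characters of Python";
        return true;
    }
};

class ExecutorRegistry
{
public:
    void Register(ExecutionScope scope, ScriptLanguage language, std::unique_ptr<ScriptExecutor> executor)
    {
        m_executors[{ scope, language }] = std::move(executor);
    }

    // Returns nullptr when no executor fits, so the caller can report an error instead of running the script.
    ScriptExecutor* Find(ExecutionScope scope, ScriptLanguage language) const
    {
        auto it = m_executors.find({ scope, language });
        return it != m_executors.end() ? it->second.get() : nullptr;
    }

private:
    std::map<std::pair<ExecutionScope, ScriptLanguage>, std::unique_ptr<ScriptExecutor>> m_executors;
};
```

Keeping executors behind one interface would let a Lua or Launcher executor be added later without changing callers of `ScriptCall`.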

Features built on top of this Gem will most likely define and implement new APIs for the AI service to call. Note that adding a new API to O3DE requires nothing more than defining its Behavior Context reflection in the Reflect function, setting the Scope attribute to Automation, and making sure ToolTip attributes are set, for example:

    void CustomEditorTest::Reflect(AZ::ReflectContext* context)
    {
        if (auto behaviorContext = azrtti_cast<AZ::BehaviorContext*>(context))
        {
            behaviorContext->EBus<CustomEditorRequestBus>("CustomEditorRequestBus")
                ->Attribute(AZ::Script::Attributes::ToolTip, "Custom request bus documentation")
                ->Attribute(AZ::Script::Attributes::Scope, AZ::Script::Attributes::ScopeFlags::Automation)
                ->Attribute(AZ::Script::Attributes::Category, "AI")
                ->Attribute(AZ::Script::Attributes::Module, "test")
                ->Event(
                    "DoTheTestThing",
                    &CustomEditorRequests::DoTheTestThing,
                    { AZ::BehaviorParameterOverrides("testParameter", "documentation for test parameter") })
                ->Attribute(AZ::Script::Attributes::ToolTip, "Custom event documentation. Logs and returns its own parameter");
        }
    }

O3DE calling AI service

In this first version, the simplest interface will be available through a global bus:

void FetchGenAIGlobalEditorContext(AIContext& context);
AZStd::string GenAITextQuery(const AIContext& context, const AZStd::string& query);

In the first version, this interface will not yet be used outside of tests.
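Behind the global bus, a vendor-agnostic implementation could route each query to whichever vendor Gem registered a provider under the context's key. The sketch below assumes that design; `GenAIRouter`, `ProviderFunction` and the simplified `AIContext` are hypothetical, use standard-library types instead of AZStd, and return an empty string for an invalid or unknown context.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Simplified, hypothetical AI Context: only the identifying key matters for routing.
struct AIContext
{
    std::uint64_t m_key = 0; // 0 is treated as invalid in this sketch
};

// A vendor Gem would supply a callable that sends a prompt to its service and returns the reply.
using ProviderFunction = std::function<std::string(const std::string&)>;

class GenAIRouter
{
public:
    void RegisterProvider(std::uint64_t key, ProviderFunction provider)
    {
        m_providers[key] = std::move(provider);
    }

    // Mirrors GenAITextQuery(context, query): dispatch to the provider bound to the context.
    std::string GenAITextQuery(const AIContext& context, const std::string& query) const
    {
        if (context.m_key == 0)
        {
            return {};
        }
        auto it = m_providers.find(context.m_key);
        if (it == m_providers.end())
        {
            return {};
        }
        return it->second(query);
    }

private:
    std::map<std::uint64_t, ProviderFunction> m_providers;
};
```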

O3DE sharing data with AI

The initial prompt communicates the definitions and documentation the AI service needs to perform work in O3DE. This includes:

The AI service can learn more about the project by calling Behavior Context buses and methods that return information, for example about the scene, character poses and the viewport.
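As a rough sketch of prompt assembly, assume the initial prompt is a role instruction, followed by the generated API definitions, followed by a rule for the answer format. `BuildInitialPrompt` and this section layout are hypothetical; the actual contents and ordering of the initial prompt are a design choice of the Gem.

```cpp
#include <string>
#include <vector>

// Hypothetical prompt assembly: joins an instruction, API definitions and an answer-format rule.
std::string BuildInitialPrompt(
    const std::string& instruction,
    const std::vector<std::string>& apiDefinitions,
    const std::string& answerFormat)
{
    std::string prompt = instruction + "\n\nAvailable API:\n";
    for (const std::string& definition : apiDefinitions)
    {
        prompt += definition;
        // Keep each definition on its own line(s).
        if (definition.empty() || definition.back() != '\n')
        {
            prompt += '\n';
        }
    }
    prompt += "\n" + answerFormat + "\n";
    return prompt;
}
```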

Communication and configuration of Generative AI services

The Gen AI Framework Gem supports developers by providing abstractions for implementing vendor-specific communication components. These components are divided into two categories:

Overview diagram

[Image: overview diagram, work in progress]

User Experience and Interaction

The first version does not focus on common UX; it provides a simple prompt input widget and a separate output widget for monitoring responses and errors.

Another UX element will be the ability to roll back changes if the results of AI-executed code are invalid (an Undo batch).
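The rollback idea can be illustrated with a generic undo batch: every change made while executing AI-generated code records how to revert it, and the whole batch is reverted if the result is judged invalid. This is a standalone sketch with hypothetical names (`UndoBatch`, `Record`, `Rollback`, `Commit`); it does not use O3DE's actual undo system.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical undo batch: collects revert actions and applies them in reverse order on rollback.
class UndoBatch
{
public:
    void Record(std::function<void()> undo)
    {
        m_undoActions.push_back(std::move(undo));
    }

    // Reverts all recorded changes, newest first, then empties the batch.
    void Rollback()
    {
        for (auto it = m_undoActions.rbegin(); it != m_undoActions.rend(); ++it)
        {
            (*it)();
        }
        m_undoActions.clear();
    }

    // Keeps the changes; nothing will be reverted afterwards.
    void Commit() { m_undoActions.clear(); }

private:
    std::vector<std::function<void()>> m_undoActions;
};
```

Reverting in reverse order matters when later changes depend on earlier ones, for example a component added to an entity that the same script created.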

Scripting language considerations

Python and Lua scripting is already supported in O3DE.

Python has the following advantages:

Lua has the following advantages:

It would be best to unify the scripting interface for AI between the Editor and the Launcher, as this would simplify the design and better support Gen AI models working in both scopes, for example when developing new scenarios, changing scenes, and testing them by running simulations. There are, however, challenges in doing so.

Editor support for Python is possible now because Python is included with the engine and Editor environment as a whole. Anything needed by the game launcher must be exportable and work outside of the Editor environment. Since Lua is part of the core libraries, it is available to any game launcher. To make Python available to game launchers, a strategy needs to be designed for providing the Python runtime to the launcher (most likely as some form of packaged Python virtual environment).

If work is done to enable Python at runtime (at least as an option), the EditorPythonBindings Gem would be renamed to PythonBindings.

Security

We would like to limit the possibility of erroneous code affecting users negatively, so scripts should not be executed without any restrictions. For example, the following script:

import os
import shutil
shutil.rmtree(os.path.expanduser("~"))

could remove the user directory.

In the first version, a simple approach would be to whitelist the modules (imports) that generated code may use and to strip the generated code accordingly. Another approach worth trying is RestrictedPython: https://restrictedpython.readthedocs.io/en/latest/#. There might be limitations on which platforms can support it.
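A minimal sketch of the whitelisting approach, assuming the check scans each line of the generated Python for `import x` and `from x import y` statements and rejects (rather than strips) the script if any top-level module is not on the allow-list. `IsScriptAllowed` is hypothetical, and the check is deliberately naive: it does not catch dynamic imports such as `__import__` or `importlib`, several statements on one line, or modules already imported elsewhere, which is one reason a stronger approach such as RestrictedPython is worth evaluating.

```cpp
#include <set>
#include <sstream>
#include <string>

// Returns the top-level module of a dotted name, e.g. "os.path" -> "os".
static std::string TopLevelModule(const std::string& name)
{
    return name.substr(0, name.find('.'));
}

// Naive allow-list check for "import a, b.c [as x]" and "from a.b import c" lines in Python source.
bool IsScriptAllowed(const std::string& script, const std::set<std::string>& allowedModules)
{
    std::istringstream lines(script);
    std::string line;
    while (std::getline(lines, line))
    {
        std::istringstream tokens(line);
        std::string keyword;
        tokens >> keyword; // skips indentation
        if (keyword == "from")
        {
            std::string module;
            tokens >> module;
            if (allowedModules.count(TopLevelModule(module)) == 0)
            {
                return false;
            }
        }
        else if (keyword == "import")
        {
            std::string rest;
            std::getline(tokens, rest);
            std::istringstream names(rest);
            std::string name;
            while (std::getline(names, name, ','))
            {
                std::istringstream trimmed(name);
                std::string module;
                trimmed >> module; // first word only: drops spaces and any "as alias"
                if (allowedModules.count(TopLevelModule(module)) == 0)
                {
                    return false;
                }
            }
        }
    }
    return true;
}
```

With an allow-list that does not contain `os` or `shutil`, the dangerous script above would be rejected before reaching the executor.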

Comments are welcome on both the first version solution and how to best solve the issue in the long run.

What are the advantages of the feature?

It brings modern generative AI to O3DE Gems, making O3DE attractive for developing games and simulations with generative AI.

What are the disadvantages of the feature?

In the future, it will require some changes to existing Behavior Context reflection, such as adding documentation, as well as further-reaching changes to the code that handles scripting, which might be impactful overall.

How will users learn this feature?

The Gem will be part of the canonical set, documented and cross-referenced in the O3DE documentation. Publicity for the Gem is also planned, and a showcase demo will be released in 2024. The Gem will likely be presented alongside other AI Gem(s), as it focuses on core functionality rather than user-facing features.

Are there any open questions?

Co-authors

@spham-amzn, @arturkamieniecki

adamdbrw commented 3 months ago

Approved