o3de / sig-simulation

Special Interest Group for Simulation
Apache License 2.0

Proposed RFC Feature: AI Core Gem #86

Closed adamdbrw closed 4 months ago

adamdbrw commented 5 months ago

Summary:

With the recent rise of generative AI models such as GPT-4, tools are emerging to plug them into existing workflows and to create new workflows that they enable. This proposal brings forth the new AI Core Gem, which is meant to help O3DE developers utilize modern AI in games and simulations.

What is the relevance of this feature?

Given the new possibilities opened by recent advances in AI, game and simulation developers are looking to apply these capabilities in their creations. Many of the steps these developers need to take are common regardless of the type of application, for example:

The AI Core Gem is quite different from the Machine Learning Gem in that it focuses on generative AI rather than multi-layer perceptrons. Given the fast-changing nomenclature in this space and the versatility of the Gem, however, there are arguments against including "Generative" in the name.

The AI Core Gem is meant for O3DE Gem developers, and it is expected to be a dependency of future gems such as AI characters, assistants, and scene generators. Unlike these future gems, the AI Core Gem's value does not strictly depend on the current capabilities of AI models: it is meant as a tool to explore their limits, built in a flexible way so that it benefits from improvements in these capabilities over the years.

In the long run, from the perspective of game development, this feature can help to build smart characters that interact uniquely with the player, and assist in writing dialogue as well as creating 3D worlds.

From the perspective of simulation, it can help to create robots, humans (in roles such as pedestrians or warehouse workers), and animals that behave in desired ways with less scripting, build smartly randomized simulation scenes, and assist users in running validation scenarios as well as summarizing their results.

Feature design description:

Connectivity and communication with Generative AI services

Generative AIs can be used through third-party hosted services, such as Amazon Bedrock or OpenAI's GPT platform. These typically charge per token, depending on model type and modality. There are both proprietary and open-source models. It is also possible to host models locally, including through tools such as Ollama or vLLM.

AI services increasingly offer additional modalities (such as image prompting) as well as complex services, such as Assistants.

Since the pace of development and the emergence of new APIs is rapid, it is important for the AI Core Gem to be flexible and extensible in its implementation of connectivity and communication. As such, the approach is to be:

The communication layer will be abstracted, allowing future support for models running on the local network or on a local GPU, as well as streaming connections such as WebSockets. In the first release, the feature set will rely on the HttpRequestor Gem for communication.
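As a sketch of what this abstraction might look like (all names here are hypothetical and written in Python for brevity, not the actual Gem API), the transport can be hidden behind a small interface so that an HTTP backend, a locally hosted model, or a streaming connection can be swapped without touching callers:

```python
from abc import ABC, abstractmethod


class AIServiceRequester(ABC):
    """Hypothetical seam between the Gem and the transport used to reach a model."""

    @abstractmethod
    def send_prompt(self, prompt: str) -> str:
        ...


class HttpRequester(AIServiceRequester):
    """Placeholder for the HTTP path (in the real Gem this would go through
    the HttpRequestor Gem; here it only marks where that call would live)."""

    def __init__(self, uri: str):
        self.uri = uri

    def send_prompt(self, prompt: str) -> str:
        raise NotImplementedError("HTTP transport not wired up in this sketch")


class LocalMockRequester(AIServiceRequester):
    """Stands in for a locally hosted model (e.g. one served via Ollama)."""

    def send_prompt(self, prompt: str) -> str:
        return f"echo: {prompt}"


def ask(requester: AIServiceRequester, prompt: str) -> str:
    # Callers depend only on the interface, never on a concrete transport.
    return requester.send_prompt(prompt)
```

Swapping `LocalMockRequester` for `HttpRequester` (or a future streaming implementation) changes nothing on the caller's side, which is the point of the abstraction.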

Global settings

While it is easy to picture a use case where more than one vendor's AI is used within a single project, as different models can easily have different strengths, the first step is to keep things simple and have one global setting for AI features, much like the Physics settings.

These global settings will include the URI and other connectivity settings (such as authorization), usage limits, default models for each modality, and user preferences for things like visualization.

The first version will only include URI and default model selection.

Global settings will be accessible through the Editor menu and through Settings Registry keys.
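As an illustration only (the key names and paths below are assumptions, not the final schema), the first-version settings could live in a Settings Registry `.setreg` file along these lines:

```json
{
    "O3DE": {
        "AICore": {
            "ServiceURI": "https://example.invalid/v1/chat",
            "DefaultModel": "my-default-model"
        }
    }
}
```

The Editor menu would then read and write these same keys, so scripted and interactive configuration stay in sync.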

AI calling O3DE interfaces

The core value of this gem is to allow AI to perform work through O3DE interfaces. Examples include:

These APIs will be exposed through a kind of reflection mechanism, but full documentation also needs to be shared, either as URIs or as part of the initial prompt.

O3DE sharing data with AI

To interact with O3DE in an informed way, AI will require inputs such as:

Runtime interaction with characters and the environment is likely to be application-specific; for example, simulations are likely to expose ROS interfaces. For the first implementation, sharing a list of assets, generic text prompts, and callable methods is enough.

Future extensions

The gem can be extended with a voice interface, allowing users to prompt the AI and assign tasks by simply speaking. Note that text-to-speech is already part of some vendor APIs, so the AI can speak back.

RFCs for new AI feature gems will follow when this gem is implemented.

Technical design description:

Challenges of AI RPC interface

To relay O3DE APIs to an AI service, we need to supply the functions' signatures and document their purpose, semantics, parameters, and return values. The immediate issue is that this documentation is typically provided as code comments and is itself unavailable at runtime. Such documentation is also not provided through the current behavior reflection system.

Possible solutions include:

Each of these solutions has drawbacks, such as: ensuring a good workflow when custom gems, including proprietary ones, are involved; exposing code base headers to a third party (a potential licensing issue); the blast radius of changes in O3DE; avoiding noise for the AI, such as APIs that are not accessible, not whitelisted, or irrelevant; and, with the custom documentation approach, essentially duplicating information. There is also the issue of being able to register interfaces and assign callbacks dynamically. Another consideration is that the AI might benefit from a custom, iterative approach to return values, and the amount of feedback it needs to perform optimally can differ significantly from how current APIs are constructed.

One considered approach is to use, and possibly expand on, the behavior reflection system in O3DE: either by providing a way to generate AI-suitable reflection (including serializing to JSON or a similar format), at least for selected categories, or by creating another layer in the reflection system.

The other considered approach is a custom API registration system that supplies a function's name and signature, its documentation, and a callback. A custom approach can help simplify the types, dependencies, and amount of context the AI needs to succeed.

Community comments on RPC design are especially welcome.

AI -> O3DE interface

The API registration mechanism needs to be part of the AI Core Gem's developer interface, so that custom gems and their components can register new ways of interacting.

The AI Core Gem RPC System Component will relay the static RPC description (constructed through the reflection mechanism) to the AI service, and allow dynamic attachment of callbacks to existing registry entries (otherwise callbacks are considered empty, which should cause warnings). It will also allow API entries (including the callback) to be added dynamically. The method of relaying this API description to the AI service may be implementation-dependent: by default text prompts will be used, but some implementations might instead produce a file with a static API description and upload it to an Assistant-like service.
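A minimal sketch of how such a registry could behave (in Python for brevity; the names, signature format, and entry shape are illustrative assumptions, not the Gem's actual interface):

```python
import warnings


class AIApiRegistry:
    """Hypothetical registry: each entry carries a name, a signature string,
    documentation to relay to the AI service, and an optional callback."""

    def __init__(self):
        self._entries = {}

    def register(self, name, signature, doc, callback=None):
        # Entries may be registered statically (from reflection) without a
        # callback; the callback can then be attached dynamically later.
        self._entries[name] = {"signature": signature, "doc": doc, "callback": callback}

    def attach_callback(self, name, callback):
        self._entries[name]["callback"] = callback

    def describe(self):
        # Text description of all APIs, suitable for inclusion in a prompt.
        return "\n".join(
            f"{name}{e['signature']}: {e['doc']}" for name, e in self._entries.items()
        )

    def call(self, name, *args):
        entry = self._entries[name]
        if entry["callback"] is None:
            # Per the design above, empty callbacks should cause warnings.
            warnings.warn(f"API '{name}' has no callback attached")
            return None
        return entry["callback"](*args)
```

A gem component would register its entries on activation, attach callbacks bound to its own state, and let the RPC System Component hand `describe()` to the AI service.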

The AI service will be instructed (by internally captured, configurable prompts) to call APIs within a specified text block, making parsing of its response straightforward. Most likely, JSON will be used to structure the API calls in text, which is a common approach in other contexts; see the libjson-rpc-cpp library as an example.
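For illustration, assuming a hypothetical `<api>…</api>` delimiter and a JSON-RPC-like call shape (the actual block markers and schema are yet to be decided), extracting such calls from a model response could look like:

```python
import json


def extract_api_calls(response: str, start: str = "<api>", end: str = "</api>"):
    """Return the JSON payloads of every delimited API-call block in `response`."""
    calls = []
    pos = 0
    while True:
        i = response.find(start, pos)
        if i == -1:
            break
        j = response.find(end, i)
        if j == -1:  # unterminated block: stop rather than parse garbage
            break
        payload = response[i + len(start):j]
        calls.append(json.loads(payload))
        pos = j + len(end)
    return calls
```

Each parsed call could then be dispatched through the API registry, with anything outside the delimiters treated as free-form text for the user.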

O3DE -> AI interface

Providing data to the AI service will be implementation-dependent. The AI Core Gem will include text prompting, and might include image prompting once it is available in popular open-source models. Other modalities might be included if they are standardized and implemented by most vendors. Until then, modalities other than text will be left to vendor-specific gems.

In the context of robotics, modalities other than text will be especially important, for example images from robot camera sensors, or audio commands from its human co-workers.

What are the advantages of the feature?

Once this gem is released, O3DE developers will be empowered to build AI-based features on top of it. This will bring to O3DE users looking to explore and develop AI applications for academic or industrial use cases.

What are the disadvantages of the feature?

Given that the AI space is extremely dynamic, this gem needs to be supported and updated continuously. It needs to stay relevant as the space expands and AI-empowered tools become commonplace.

There is also considerable effort involved in deciding which interfaces to expose and in understanding what is possible with the technology.

Are there any alternatives to this feature?

The main alternative is to treat AI as a set of external tools and to focus on developing a rich API for O3DE to interact with them, as opposed to the integrated approach this proposal describes.

While the integrated approach involves writing some extra wrapper code and developing O3DE-side UI/UX, its advantages lie in a tailored approach to collaborative content creation and in working better with Editor workflows. These are the main reasons for preferring the integrated approach.

Another alternative is not to have the AI Core Gem, but instead one gem per vendor (including open-source ones). However, this has the disadvantages of repeating the common parts, and it does not provide a unified UX for AI users in O3DE, where a common use case will be comparing the performance of AI from several vendors.

How will users learn this feature?

The Gem will be part of the canonical set, documented and cross-referenced in the O3DE documentation. Publicity for the Gem is also planned, and a showcase demo will be released in 2024. The Gem will likely be presented alongside other AI gem(s), as it focuses on core functionality rather than user-facing features.

Are there any open questions?

adamdbrw commented 5 months ago

Comments expected by Feb 23rd.

adamdbrw commented 5 months ago

Figuring out the name is one of the requirements. Ideas include:

byrcolin commented 5 months ago

I like the idea. This should not be a core gem as core gems should be what is minimally needed to get the engine to compile and run. This should be in its own repo or part of extras.

kberg0 commented 5 months ago

LLM Gem seems more appropriate if you're going to stick to transformer based model integration. GenAI Gem seems appropriate if you have plans to integrate diffusion models and other model architectures, which might be really neat from an 'automatically generate textures and video clips' standpoint.

As mentioned in the TSC meeting; in terms of binding, having something dump out the available interfaces by iterating, say, all our script canvas nodes and plopping those on the command-prompt (with an appropriately large context window) would probably go a long way towards improving the usefulness and accuracy of the integration. Even if we fine-tuned a model, which I would really recommend, using some sort of RAG-like approach would really boost performance.

Sticking to json/xml based dumps of API's and scenegraphs, and then consuming LLM output as prefabs and script-canvas graphs seems promising. That approach would then scale to any other Gems the user happens to have installed, including the ROS2 Gems or even the Machine Learning Gem which currently offers a limited set of script canvas nodes. There's a lot of UI work ahead to make this well integrated from an end-user perspective, but that can luckily all be decoupled from this initial RFC.

I definitely love the idea! I'll be keeping an eye on this.

nick-l-o3de commented 5 months ago

I'll dump the notes from discussions about this that happened live in the TSC, before I add my own

nick-l-o3de commented 5 months ago

reading the RFC, my only concern is that it becomes way too broad initially. I understand we want to have a vision here, and that's fine at the RFC level. Once the RFC is generally accepted, it would be good to offer a small technical breakdown of what APIs would be in v1 (with one example gem that uses the APIs functionally) and then what would be in v2 (with an additional, different use case), proving that the v1 APIs will not have to be disrupted and reworked, only extended, by v2. v2 does not then have to actually be developed, just v1, until someone wants to add v2 or some other library; it would at least prove out the design.

This is a good place to mention that O3DE already has support for, and examples of, so-called Framework gems - that is, a gem which provides APIs, buses, and functionality that is only useful when a different gem uses or depends on it. Likely the actual technical structure of this would be such a framework gem, with at least one API module (though it could have as many as you want) that other gems depend on. The API modules could be kept nice and lightweight, header-only or very nearly header-only, to avoid dll bloat.

As for ways to expose the engine to AI, there are basically a number of routes, but the official, somewhat well-traveled one is through the behavior context, since its job is literally to offer the functionality of the tools and engine in a neutral way that can be exercised by anything, including new languages or interfaces. It's a complex path, but it is well-traveled, since there are already examples of mining the behavior context for Python, for Lua, and for Script Canvas, and someone has already adapted it for JavaScript (without releasing it, but it at least proves that it's flexible enough to bind to whatever you want to bind it to).

Directly generating things like prefabs may be okay too - but it depends on how good LLMs are at generating actually viable JSON with a bunch of tricky rules, without hallucinating things that come from other projects or similar situations. My experiments in this realm have not been very positive so far: it works sometimes, but sometimes it starts to spit out Unity- or Unreal-formatted documents, or just imagines types and APIs that simply don't exist at all.

adamdbrw commented 5 months ago

Based on comments above, would GenAIFramework be a fitting name for the Gem?

Huawei-CarlosCarbone commented 5 months ago

There are quite a few points that I would like to highlight

I know that most of these are big milestones that cannot be implemented in the first demo or in the near future, but missing this perspective, and not sharing it, might make it difficult for others to understand the intended end use of this gem. In other words, this means showing the difference between "this is how we expect users to use this gem" vs. "this gem includes features x, y, and z" - which translates to "this is an electric guitar and you can make rock music like this with it" vs. "this is an electric guitar and it can produce distorted sounds". I would not have highlighted all of these, but since I got the two pieces of feedback "this is a general implementation" BUT "we want to generate worlds in the first demo", it sounds like at least one specific usage by developers is expected of this gem.

If anything is unclear or you would like to discuss additional feedback do not hesitate to let me know

adamdbrw commented 4 months ago

Based on all the feedback, I will take the following steps: