This repo contains the code described in the LLMR paper, implementing the Large Language Model for Mixed Reality framework.
This package serves as a prototype for "Speaking the world into existence", which allows the real time creation of objects, tools, and scenes with visual, behavioral, and interactive elements through natural language. Our framework combines prompt-based generation with Unity to enable spontaneous user creation at run time, a core element of VR since its inception. The package comes with several demo scenes for you to experiment with: an empty playground in which you can create objects in isolation and scenes that engage Dall-E and CLIP to find 3D models that are visually and textually similar to your prompt.
Example use cases:
The project has been tested on Unity versions 2021.3.25f1 and 2022.3.11f1 and has several dependencies. The largest of these is the Roslyn C# compiler; for this project we used the implementation by Trivial Interactive, which you have to purchase on the Unity Asset Store (C# Compiler). The project could be adapted to depend on the open-source implementation of C# instead, but that would require re-implementing the attachment of compiled code to GameObjects, which the Trivial Interactive package provides. Thus, to get this project to function, you need to add the compiler to it.
Once the compiler has been added, the project should be ready as is. Note that the run-time compilation of code generated by LLMs depends on preloaded assembly DLLs, which are included in the Assemblies folder. To get these to register in the app, you need to create Assembly Reference Assets with the help of the Trivial Interactive package.
If you would like to add this functionality to an existing project, you will need to export all of the elements as a package (including the compiler) and follow these steps:
"com.siccity.gltfutility": "https://github.com/siccity/gltfutility.git",
"com.openai.unity": "4.3.0",
and append them under the "dependencies" field in the manifest.json file under the Packages folder in your Unity project root directory.
Also copy over lines 48-59
,
"scopedRegistries": [
    {
        "name": "OpenUPM",
        "url": "https://package.openupm.com",
        "scopes": [
            "com.openai",
            "com.utilities",
            "com.atteneder"
        ]
    }
]
and append them as a new field after dependencies.
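Taken together, the merged manifest would look roughly like the sketch below for a near-default project (the com.unity.* entry is an illustrative placeholder for whatever packages your project already lists):

```json
{
  "dependencies": {
    "com.unity.collab-proxy": "2.0.5",
    "com.siccity.gltfutility": "https://github.com/siccity/gltfutility.git",
    "com.openai.unity": "4.3.0"
  },
  "scopedRegistries": [
    {
      "name": "OpenUPM",
      "url": "https://package.openupm.com",
      "scopes": [
        "com.openai",
        "com.utilities",
        "com.atteneder"
      ]
    }
  ]
}
```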
You will need to have Python 3.9 or higher on your system. You can use venv or conda environments to install the requirements.
After you clone the repo, create a Python virtual environment inside the local folder:
cd CLIP-DallE-SketchFab-001
python -m venv .venv
source .venv/bin/activate
Then, install the list of requirements:
pip install -r requirements.txt
You can now run the flask app:
flask run
In your terminal, you should see something like the image below. The app is now running on your local server, and you can go back to Unity and enable the use of this app.
If you encounter any import or dependency issues, you can run python app.py to see the error messages. Install any missing dependencies as needed.
Playground_new
Processing finished. Please enter the next request.
will be displayed in the input field when it is done. You can inspect the results and make further requests if desired.

Remarks:
Playground_DallE-CLIP_Refinement
It contains the necessary components to use Dall-E and CLIP to find the closest SketchFab model to your request. However, before you can use this functionality, you need to clone a separate repo to your machine and run a Python Flask app. Ask us for GitHub access and follow the next step: "Set Up for CLIP-DallE-SketchFab-001 Flask App".
The Flask app can also help you generate entire scenes, such as "farm with 2 cows and a horse", and engage a GPT-based depth inferencer. A future release should contain a scene that employs this ability.
First, clone the repo for the Flask app or unzip the folder "CLIP-DallE-SketchFab-001-main.zip".
git clone https://github.com/delamarifer/CLIP-DallE-SketchFab-001.git
This Flask app integrates several AI models to create and search for 3D models based on natural language prompts. It can be used inside Unity to generate scenes or objects from text descriptions. The app provides the following functionalities:
/get_scene_from_prompt: Dall-E takes a prompt for a scene with multiple objects and generates an image of the scene with the indicated objects. Unity can use this image to determine the placements of SketchFab or primitive models in the scene.
/get_image_from_prompt: Dall-E takes a prompt for a single object and generates an image of a 3D version of it. This image becomes the "Target" image that the app will try to match with a SketchFab model.
/get_closest_skfb_model: The app queries the SketchFab API to download N different models from their free collection. Using CLIP, the app selects the subset of models whose text descriptions are closest to the prompt. From this subset, the app compares the models visually to the Target image and returns the SketchFab UID of the closest match. Unity can use this UID to download and display the correct SketchFab model.
The demo_use_app.py file demonstrates how to use all three functionalities.
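As a rough sketch of how these endpoints could be called from outside Unity (the default Flask dev-server address and the "prompt" payload key are assumptions here; see demo_use_app.py for the actual request schema):

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default address used by `flask run` (assumed)

def endpoint_url(route, base=BASE_URL):
    """Join the base URL and an endpoint route such as /get_closest_skfb_model."""
    return base.rstrip("/") + "/" + route.lstrip("/")

def post_json(route, payload):
    """POST a JSON payload to one of the app's endpoints and decode the JSON reply."""
    req = urllib.request.Request(
        endpoint_url(route),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires the Flask app to be running; payload key is an assumption):
#   result = post_json("/get_closest_skfb_model", {"prompt": "a red tractor"})
```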
The required namespace xxx cannot be found. Are you missing an assembly reference?
Please reach out if this happens!

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
Please see: Data Privacy Notice