RFC: Initial Project setup Asset Processor efficiency improvements

lemonade-dm commented 10 months ago

Summary:

When using O3DE with a new project, the first time the Asset Processor is launched, it runs jobs over all source assets located in each active Gem scan folder, the entire project folder and the engine's Assets directory.

Because a project tends to use many of the Gems that come with the O3DE engine, the source assets for those Gems are processed again for each project. Furthermore, the source assets that are part of the engine "Assets" directory are also processed once for each project a user has on their machine.

It would be useful for all users of O3DE to be able to reduce or alleviate the time taken by the Asset Processor when launching a project for the first time.

What is the relevance of this feature?

This is important as there have been customer complaints about the amount of time it takes for first-time startup when using O3DE.

The implementation of the No-Code project RFC has drastically reduced the amount of time required to launch the engine applications from an SDK layout installed from the O3DE installer, but there is still a large time sink in processing thousands of assets the first time the Editor or the project's game launcher is launched.

Feature design description:

Using a project in the Editor or GameLauncher requires that a set of critical assets (shaders, fonts, render passes, etc.) is processed before a user is allowed to interact with the engine.

The Asset Processor is responsible for aggregating source assets provided by the active project, any Gems the project has active and the engine, and for processing all of them into an asset cache directory located in the project root "Cache" folder on a per-platform basis.

This processing can take tens of minutes the first time the Editor is launched with the active project on a user's machine, before the user can use O3DE.

The following are several ideas to reduce the need to process source assets or to prevent duplicate processing of the same source assets.

Per Asset Scan Folder pre-populated cache

One idea is to allow the engine and Gems to provide a pre-populated cache of asset products that can be re-used by the project without needing to process the assets through the Asset Processor (AP).
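As a rough illustration of how a pre-populated cache might be consulted, the sketch below looks up a product by a fingerprint of the source content plus a builder version. The fingerprint scheme, the manifest layout and the function names are all assumptions made for this example; the RFC does not specify how a shipped cache would be keyed or validated.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(source_bytes: bytes, builder_version: str) -> str:
    """Hash the source content together with the builder version.

    Hypothetical scheme: a product is only reusable if both the source
    and the builder that produced it are unchanged.
    """
    digest = hashlib.sha256()
    digest.update(builder_version.encode("utf-8"))
    digest.update(b"\0")  # separator so the two fields cannot run together
    digest.update(source_bytes)
    return digest.hexdigest()


def find_prebuilt_product(
    source_bytes: bytes, builder_version: str, manifest: Dict[str, str]
) -> Optional[str]:
    """Return the shipped product path for this source, or None.

    `manifest` is a hypothetical fingerprint -> product path mapping that a
    Gem or the engine would ship next to its pre-populated cache.
    """
    return manifest.get(fingerprint(source_bytes, builder_version))
```

A miss (None) would simply fall back to processing the source through the AP as it does today.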

Add a shared/common Asset "platform" for products

The large majority of source assets produce the same product output regardless of the OS platform they are processed for. For example, many JSON, XML and text format source files produce the same product output whether the Asset Processor is producing a product for Windows or for Android, Linux, macOS, etc. Currently the Asset Processor creates a separate job for each platform it processes.
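A minimal sketch of the job-count saving, assuming a hypothetical "common" platform identifier and a hypothetical list of extensions whose products are platform independent. Neither exists in the Asset Processor today, and a real implementation would presumably decide this per builder rather than per file extension.

```python
import os
from typing import Dict, List, Set

# Assumption: extensions whose products are identical on every platform.
PLATFORM_AGNOSTIC_EXTENSIONS: Set[str] = {".json", ".xml", ".txt"}
# Assumption: name of the shared platform proposed in this RFC.
COMMON_PLATFORM = "common"


def plan_jobs(sources: List[str], platforms: List[str]) -> Dict[str, List[str]]:
    """Map each source to the platforms it needs a job for.

    Platform-agnostic sources get a single job on the shared platform,
    everything else keeps one job per enabled platform.
    """
    plan: Dict[str, List[str]] = {}
    for source in sources:
        extension = os.path.splitext(source)[1].lower()
        if extension in PLATFORM_AGNOSTIC_EXTENSIONS:
            plan[source] = [COMMON_PLATFORM]
        else:
            plan[source] = list(platforms)
    return plan
```

With two enabled platforms, a JSON file and an FBX file would need three jobs under this plan instead of four.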

Add an option to evaluate the priority of source assets associated with user selected level

This option can be helpful when a user wants to load or launch a specific level in the Editor or Game Launcher, or to produce an Asset Bundle for the level. The Asset Processor attempts to build all source assets that the project, active Gems and the engine have available for use. There are times when several thousand assets that will never be used are built on startup, causing a large spike in CPU time.
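One way to picture this option is a breadth-first walk over the level's dependency graph, moving every reachable asset to the front of the job queue. The sketch below assumes the dependency graph is already available as a plain dictionary; the function name and data shapes are hypothetical and do not reflect how the AP stores dependencies.

```python
from collections import deque
from typing import Dict, List


def prioritize_for_level(
    level: str, dependencies: Dict[str, List[str]], all_sources: List[str]
) -> List[str]:
    """Reorder all_sources so assets reachable from `level` come first.

    `dependencies` maps an asset to the assets it references. Assets the
    level never reaches keep their relative order at the back of the queue.
    """
    reachable = {level}
    queue = deque([level])
    while queue:
        current = queue.popleft()
        for dependency in dependencies.get(current, []):
            if dependency not in reachable:
                reachable.add(dependency)
                queue.append(dependency)

    needed = [s for s in all_sources if s in reachable]
    unused = [s for s in all_sources if s not in reachable]
    return needed + unused
```

The unused assets could then be processed at low priority in the background, or skipped entirely when only an Asset Bundle for the level is wanted.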

Audit and prune the list of Source Assets marked critical

There are many assets in code that are marked critical to the engine startup that are either not critical or could be replaced with a placeholder asset until they are processed. The Atom RPI uses many utility functions that force a synchronous compile of the asset in the AP via its TryToCompileAsset API. Many of the Atom shaders and streaming images are marked as critical, and it is worth auditing whether some of those assets are truly needed on startup.

Improve the algorithm for determining how many builder jobs the Asset Processor should kick off at once

By default, the number of builder processes the Asset Processor connects to in the background is controlled by the minJobs and maxJobs settings in the AssetProcessorPlatformConfig.setreg file, and the Asset Processor launches builder processes up to the "logical core count - 1". There are several problems with that approach.

The first is that it doesn't take into account the ratio of RAM to core count. A machine with 8 cores and 8 GiB of RAM would launch 7 AssetBuilder jobs, each of which can take anywhere from ~512 MiB to 4 GiB of RAM depending on the type of source asset. Texture and FBX processing uses more RAM than prefab and XML processing. This leads to scenarios where the AP saturates the CPU of a machine while potentially relying on swap memory to keep builders running even when RAM is exhausted.

The second is that the Asset Processor doesn't continuously take into account the memory and cores that are actually available at the time. If a process such as the O3DE Editor or a web browser is using a percentage of the total memory, the AP doesn't account for the remaining available memory when launching jobs. A machine might have 8 cores and 16 GiB of RAM, but half of the RAM may be used by other processes and not be available to the Asset Processor at the time.

It may be useful to provide configuration settings that let the user control how many Asset Builder processes are launched and, separately, how many jobs are run at once, based on user-provided heuristic settings. A setting for the expected maximum amount of RAM used per job (such as ExpectedMaxRamPerJob or ExpectedAverageRamPerJob) would be useful. A setting that allows a user to cap the total number of jobs created based on a maximum RAM threshold or available RAM could also be useful. All of this could help prevent thrashing while running the Asset Processor and improve machine responsiveness.
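A minimal sketch of such a heuristic, assuming the available RAM is sampled by the caller and that ExpectedMaxRamPerJob is expressed in MiB. The function and parameter names are hypothetical; only the "logical core count - 1" rule and the setting name come from the text above.

```python
def max_concurrent_jobs(
    logical_cores: int,
    available_ram_mib: int,
    expected_max_ram_per_job_mib: int,
    min_jobs: int = 1,
) -> int:
    """Cap the number of concurrent jobs by both CPU and available RAM.

    The CPU cap mirrors the current "logical core count - 1" default. The
    RAM cap is how many jobs fit in the memory that is free right now.
    """
    if expected_max_ram_per_job_mib <= 0:
        raise ValueError("expected_max_ram_per_job_mib must be positive")

    cpu_cap = max(logical_cores - 1, 1)
    ram_cap = available_ram_mib // expected_max_ram_per_job_mib
    # Never drop below min_jobs, otherwise the AP could stall completely.
    return max(min_jobs, min(cpu_cap, ram_cap))
```

For the machine described above (8 cores, 16 GiB of RAM with half in use, so 8192 MiB available) and an expected 4096 MiB per job, this yields 2 concurrent jobs instead of 7. Re-evaluating the value periodically would let the AP react as other processes claim or release memory.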

nick-l-o3de commented 10 months ago

FWIW, on Linux, I did some profiling of the startup time of the AP when the cache is already populated, which is relevant to this. I found that a lot of the time (25s!) went into building the catalog (assetcatalog.xml) on initial startup and connection, and most of that time (99%+) was spent purely on finding the correct case of files, so a ton of file queries, hashing, and so on. We could probably improve that drastically, since the AP already performs a startup scan (which takes about 2 seconds) that caches the true file paths and cases of everything.

For future reference, the profiler showed:

Building the catalog -->
   Invokes a database query for all known sources/products/etc so it can make the catalog -->
      Creates an AssetID for the stuff from the database -->
          Creates a Source UUID for the stuff from the database -->
              Is based on filename, and is a case-sensitive hash, so it calls UpdateToCorrectCase (99% of the time spent)
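The improvement suggested above could look roughly like the sketch below: build a lookup from the paths found by the startup scan once, then resolve the correct case with a dictionary lookup instead of querying the file system per asset. The function names are hypothetical and this is not how UpdateToCorrectCase is implemented. It also assumes no two scanned paths differ only by case, which is not guaranteed on a case-sensitive file system.

```python
from typing import Dict, Iterable, Optional


def _normalize(path: str) -> str:
    """Fold case and unify separators so lookups ignore both."""
    return path.replace("\\", "/").lower()


def build_case_cache(scanned_paths: Iterable[str]) -> Dict[str, str]:
    """Map each normalized path to the true-case path seen by the scan."""
    return {_normalize(path): path for path in scanned_paths}


def correct_case(path: str, cache: Dict[str, str]) -> Optional[str]:
    """Return the true-case path from the cache, or None if it was not scanned."""
    return cache.get(_normalize(path))
```

A None result would fall back to the existing file system query, so the cache only needs to cover the common case.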

AMZN-alexpete commented 10 months ago

We should create a separate RFC for automatically collecting and storing data about AP processing times for common workflows, such as first-time asset processing and stress tests, if we don't already have automation and public storage for developers to look at. One possibility is a GitHub repo for a static website that stores and displays statistics for O3DE.

o3de / sig-core