o3de / o3de

Open 3D Engine (O3DE) is an Apache 2.0-licensed multi-platform 3D engine that enables developers and content creators to build AAA games, cinema-quality 3D worlds, and high-fidelity simulations without any fees or commercial obligations.
https://o3de.org

AssetProcessorBatch does not process all assets on the first pass if there are asset dependencies #17968

Open spham-amzn opened 4 months ago

spham-amzn commented 4 months ago

Describe the bug It appears that AssetProcessorBatch makes only a single pass over the assets and fails any asset whose dependencies have not been processed yet. AssetProcessor goes back and reprocesses such jobs once the dependency in question has been processed, but AssetProcessorBatch does not. To work around this, you have to run AssetProcessorBatch twice.

Assets required Any project that has a reasonable number of assets. For this issue, the ROSCon2023Demo project was used.

Steps to reproduce

  1. Follow the instructions to build the ROSCon2023Demo project (https://github.com/RobotecAI/ROSCon2023Demo/blob/development/README.md)
  2. From the command line, run AssetProcessorBatch. Note the number of failed assets.
  3. From the command line, run AssetProcessorBatch again. Note the number of failed assets.

Expected behavior The number of failed assets from the first run should be equal to the number of failed assets in the second run, which should be zero.

Actual behavior The first run reported 111 failed assets. On the second run, all of them were reprocessed successfully, leaving 0 failures.
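Until the underlying problem is fixed, the "run it twice" workaround can be automated. The following is a minimal sketch, not part of O3DE: `run_pass` is a hypothetical caller-supplied function that performs one AssetProcessorBatch run and returns the number of failed assets. How that count is obtained (for example by parsing the tool's output) is left to the caller, because the log format is not shown in this issue.

```python
from typing import Callable, List


def run_until_stable(run_pass: Callable[[], int], max_passes: int = 5) -> List[int]:
    """Call run_pass repeatedly until the failure count stops decreasing.

    run_pass is a hypothetical callable that performs one batch run and
    returns the number of failed assets. The return value is the list of
    failure counts observed, one per pass.
    """
    counts: List[int] = []
    for _ in range(max_passes):
        failed = run_pass()
        counts.append(failed)
        # Stop when nothing failed, or when this pass made no progress
        # compared with the previous one.
        if failed == 0 or (len(counts) > 1 and failed >= counts[-2]):
            break
    return counts


if __name__ == "__main__":
    # Stand-in for real runs: the counts reported in this issue (111, then 0).
    fake_counts = iter([111, 0])
    print(run_until_stable(lambda: next(fake_counts)))  # [111, 0]
```

A failure count that drops between passes, as it does here, is itself the symptom described in this report.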

Found in Branch development

Commit ID from o3de/o3de Repository 1da6cdb446

Additional context Build logs: build1.log, build2.log

tkothadev commented 3 months ago

I see that this was produced on Ubuntu. Did you find similar problems with the Windows build of AssetProcessorBatch?

spham-amzn commented 3 months ago

No, the ROSCon2023 demo is Linux only.

AMZN-Gene commented 3 months ago

Also occurs on Windows with the Multiplayer Sample.

Repro Steps

  1. Delete the cache folder
  2. Run .\build\windows\bin\profile\AssetProcessorBatch.exe --platforms=pc --project-path=D:\prj\multiplayersample
  3. Record the number of failed assets
  4. Rerun .\build\windows\bin\profile\AssetProcessorBatch.exe --platforms=pc --project-path=D:\prj\multiplayersample

Expected Results Same number of failed assets. Rerunning APBatch should not produce a different outcome.

Actual Result There are now fewer failed assets.

tkothadev commented 3 months ago

I was able to reproduce Gene's error using the Windows Multiplayer Sample. There were 30 failures (relating to both ShaderAssetBuilder and MaterialBuilder) on the first pass, and 15 failures on each of the second and third passes. I repeated this several times, deleting the Cache folder each time to force APB to reprocess everything, and the behavior was consistent.

I compared the 15 failures of the second pass with the 30 failures of the first pass. From that I determined that MPS has genuine shader compilation failures (mostly in legacy shaders, which I assume is due to compatibility changes in the shader model), so I excluded those results from this analysis. I believe the failures that appeared only in the first pass were exclusively material related.
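The comparison described above amounts to a set difference between the two passes. A minimal sketch, assuming the failed asset names have already been extracted from each log (the extraction step is not shown because the log format is not reproduced here, and the file names in the example are made up):

```python
from typing import Iterable, Set, Tuple


def split_failures(
    first_pass: Iterable[str], second_pass: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Separate persistent failures from first-pass-only failures.

    Returns (persistent, transient):
      persistent: failed in both passes (likely genuine errors)
      transient:  failed only in the first pass (ordering-related suspects)
    """
    first, second = set(first_pass), set(second_pass)
    persistent = first & second
    transient = first - second
    return persistent, transient


if __name__ == "__main__":
    # Hypothetical asset names, for illustration only.
    first = ["legacy.shader", "rock.material", "wood.material"]
    second = ["legacy.shader"]
    persistent, transient = split_failures(first, second)
    print(sorted(persistent))  # ['legacy.shader']
    print(sorted(transient))   # ['rock.material', 'wood.material']
```

Applied to the runs above, the persistent set would hold the 15 failures seen on every pass, and the transient set the failures that only a fresh Cache folder produces.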

However, I was not able to identify a definite problem inside the MaterialBuilder component in the Atom Gem RPI Editor module, nor in its CreateJobs or ProcessJob functions. I currently suspect that the MaterialBuilder has dependency issues, but further debugging is needed in that area.

tkothadev commented 3 months ago

Attached are the logs I collected when running APB: ap_batch_first_run.txt, ap_batch_second_run.txt, ap_batch_third_run.txt, ap_batch_first_run_A.txt, ap_batch_second_run_A.txt

tkothadev commented 3 months ago

These were the source files I was last investigating:

\Gems\Atom\RPI\Code\Source\RPI.Builders\Material\MaterialBuilder.cpp
\Gems\Atom\Asset\Shader\Code\Source\Editor\ShaderAssetBuilder.cpp

nick-l-o3de commented 3 months ago

Looking at the log, it seems to be an interplay between generated materialtype files and the materials that require them.

My assumption looking at the log is that a material file, in order to process, needs a materialtype file to have already been built so that it can load it.

I recommend checking that the material actually declares the materialtype file as a job dependency. For example, if it needs blah_generated.materialtype, it might need to declare a job dependency on ("blah.materialtype", "Material Type Builder (Pipeline Stage", "common") to ensure that the material job gets queued after the material type job. Note that job dependencies are triples of (source file, builder job key, platform). In addition, it may be worth raising the priority of the material type generation job, so that it tends to sit higher in the queue than the materials that depend on it. This is the m_priority field of the job emitted by the builder.

It's not enough to just raise the priority. You might, for example, have 8 cores processing assets, with 4 materialtype jobs in the queue and 4 material jobs (which depend on those materialtype jobs) also in the queue. If the jobs do not have their dependencies described, all 8 of them will be started simultaneously.
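The 8-core scenario can be illustrated with a toy dispatcher. This is a simplified model, not the Asset Processor's actual scheduler. It assumes that a larger priority number runs sooner, and it identifies a dependency by job name only rather than by the full (source file, builder job key, platform) triple. The job names are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Job:
    name: str
    priority: int = 0                         # assumption: larger runs sooner
    depends_on: FrozenSet[str] = frozenset()  # names of jobs that must finish first


def dispatch_waves(jobs: List[Job], workers: int) -> List[List[str]]:
    """Return the batches of jobs that a pool of `workers` starts together."""
    pending = sorted(jobs, key=lambda j: -j.priority)  # stable sort
    done: Set[str] = set()
    waves: List[List[str]] = []
    while pending:
        # A job is ready only when every declared dependency has finished.
        ready = [j for j in pending if j.depends_on <= done][:workers]
        if not ready:
            raise ValueError("unsatisfiable or cyclic dependencies")
        waves.append([j.name for j in ready])
        done.update(j.name for j in ready)
        pending = [j for j in pending if j.name not in done]
    return waves


if __name__ == "__main__":
    types = [Job(f"type{i}", priority=10) for i in range(4)]
    undeclared = [Job(f"mat{i}") for i in range(4)]
    declared = [Job(f"mat{i}", depends_on=frozenset({f"type{i}"})) for i in range(4)]
    print(len(dispatch_waves(types + undeclared, workers=8)))  # 1
    print(len(dispatch_waves(types + declared, workers=8)))    # 2
```

With priority alone, all eight jobs land in a single wave, so the materials start before any material type has finished. Declaring the dependencies pushes the materials into a second wave.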

Note that the queue sort function sorts on criticality first, then priority, so it's not a good idea to make a critical job depend on a non-critical one. You may have to escalate them all.
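To see why a critical job should not depend on a non-critical one, here is a small sketch of a "criticality first, then priority" ordering. It is an illustrative model only: the real sort function is not reproduced here, the field names are hypothetical, and it assumes that a larger priority value sorts earlier.

```python
from typing import List, NamedTuple


class QueuedJob(NamedTuple):
    name: str
    critical: bool
    priority: int  # assumption: larger sorts earlier


def queue_order(jobs: List[QueuedJob]) -> List[str]:
    """Order jobs by criticality first, then by descending priority."""
    ordered = sorted(jobs, key=lambda j: (not j.critical, -j.priority))
    return [j.name for j in ordered]


if __name__ == "__main__":
    # A critical material that needs a non-critical material type:
    # criticality wins, so the dependent job sorts first despite priority.
    mixed = [
        QueuedJob("rock.materialtype", critical=False, priority=100),
        QueuedJob("rock.material", critical=True, priority=0),
    ]
    print(queue_order(mixed))  # ['rock.material', 'rock.materialtype']

    # Escalating both jobs lets priority decide the order again.
    escalated = [
        QueuedJob("rock.materialtype", critical=True, priority=100),
        QueuedJob("rock.material", critical=True, priority=0),
    ]
    print(queue_order(escalated))  # ['rock.materialtype', 'rock.material']
```

In the mixed case no priority value on the material type can move it ahead of the critical material, which is why escalating all of the jobs may be necessary.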

akioCL commented 3 months ago

The problem (at least on MultiplayerSample) was that some materials have their material type pointing to the intermediate asset folder, and the Material Builder was trying to read that material type file in the CreateJobs function. This failed the first time because the file didn't exist yet. On the second run the builder was able to read it, because the file had been created when the material type file was processed. I added support for finding the original material type file before reading it in the CreateJobs function. PR https://github.com/o3de/o3de/pull/18038
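The general idea of falling back to the original file can be sketched as follows. This is not the code from the PR. The function, the `original_for` mapping from an intermediate path to its original source path, and the example paths are all hypothetical, and how the real builder locates the original file is not described in this thread.

```python
from typing import Callable, Mapping, Optional


def pick_readable_material_type(
    referenced: str,
    original_for: Mapping[str, str],
    exists: Callable[[str], bool],
) -> Optional[str]:
    """Choose a material type file that can be read right now.

    referenced:   path the material points at (may be a generated file
                  that has not been produced yet on a clean cache)
    original_for: hypothetical map from generated path to original source
    exists:       predicate telling whether a path is currently readable
    """
    if exists(referenced):
        return referenced
    original = original_for.get(referenced)
    if original is not None and exists(original):
        # The generated file is not there yet; read the original instead.
        return original
    return None


if __name__ == "__main__":
    # Hypothetical paths: on a clean cache only the original source exists.
    on_disk = {"Materials/blah.materialtype"}
    mapping = {"Intermediate/blah_generated.materialtype": "Materials/blah.materialtype"}
    print(pick_readable_material_type(
        "Intermediate/blah_generated.materialtype", mapping, on_disk.__contains__
    ))  # Materials/blah.materialtype
```

Because the lookup no longer depends on a file that another job has yet to write, the first pass over a clean cache behaves the same as the second.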

lgleim commented 2 days ago

@akioCL Does https://github.com/o3de/o3de/pull/18038 resolve this issue then?