Describe the bug
During training, the contents of the database and the training file can be out of sync. Because the database is built independently of the training data, a token in the training file may have no embedding in the database, and we need a smarter way to handle missing embeddings. Currently, training fails with errors like this:
Exception: Failed to load training data from file 'data.json': 'Embedding for token '0.75' not found in the database.'
AgentTrainer.LoadTrainingData (System.String fileName) (at Assets/MLAgentsProject/Scripts/Training/Agent/AgentTrainer.cs:122)
AgentTrainer.Initialize () (at Assets/MLAgentsProject/Scripts/Training/Agent/AgentTrainer.cs:36)
BaseProcessor`2[TDelegator,TAgent].StartTrainingTask (TrainingPair`2[TDelegator,TAgent] pair, System.Threading.CancellationToken cancellationToken) (at Assets/MLAgentsProject/Scripts/Training/Processor/BaseProcessor.cs:37)
System.Runtime.CompilerServices.AsyncMethodBuilderCore+<>c.<ThrowAsync>b__7_0 (System.Object state) (at <321eb2db7c6d43ea8fc39b54eaca3452>:0)
UnityEngine.UnitySynchronizationContext+WorkRequest.Invoke () (at <adbae017f0374fce9921b97a33a4e8ca>:0)
UnityEngine.UnitySynchronizationContext.Exec () (at <adbae017f0374fce9921b97a33a4e8ca>:0)
UnityEngine.UnitySynchronizationContext.ExecuteTasks () (at <adbae017f0374fce9921b97a33a4e8ca>:0)
To Reproduce
Steps to reproduce the behavior:
Build the database independently of the training data.
Attempt to train using the AgentTrainer.
Observe the error when an embedding is missing.
Expected behavior
The AgentTrainer should stop the training routine gracefully when it encounters this error, rather than letting the exception propagate unhandled.
When this specific exception occurs, the system should call the generate embeddings task to create the missing embeddings and insert them into the database, then retry loading the training data.
If the same exception occurs twice in a row, the function should exit to prevent an infinite loop.
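The recovery behavior described above could look roughly like the sketch below. It is not the project's actual code: `MissingEmbeddingException`, `EmbeddingRecovery`, and `TryLoadWithRecovery` are hypothetical names, and the sketch assumes the loader throws a typed exception that carries the missing token (the current code throws a plain `Exception` with the token only in the message). It also reads "the same exception twice in a row" as "the same token is reported missing on two consecutive attempts".

```csharp
using System;

// Hypothetical exception type, introduced for illustration only.
// The current code throws a plain Exception with the token in the message.
public sealed class MissingEmbeddingException : Exception
{
    public string Token { get; }

    public MissingEmbeddingException(string token)
        : base($"Embedding for token '{token}' not found in the database.")
    {
        Token = token;
    }
}

public static class EmbeddingRecovery
{
    // Runs loadTrainingData; if an embedding is missing, asks
    // generateEmbedding to create it and retries the load.
    // Returns true when loading succeeds, or false when the same token
    // is reported missing twice in a row (generation did not help).
    public static bool TryLoadWithRecovery(
        Action loadTrainingData,
        Action<string> generateEmbedding)
    {
        string lastMissingToken = null;

        while (true)
        {
            try
            {
                loadTrainingData();
                return true;
            }
            catch (MissingEmbeddingException ex)
            {
                // Same token missing on consecutive attempts:
                // exit instead of looping forever.
                if (ex.Token == lastMissingToken)
                {
                    return false;
                }

                lastMissingToken = ex.Token;
                generateEmbedding(ex.Token);
            }
        }
    }
}
```

In this sketch the caller (for example `AgentTrainer.Initialize`) would stop the training routine when the method returns `false`. Passing the loader and the generator in as delegates keeps the retry logic testable without a database or a Unity runtime.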
Desktop
OS: Windows 11
GPU: RTX 3080
Unity Version: 6.0.21
Additional context
This issue was observed while working on the "Tau" project. Implementing a smarter way to handle missing embeddings and improving error handling will enhance the robustness of the training process.