**Describe the bug**
When running the `training agent tau (unknown)` command, we encounter errors due to missing embeddings in the database. This issue arises because the `data.json` file is out of sync with the `database.bin` generated by the `data load (unknown)` command. Previously, we manually removed the problematic messages from `data.json`, but this is not a sustainable solution.
**To Reproduce**
Steps to reproduce the behavior:
1. Run the `data load (unknown)` command to generate `database.bin`.
2. Execute the `training agent tau (unknown)` command.
3. Observe the errors related to missing embeddings.
**Expected behavior**
The system should automatically handle missing embeddings by attempting to regenerate them for the missing token strings, rather than requiring manual edits to `data.json`.
**Screenshots**
N/A
**Desktop (please complete the following information):**
- OS: Windows 11
- Hardware: NVIDIA RTX 3080 Laptop GPU
**Additional context**
We have implemented `prune` and `trim` commands to clean the strings used in messages and token names in the database tables. However, missing embeddings still occur during training. We propose catching the error and, if it is caused by missing embeddings, attempting to repair the table by regenerating the embeddings for the missing token strings.
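The proposed catch-and-repair behavior could look roughly like the sketch below. All names here (`EmbeddingTable`, `MissingEmbeddingError`, `embedding_for`, the `generate` callback) are hypothetical placeholders for illustration, not the project's actual API:

```python
class MissingEmbeddingError(KeyError):
    """Raised when a token string has no embedding in the table."""


class EmbeddingTable:
    """Minimal stand-in for the embeddings table in database.bin."""

    def __init__(self, embeddings):
        self._embeddings = dict(embeddings)

    def lookup(self, token):
        try:
            return self._embeddings[token]
        except KeyError:
            raise MissingEmbeddingError(token) from None

    def repair(self, token, generate):
        # Regenerate the embedding for the missing token string and
        # store it so subsequent lookups succeed.
        self._embeddings[token] = generate(token)
        return self._embeddings[token]


def embedding_for(table, token, generate):
    """Look up a token's embedding; on a miss, repair instead of failing."""
    try:
        return table.lookup(token)
    except MissingEmbeddingError:
        return table.repair(token, generate)
```

With this approach, training would no longer abort on an out-of-sync `data.json`: a missing embedding triggers regeneration for just that token string instead of requiring manual edits.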