p3nGu1nZz / Tau

Tau: an LLM made with Unity 6 ML-Agents
MIT License

Migrate `data load` Command into Multiple Commands for Improved Efficiency #15

Closed: p3nGu1nZz closed this issue 1 month ago.

p3nGu1nZz commented 1 month ago

Describe the enhancement

The current data load {filename} command takes approximately 4 hours to process a dataset of size 5k (around 12k embeddings in total). To improve efficiency and manageability, we propose splitting this command into multiple commands.
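For scale, the reported figures imply roughly the following per-embedding cost, assuming the 4 hours are spread evenly across about 12k embeddings:

$$
\frac{4\ \text{h} \times 3600\ \text{s/h}}{12{,}000\ \text{embeddings}} = \frac{14{,}400\ \text{s}}{12{,}000} \approx 1.2\ \text{s per embedding}
$$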

Proposed Solution

  1. DataManager Class: Create a new singleton class, DataManager, to hold loaded data in static memory, similar to our other manager classes.
  2. Data Load Command: Modify the data load {filename} command so that it only loads the data file into the DataManager's static memory.
  3. Database Build Command: Introduce a new sub-command, database build {tablename}, which:
    • Checks that data has been loaded into the DataManager.
    • Builds each table independently, so users can still save the database even if building the tokens table fails.
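A minimal sketch of the proposed split, written in Python for brevity even though Tau itself is a Unity (C#) project. DataManager, data_load, and database_build mirror the three steps above; the JSON input format, the builders mapping, and the embeddings table name are hypothetical assumptions, not Tau's actual API.

```python
import json
import threading
from typing import Any, Callable, Dict, List


class DataManager:
    """Process-wide singleton that keeps loaded data in memory (hypothetical sketch)."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls) -> "DataManager":
        # Guard creation with a lock so only one instance ever exists.
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance.data = None   # raw records read by `data load`
                instance.tables = {}   # tables that were built successfully
                cls._instance = instance
        return cls._instance

    def is_loaded(self) -> bool:
        return self.data is not None


def data_load(filename: str) -> int:
    """Step 1: only read the file into DataManager memory; no table building.

    Assumes (hypothetically) that the file is a JSON list of records.
    """
    with open(filename, encoding="utf-8") as f:
        DataManager().data = json.load(f)
    return len(DataManager().data)


# Hypothetical per-table builder: takes the loaded records, returns table rows.
TableBuilder = Callable[[List[Dict[str, Any]]], List[Any]]


def database_build(tablename: str, builders: Dict[str, TableBuilder]) -> bool:
    """Step 2: build one table from the loaded data; a failure stays isolated."""
    manager = DataManager()
    if not manager.is_loaded():
        raise RuntimeError("No data loaded; run `data load {filename}` first.")
    if tablename not in builders:
        raise KeyError(f"Unknown table: {tablename}")
    try:
        manager.tables[tablename] = builders[tablename](manager.data)
        return True
    except Exception as exc:  # one failing table must not discard the others
        print(f"database build {tablename} failed: {exc}")
        return False
```

Because each database_build call catches its own failure, a crash while building tokens leaves any previously built tables in DataManager().tables, where they can still be saved.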

To Reproduce

Steps to reproduce the current behavior:

  1. Run the data load {filename} command with a dataset of size 5k.
  2. Observe that the command takes approximately 4 hours to complete.

Additional context

This enhancement aims to optimize the data load process for the Tau project. Splitting the work into separate load and build steps, backed by a DataManager class, should significantly reduce the time spent waiting on data loading and improve reliability, since a failure in one table (such as tokens) no longer forces a rerun of the entire load.