p3nGu1nZz / Tau

Tau LLM made with Unity 6 ML Agents
MIT License

Implement PCA Reduction Script and Batch Integration for Embeddings #7

Closed. p3nGu1nZz closed this issue 1 month ago

p3nGu1nZz commented 1 month ago

Is your feature request related to a problem? Please describe.

We need a script that performs PCA on our embeddings to reduce their dimensionality. This will help us manage our high-dimensional vector space more efficiently.

Describe the solution you'd like

A Python script named reduce.py that:

Additionally, we need to create a batch script to call reduce.py in the correct virtual environment, similar to how we handle encoder.py. The batch script should ensure the environment is set up correctly and the script is executed with the appropriate arguments.
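
The requirements list for reduce.py is not reproduced in this issue, so the following is only a minimal sketch of the PCA step itself, not the project's implementation. It assumes the embeddings are available as a list of equal-length float vectors, and it uses plain-Python power iteration with deflation so that it runs without third-party packages; the real script may well use a library such as scikit-learn instead. The function names (center, covariance, top_component, pca_reduce) are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean from every vector."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (dim x dim) of already-centered vectors."""
    n, dim = len(centered), len(centered[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for r in centered:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += r[i] * r[j]
    return [[v / (n - 1) for v in row] for row in cov]


def top_component(cov, iterations=500, seed=0):
    """Approximate the dominant eigenvector and eigenvalue by power iteration."""
    rng = random.Random(seed)
    dim = len(cov)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in this matrix
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(dim))
    return v, eigenvalue


def pca_reduce(rows, n_components):
    """Project vectors onto their first n_components principal directions."""
    centered = center(rows)
    cov = covariance(centered)
    dim = len(cov)
    components = []
    for _ in range(n_components):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: remove the variance explained by the component just found.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return [[sum(r[j] * c[j] for j in range(dim)) for c in components]
            for r in centered]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" reduced to 2 dimensions.
    embeddings = [[1.0, 2.0, 0.5], [3.0, 4.0, 1.0], [5.0, 7.0, 0.0], [2.0, 1.0, 2.0]]
    for vector in pca_reduce(embeddings, 2):
        print(vector)
```

Power iteration is used here only to keep the sketch dependency-free. It is slow for the dimensionalities typical of real embeddings, where a library PCA would be the more practical choice.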

Describe alternatives you've considered

Additional context

See encoder.py for an example of how our Python scripts work. We call our Python scripts via a batch file. For the encoder it looks like this:

encoder.bat:

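@echo off
rem Load shared paths (PROJECT_ROOT, VENV_DIR, ACTIVATE_SCRIPT, ...).
call "%~dp0setenv.bat"
rem Activate the virtual environment, suppressing its output.
call "%ACTIVATE_SCRIPT%" >nul 2>&1
rem Run the encoder, forwarding all command-line arguments.
python "%~dp0encoder.py" %*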

setenv.bat:

@echo off

set "PROJECT_ROOT=%~dp0.."
set "TEMP_DIR=%PROJECT_ROOT%\.temp"
set "PYTHON_INSTALLER=python-3.10.11-amd64.exe"
set "PYTHON_DIR=%USERPROFILE%\.python\Python310"
set "VENV_NAME=ml-agents"
set "VENV_DIR=%PROJECT_ROOT%\venv\%VENV_NAME%"
set "ACTIVATE_SCRIPT=%~dp0activate.bat"
set "DEACTIVATE_SCRIPT=%~dp0deactivate.bat"
set "CLEAN_SCRIPT=%~dp0clean.bat"
set "UTILITIES_SCRIPT=%~dp0utilities.bat"
set "ML_AGENTS_DIR=%PROJECT_ROOT%\ml-agents"
set "ML_AGENTS_ENVS_INSTALL=%ML_AGENTS_DIR%\ml-agents-envs"
set "ML_AGENTS_INSTALL=%ML_AGENTS_DIR%\ml-agents"

activate.bat:

@echo off
call "%~dp0setenv.bat"

echo Activating virtual environment...
call "%VENV_DIR%\Scripts\activate.bat"

if %errorlevel% neq 0 (
    echo Virtual environment activation failed.
    exit /b 1
)

echo Virtual environment activated.

We also need to update the data load terminal command in our runtime so that the reduction step runs after we extract the vocab but before we build the training and evaluation data tables.

p3nGu1nZz commented 1 month ago

This is now called Optimizer, not reducer.