p3nGu1nZz / Tau

Tau LLM made with Unity 6 ML Agents
MIT License

Import `token_reduce.json` into Database and Support Variable Table Sizes #9

Closed: p3nGu1nZz closed this 1 month ago

p3nGu1nZz commented 1 month ago

Is your feature request related to a problem? Please describe.
Currently, our database implementation only supports a fixed table size (384 columns). We need to import token_reduce.json into the database, which requires supporting variable table sizes based on the number of PCA components.

Describe the solution you'd like
Create a new function in our database code that:

Describe alternatives you've considered

Additional context
This change is necessary to handle the reduced embeddings efficiently and flexibly. The new function should be able to:

Example structure of token_reduce.json:

{
    "token1": [0.1, 0.2, 0.3],
    "token2": [0.4, 0.5, 0.6],
    "token3": [0.7, 0.8, 0.9],
    ...
}
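A minimal sketch of what the variable-width import could look like, assuming a layout like the JSON above. The sqlite3 backing store, the import_token_reduce name, and the column naming are illustrative assumptions, not the project's actual database code; the point is only that the column count is inferred from the file instead of being hardcoded to 384:

import json
import sqlite3

def import_token_reduce(json_path, db_path, table_name="token_reduce"):
    """Load token_reduce.json and create a table whose column count
    matches the embedding dimension found in the file (hypothetical helper)."""
    with open(json_path) as f:
        tokens = json.load(f)

    # Infer the embedding dimension from the first entry, so a
    # 3-component PCA output works the same as a 384-dim embedding.
    dim = len(next(iter(tokens.values())))

    cols = ", ".join(f"c{i} REAL" for i in range(dim))
    conn = sqlite3.connect(db_path)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table_name} (token TEXT PRIMARY KEY, {cols})"
    )

    placeholders = ", ".join("?" for _ in range(dim + 1))
    conn.executemany(
        f"INSERT OR REPLACE INTO {table_name} VALUES ({placeholders})",
        [(token, *vec) for token, vec in tokens.items()],
    )
    conn.commit()
    conn.close()

# Usage (hypothetical paths):
# import_token_reduce("token_reduce.json", "tau.db")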
p3nGu1nZz commented 1 month ago

We completed a majority of this issue; however, when we build the token table it's not populating. I suspect this is due to the change in table size for our embeddings from 384 to 3. I think the table size is hardcoded in many places, which needs investigating.

The _opti file is generating correctly with our PCA (scikit-learn) algorithm.
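For reference, a minimal sketch of the PCA reduction step with scikit-learn, assuming the source embeddings are a token-to-384-dim-vector mapping. The reduce_embeddings helper and the file names in the comments are illustrative assumptions, not the project's actual pipeline:

import json
import numpy as np
from sklearn.decomposition import PCA

def reduce_embeddings(embeddings, n_components=3):
    """Reduce 384-dim token embeddings to n_components with PCA and
    return a token -> reduced-vector mapping (illustrative only)."""
    tokens = list(embeddings.keys())
    matrix = np.array([embeddings[t] for t in tokens])  # shape: (n_tokens, 384)

    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(matrix)                 # shape: (n_tokens, n_components)

    return {t: reduced[i].tolist() for i, t in enumerate(tokens)}

# Usage (hypothetical file names):
# with open("token_embeddings.json") as f:
#     embeddings = json.load(f)
# with open("token_reduce.json", "w") as f:
#     json.dump(reduce_embeddings(embeddings), f)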