avineet123 opened 11 months ago

I am trying to fine-tune a model on large tables (99 columns, 180 rows) for complex SQL queries. I am unable to fine-tune because the serialized table alone comes to 6,000 tokens. Can we do that using Llama 2? Please assist.
In general, sending large tables to LLMs is not feasible because of the context window. A strategy you can and should use instead is a vector store: store each column together with a short description of it (i.e., store embeddings of those descriptions). Then, for a given question, first retrieve the potentially helpful columns and send only that subset of the huge table to the LLM.
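Here is a minimal sketch of that retrieval step, assuming the `sentence-transformers` package; the column names and descriptions are hypothetical placeholders standing in for your 99-column schema:

```python
# Minimal sketch: embed column descriptions once, retrieve the best matches
# per question. Column names/descriptions below are hypothetical.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

columns = {
    "cust_id": "unique identifier of the customer",
    "order_total": "total value of the order in USD",
    "order_date": "date on which the order was placed",
    # ...descriptions for the remaining columns of the 99-column table
}

names = list(columns)
col_emb = model.encode([f"{n}: {d}" for n, d in columns.items()])

def relevant_columns(question: str, top_k: int = 10) -> list[str]:
    """Return the top_k columns whose descriptions best match the question."""
    q_emb = model.encode(question)
    scores = util.cos_sim(q_emb, col_emb)[0]      # cosine similarity per column
    top = scores.argsort(descending=True)[:top_k]
    return [names[i] for i in top]

# Only these columns (plus a few sample rows) go into the LLM prompt.
print(relevant_columns("total order value per customer"))
```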
This also carries over to training: you don't train the LLM on the full table, but only on the retrieved subset.
While this is somewhat cumbersome, there is currently no other way, unfortunately.
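To make the training side concrete, here is a hedged sketch of how one fine-tuning record could be built from that subset. `relevant_columns` and `columns` come from the sketch above; the `orders` table name and the prompt/completion format are illustrative assumptions, not anything Llama 2 prescribes:

```python
# Illustrative only: one fine-tuning record containing just the retrieved
# column subset instead of the full 99-column schema.
import json

def make_training_record(question: str, gold_sql: str) -> str:
    """Serialize one training example that includes only the relevant columns."""
    cols = relevant_columns(question, top_k=10)
    schema = ", ".join(f"{c} ({columns[c]})" for c in cols)
    prompt = (
        "Given the table `orders` with columns:\n"
        f"{schema}\n"
        f"Write a SQL query for: {question}"
    )
    # One JSON-lines record in a generic prompt/completion format.
    return json.dumps({"prompt": prompt, "completion": gold_sql})

print(make_training_record(
    "total order value per customer",
    "SELECT cust_id, SUM(order_total) FROM orders GROUP BY cust_id;",
))
```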