[Open] Curiosity007 opened this issue 1 year ago
Pandas and Pandas AI might be the solution for this.
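For whole-table questions like this, loading the CSV with pandas and aggregating directly may work better than retrieving row-chunks from a vector store. A minimal sketch, using a tiny inline CSV in place of the real 5000-row file (the column names here are made up for illustration, not taken from the issue):

```python
import io
import pandas as pd

# In a real run this would be pd.read_csv("source_documents/<your_file>.csv");
# a small inline CSV stands in for the 5000-row file here.
csv_text = "region,sales\nnorth,10\nsouth,7\nnorth,5\n"
df = pd.read_csv(io.StringIO(csv_text))

# Whole-dataset aggregations see every row at once,
# unlike a top-k retrieval step that only sees 5-7 rows.
totals = df.groupby("region")["sales"].sum()
print(f"{len(df)} rows, {df.shape[1]} columns")
print(totals)
```

The same idea scales to the 30-column case: `describe()`, `groupby`, or correlation matrices summarize all rows before any text is handed to the LLM.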
.env
Generic
TEXT_EMBEDDINGS_MODEL=sentence-transformers/all-MiniLM-L6-v2
TEXT_EMBEDDINGS_MODEL_TYPE=HF # LlamaCpp or HF
USE_MLOCK=false
Ingestion
PERSIST_DIRECTORY=db
DOCUMENTS_DIRECTORY=source_documents
INGEST_CHUNK_SIZE=500
INGEST_CHUNK_OVERLAP=50
INGEST_N_THREADS=1
Generation
MODEL_TYPE=LlamaCpp # GPT4All or LlamaCpp
MODEL_PATH=eachadea/ggml-vicuna-7b-1.1/ggml-vic7b-q5_1.bin
MODEL_TEMP=0.8
MODEL_N_CTX=2048 # Max total size of prompt+answer
MODEL_MAX_TOKENS=1024 # Max size of answer
MODEL_STOP=[STOP]
CHAIN_TYPE=betterstuff
N_RETRIEVE_DOCUMENTS=100 # How many documents to retrieve from the db
N_FORWARD_DOCUMENTS=100 # How many documents to forward to the LLM, chosen among those retrieved
N_GPU_LAYERS=32
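These settings themselves may explain why only 5-7 documents reach the model despite N_FORWARD_DOCUMENTS=100: with MODEL_N_CTX=2048 and MODEL_MAX_TOKENS=1024, only about 1024 tokens remain for the prompt, so only a handful of 500-character chunks can fit. A back-of-envelope check (the 4 chars/token figure is a rough rule of thumb, not measured for this model):

```python
# Rough context-budget arithmetic for the .env values above.
MODEL_N_CTX = 2048        # max tokens for prompt + answer
MODEL_MAX_TOKENS = 1024   # tokens reserved for the answer
INGEST_CHUNK_SIZE = 500   # characters per ingested chunk
CHARS_PER_TOKEN = 4       # assumption: common rule of thumb for English text

prompt_budget_tokens = MODEL_N_CTX - MODEL_MAX_TOKENS       # tokens left for the prompt
tokens_per_chunk = INGEST_CHUNK_SIZE // CHARS_PER_TOKEN     # approx. tokens per chunk
chunks_that_fit = prompt_budget_tokens // tokens_per_chunk  # chunks before overflow
print(chunks_that_fit)  # → 8, ignoring the question and prompt-template overhead
```

Eight chunks minus the space taken by the question and the prompt template lands right in the observed 5-7 range.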
Python version
Python 3.10.10
System
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
CASALIOY version
Latest commit: ee9a4e5 (https://github.com/su77ungr/CASALIOY/commit/ee9a4e5cd9bcff90adf9078d4acc0a634750a011)
Information
- The official example scripts
- My own modified scripts
Related Components
- Document ingestion
- GUI
- Prompt answering
Reproduction
I fed the system a 5000-line CSV file with 30 columns, then asked for overall insights from the data.
I can see in the terminal that only the top 5-7 documents are retrieved, each of which is just a single row. The answer is therefore based on only 5-7 rows, so no actual insight comes out.
Note: I kept only one document in the source documents folder to avoid overlapping information.
Expected behavior
The system should be able to recognize patterns across the whole dataset and suggest insights based on them.
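The mismatch can be seen in miniature: top-k similarity retrieval returns a fixed number of row-chunks no matter how large the table is, so any question that needs all rows is answered from a small sample. A toy sketch (no real embeddings; simple keyword scoring stands in for vector similarity, and the row format is invented for illustration):

```python
# Toy retriever: each "document" is one CSV row, as in the report above.
rows = [f"row {i}: value={i % 7}" for i in range(5000)]

def retrieve(query: str, docs: list[str], k: int = 5) -> list[str]:
    # Stand-in scoring; a real pipeline would rank by embedding similarity.
    scored = sorted(docs,
                    key=lambda d: sum(w in d for w in query.split()),
                    reverse=True)
    return scored[:k]

hits = retrieve("overall insight from the data", rows, k=5)
print(len(hits))  # always k rows, no matter that 5000 were ingested
```

Raising `k` (N_FORWARD_DOCUMENTS) only helps until the context window fills; aggregate questions over tabular data are better served by computing the aggregation first, as the pandas suggestion in the comments proposes.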