sudarshan-koirala / llama2-chat-with-documents

Simple Chainlit app to have interaction with your documents.
MIT License

Number of tokens (757) exceeded maximum context length (512). #2

Open datacrud8 opened 1 year ago

datacrud8 commented 1 year ago

Hi, I'm trying to build this app locally and used the same model, llama-2-7b-chat.ggmlv3.q8_0.bin. When I run the app, the UI shows a random message like the one you showed, but the console prints this:

    Number of tokens (755) exceeded maximum context length (512).
    Number of tokens (756) exceeded maximum context length (512).
    Number of tokens (757) exceeded maximum context length (512).

So I increased max_new_tokens=2048, increased n_ctx, and added truncate=True; none of them fixed the issue. I changed the model as well, but still the same problem.

Do you know of any solution for this issue?
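
Warnings like these mean the retrieved document chunks plus the prompt template and question exceed the LLM's 512-token context window, so raising max_new_tokens (which only caps the generated output) cannot help. One common mitigation is to produce smaller chunks at ingestion time. A minimal sketch, assuming ingest.py splits documents with LangChain's RecursiveCharacterTextSplitter; the splitter choice and the exact values here are illustrative, not the repo's actual settings:

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    # Smaller chunks keep (k retrieved chunks + question + prompt template)
    # under the model's 512-token window. chunk_size counts characters,
    # roughly 3-4 characters per token for English text.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,   # illustrative value
        chunk_overlap=50,
    )
    texts = text_splitter.split_documents(documents)  # 'documents' from the loader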

ctxwing commented 1 year ago

I got just the same as @datacrud8.

Did anyone get this solved? Thanks in advance.

    $ chainlit run main.py -w
    2023-10-25 19:38:13 - Loaded .env file
    2023-10-25 19:38:22 - Your app is available at http://localhost:8000
    2023-10-25 19:38:51 - Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
    2023-10-25 19:38:54 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
    2023-10-25 19:39:06 - Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
    2023-10-25 19:39:07 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
    Batches: 100%|██████████████████████████████| 1/1 [00:02<00:00, 2.15s/it]
    2023-10-25 19:39:20 - 4 changes detected
    Batches: 100%|██████████████████████████████| 1/1 [00:00<00:00, 6.11it/s]
    2023-10-25 19:41:53 - Number of tokens (513) exceeded maximum context length (512).
    2023-10-25 19:41:53 - Number of tokens (514) exceeded maximum context length (512).
    2023-10-25 19:41:54 - Number of tokens (515) exceeded maximum context length (512).
    2023-10-25 19:41:54 - Number of tokens (516) exceeded maximum context length (512).
    2023-10-25 19:41:55 - Number of tokens (517) exceeded maximum context length (512).
    2023-10-25 19:41:55 - Number of tokens (518) exceeded maximum context length (512).

sudarshan-koirala commented 1 year ago

Hello, can you try a different embeddings model, for example hkunlp/instructor-large, in the ingest.py file?
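
For reference, a sketch of what that swap might look like in ingest.py. One caveat (not stated in this thread): hkunlp/instructor-* models are normally loaded through LangChain's HuggingFaceInstructEmbeddings wrapper, which relies on the InstructorEmbedding package, rather than the plain HuggingFaceEmbeddings wrapper; loading them as an ordinary sentence-transformers model can fail, as the traceback further down shows:

    from langchain.embeddings import HuggingFaceInstructEmbeddings

    # Requires: pip install InstructorEmbedding sentence-transformers
    huggingface_embeddings = HuggingFaceInstructEmbeddings(
        model_name="hkunlp/instructor-large",
        model_kwargs={"device": "cpu"},
    )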

ctxwing commented 1 year ago

@sudarshan-koirala First of all, thanks for answering my question. I changed model_name="sentence-transformers/all-MiniLM-L6-v2" (result: case [A]) to the models below, including the one you recommended, referring to https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models:

    huggingface_embeddings = HuggingFaceEmbeddings(
        model_name="hkunlp/instructor-large",  # <-- [B] changed to this; throws the error below
        model_kwargs={"device": "cpu"},
    )

    $ time python ingest.py
    Downloading (…)c7233/.gitattributes: 100%|██████████| 1.48k/1.48k [00:00<00:00, 3.60MB/s]
    Downloading (…)_Pooling/config.json: 100%|██████████| 270/270 [00:00<00:00, 716kB/s]
    Downloading (…)/2_Dense/config.json: 100%|██████████| 116/116 [00:00<00:00, 291kB/s]
    Downloading pytorch_model.bin: 100%|██████████| 3.15M/3.15M [00:00<00:00, 11.1MB/s]
    Downloading (…)9fb15c7233/README.md: 100%|██████████| 66.3k/66.3k [00:00<00:00, 338kB/s]
    Downloading (…)b15c7233/config.json: 100%|██████████| 1.53k/1.53k [00:00<00:00, 4.31MB/s]
    Downloading (…)ce_transformers.json: 100%|██████████| 122/122 [00:00<00:00, 358kB/s]
    Downloading pytorch_model.bin: 100%|██████████| 1.34G/1.34G [01:56<00:00, 11.5MB/s]
    Downloading (…)nce_bert_config.json: 100%|██████████| 53.0/53.0 [00:00<00:00, 157kB/s]
    Downloading (…)cial_tokens_map.json: 100%|██████████| 2.20k/2.20k [00:00<00:00, 6.51MB/s]
    Downloading spiece.model: 100%|██████████| 792k/792k [00:00<00:00, 11.9MB/s]
    Downloading (…)c7233/tokenizer.json: 100%|██████████| 2.42M/2.42M [00:01<00:00, 2.36MB/s]
    Downloading (…)okenizer_config.json: 100%|██████████| 2.41k/2.41k [00:00<00:00, 7.06MB/s]
    Downloading (…)15c7233/modules.json: 100%|██████████| 461/461 [00:00<00:00, 1.37MB/s]
    Traceback (most recent call last):
      File "/home/ctxwing/docker-ctx/lancer/basic/llama2-chat-with-documents/ingest.py", line 82, in <module>
        create_vector_database()
      File "/home/ctxwing/docker-ctx/lancer/basic/llama2-chat-with-documents/ingest.py", line 59, in create_vector_database
        huggingface_embeddings = HuggingFaceEmbeddings(
                                 ^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/langchain/embeddings/huggingface.py", line 66, in __init__
        self.client = sentence_transformers.SentenceTransformer(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 95, in __init__
        modules = self._load_sbert_model(model_path)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 840, in _load_sbert_model
        module = module_class.load(os.path.join(model_path, module_config['path']))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/models/Pooling.py", line 120, in load
        return Pooling(**config)
               ^^^^^^^^^^^^^^^^^
    TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'

    real    2m12.113s
    user    0m14.215s
    sys     0m9.627s
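
A note on that traceback (an inference, not confirmed in this thread): the TypeError usually means the installed sentence-transformers is too old to recognise the pooling_mode_weightedmean_tokens option present in newer model configs. Upgrading it (pip install -U sentence-transformers), or loading instructor models through HuggingFaceInstructEmbeddings as sketched above, is the usual fix.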

sny-verma commented 6 months ago

As per the newer library updates, define the LLM like this:

    llm = CTransformers(
        model=model_path,
        model_type=model_type,
        config={'max_new_tokens': 1024, 'temperature': 0.7, 'context_length': 4096},
    )
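
For completeness, a self-contained version of that fix; the model file and type are assumptions based on the model named earlier in the thread, not values confirmed by this repo:

    from langchain.llms import CTransformers

    # In ctransformers, 'context_length' controls the context window; the
    # 512-token limit the warnings report is the default for this GGML
    # model, and 4096 is Llama 2's maximum. max_new_tokens only caps the
    # generated output and does not enlarge the window.
    llm = CTransformers(
        model="llama-2-7b-chat.ggmlv3.q8_0.bin",  # assumed local GGML file
        model_type="llama",
        config={
            "max_new_tokens": 1024,
            "temperature": 0.7,
            "context_length": 4096,
        },
    )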