harshitv804 / LawGPT

A RAG-based Generative AI Attorney fed with Indian Penal Code data. Developed using Streamlit, LangChain and the Together AI API.
https://huggingface.co/spaces/harshitv804/LawGPT

Added the Together API and tried to run but getting an error #3

Closed hridulpk closed 2 months ago

hridulpk commented 2 months ago

So I did make an ipc_vector_db and added the Together API key as an additional line: `os.environ['TOGETHER_AI'] = "api_key"`.
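For reference, the extra line I added looks roughly like this (the key string is a placeholder, not a real key):

```python
import os

# Hard-coding the Together API key as an environment variable before the rest
# of app.py reads it. For anything beyond local testing, load it from a .env
# file or the shell environment instead of committing it to source.
os.environ['TOGETHER_AI'] = "api_key"  # placeholder value
```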

But when I try to run it, I get this error: "RuntimeError: Error(s) in loading state_dict for NomicBertModel: Missing key(s) in state_dict: "pooler.dense.weight", "pooler.dense.bias"."

complete log:

c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\langchain_core\_api\deprecation.py:139: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.
  warn_deprecated(
C:\Users\jithi\.cache\huggingface\modules\transformers_modules\nomic-ai\nomic-embed-text-v1\289f532e14dbbbd5a04753fa58739e9ba766f3c7\modeling_hf_nomic_bert.py:88: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = loader(resolved_archive_file)
2024-08-10 20:12:08.044 Uncaught app exception
Traceback (most recent call last):
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 85, in exec_func_with_error_handling
    result = func()
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 576, in code_to_exec
    exec(code, module.__dict__)
  File "D:\blah\law gpt pro\LawGPT\app.py", line 52, in <module>
    embeddings = HuggingFaceEmbeddings(model_name="nomic-ai/nomic-embed-text-v1",model_kwargs={"trust_remote_code":True,"revision":"289f532e14dbbbd5a04753fa58739e9ba766f3c7"})
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\langchain_core\_api\deprecation.py", line 203, in warn_if_direct_instance
    return wrapped(self, *args, **kwargs)
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\langchain_community\embeddings\huggingface.py", line 79, in __init__
    self.client = sentence_transformers.SentenceTransformer(
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 287, in __init__
    modules = self._load_sbert_model(
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 1487, in _load_sbert_model
    module = Transformer(model_name_or_path, cache_dir=cache_folder, **kwargs)
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\sentence_transformers\models\Transformer.py", line 54, in __init__
    self._load_model(model_name_or_path, config, cache_dir, **model_args)
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\sentence_transformers\models\Transformer.py", line 85, in _load_model
    self.auto_model = AutoModel.from_pretrained(
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\transformers\models\auto\auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\jithi\.cache\huggingface\modules\transformers_modules\nomic-ai\nomic-embed-text-v1\289f532e14dbbbd5a04753fa58739e9ba766f3c7\modeling_hf_nomic_bert.py", line 356, in from_pretrained
    load_return = model.load_state_dict(state_dict, strict=True)
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\torch\nn\modules\module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for NomicBertModel:
    Missing key(s) in state_dict: "pooler.dense.weight", "pooler.dense.bias".

please help
harshitv804 commented 2 months ago

LangChain frequently changes its documentation; I would suggest referring to their docs here. I guess the problem is that the imports are deprecated. Try replacing the line at https://github.com/harshitv804/LawGPT/blob/6e89b2915784a431975bb87aa974a78bcf6197ed/app.py#L2 with `from langchain_huggingface.embeddings import HuggingFaceEmbeddings`. Also replace the line at https://github.com/harshitv804/LawGPT/blob/6e89b2915784a431975bb87aa974a78bcf6197ed/app.py#L53 with `db = FAISS.load_local("ipc_vector_db", embeddings, allow_dangerous_deserialization=True)`.
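Put together, the two changes would look roughly like this (a sketch; the pinned revision shown is just the one already in app.py):

```python
# Requires: pip install -U langchain-huggingface
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Same embeddings line as before, just constructed via the new package.
embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs={"trust_remote_code": True, "revision": "289f532e14dbbbd5a04753fa58739e9ba766f3c7"},
)

# Newer LangChain releases refuse to unpickle a locally built FAISS index
# unless you explicitly opt in, hence the extra flag.
db = FAISS.load_local(
    "ipc_vector_db",
    embeddings,
    allow_dangerous_deserialization=True,
)
```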

hridulpk commented 2 months ago

(screenshot of the app's output)

The change did make it runnable, but as you can see it's sort of hallucinating, and I think it's still throwing the same error.

log:

C:\Users\jithi\.cache\huggingface\modules\transformers_modules\nomic-ai\nomic-embed-text-v1\289f532e14dbbbd5a04753fa58739e9ba766f3c7\modeling_hf_nomic_bert.py:88: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = loader(resolved_archive_file)

2024-08-10 20:59:52.282 Uncaught app exception
Traceback (most recent call last):
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 85, in exec_func_with_error_handling
    result = func()
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 576, in code_to_exec
    exec(code, module.__dict__)
  File "D:\blah\law gpt pro\LawGPT\app.py", line 52, in <module>
    embeddings = HuggingFaceEmbeddings(model_name="nomic-ai/nomic-embed-text-v1",model_kwargs={"trust_remote_code":True,"revision":"289f532e14dbbbd5a04753fa58739e9ba766f3c7"})
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\langchain_huggingface\embeddings\huggingface.py", line 61, in __init__
    self.client = sentence_transformers.SentenceTransformer(
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 287, in __init__
    modules = self._load_sbert_model(
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 1487, in _load_sbert_model
    module = Transformer(model_name_or_path, cache_dir=cache_folder, **kwargs)
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\sentence_transformers\models\Transformer.py", line 54, in __init__
    self._load_model(model_name_or_path, config, cache_dir, **model_args)
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\sentence_transformers\models\Transformer.py", line 85, in _load_model
    self.auto_model = AutoModel.from_pretrained(
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\transformers\models\auto\auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\jithi\.cache\huggingface\modules\transformers_modules\nomic-ai\nomic-embed-text-v1\289f532e14dbbbd5a04753fa58739e9ba766f3c7\modeling_hf_nomic_bert.py", line 356, in from_pretrained
    load_return = model.load_state_dict(state_dict, strict=True)
  File "c:\users\jithi\appdata\local\programs\python\python39\lib\site-packages\torch\nn\modules\module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for NomicBertModel:
    Missing key(s) in state_dict: "pooler.dense.weight", "pooler.dense.bias".

(The same `torch.load` FutureWarning from modeling_hf_nomic_bert.py:88, each followed by `state_dict = loader(resolved_archive_file)`, is printed three more times after the traceback.)
harshitv804 commented 2 months ago

You can control those unwanted answers through a proper chat template. I guess there is some problem with the nomic model. You can try other open-source embedding models (search the MTEB leaderboard on Hugging Face), or try the latest nomic version. In the code I have pinned a particular commit of the model; just remove that to download the latest one (see the sketch below). Also make sure you use the same model for ingesting and retrieval.
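Roughly, those two tweaks could look like this (a sketch only; the alternative model name and the prompt wording are examples, not what the repo ships):

```python
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import PromptTemplate

# Drop the pinned "revision" kwarg so the latest nomic-embed-text-v1 is pulled,
# or swap model_name for another model from the MTEB leaderboard
# (e.g. "BAAI/bge-small-en-v1.5"). Whatever you pick, re-ingest the IPC data
# with the same model before querying.
embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs={"trust_remote_code": True},
)

# Example of a stricter prompt template that keeps answers tied to the
# retrieved IPC context instead of letting the LLM free-associate.
prompt = PromptTemplate(
    template=(
        "You are a legal assistant for the Indian Penal Code.\n"
        "Answer ONLY from the context below. If the answer is not in the "
        "context, say you don't know.\n\n"
        "CONTEXT:\n{context}\n\nQUESTION: {question}\n\nANSWER:"
    ),
    input_variables=["context", "question"],
)
```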

hridulpk commented 2 months ago

Ah, let me try that... I'm kind of a noob at all this. I will update.

harshitv804 commented 2 months ago

To understand this project, these are the prerequisites:

Hope this helps. Let me know if you face any problems. Thanks!

harshitv804 commented 2 months ago

Is it working?

hridulpk commented 2 months ago

Yes, it did work after switching to the latest nomic model by removing the pinned revision:

embeddings = HuggingFaceEmbeddings(model_name="nomic-ai/nomic-embed-text-v1",model_kwargs={"trust_remote_code":True})
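(Note for anyone else hitting this: since the same model has to be used for ingesting and retrieval, the ipc_vector_db may need to be rebuilt after unpinning the revision. A rough re-ingestion sketch follows; the PDF path and chunk sizes are placeholders, not the repo's actual ingestion script.)

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# Use the SAME embedding model at ingestion time as at query time.
embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs={"trust_remote_code": True},
)

docs = PyPDFLoader("ipc.pdf").load()  # placeholder path to the IPC source PDF
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# Build the FAISS index and save it where app.py expects it.
FAISS.from_documents(chunks, embeddings).save_local("ipc_vector_db")
```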