langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

data_manager not working with cache init for chatOpenAI #10731

Closed Coditas-Vikas-Tiwari closed 10 months ago

Coditas-Vikas-Tiwari commented 1 year ago

System Info

langchain 0.0.292, Python 3.11 (per the traceback), VS Code

Who can help?

@hwchase17 @agola11

Information

Related Components

Reproduction

```python
import os
import time

from dotenv import load_dotenv
from langchain.schema import HumanMessage, SystemMessage
from langchain.chat_models import ChatOpenAI

from gptcache import cache
from gptcache.manager import get_data_manager, manager_factory, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.embedding import Onnx
from gptcache.processor.pre import get_messages_last_content
from gptcache.adapter.langchain_models import LangChainChat

print("Cache loading.....")
onnx = Onnx()

cache_base = CacheBase('mysql', sql_url='your_mysql_url')
vector_base = VectorBase('faiss', dimension=128)
data_manager = get_data_manager(cache_base, vector_base)
cache.init(
    pre_embedding_func=get_messages_last_content,
    data_manager=data_manager,
)
cache.set_openai_key()
load_dotenv()

def generate_res(user_input):
    chat = LangChainChat(chat=ChatOpenAI(
        model_name=os.environ.get('GPT_MODEL'),
        temperature=0.7,
        openai_api_key=os.environ.get('OPEN_AI_API_KEY'),
    ))
    prompt = ""
    start_time = time.time()
    message = [
        SystemMessage(content=prompt),
        HumanMessage(content=user_input),
    ]
    response = chat(message)
    # start_time is local to this function, so the timing print belongs here
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    return response

generate_res("hey there!")
```

Expected behavior

I'm trying to store the cache in a MySQL DB using ChatOpenAI and GPTCache, but I'm getting this error:

```
    super().__init__(**kwargs)
  File "pydantic\main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic\main.py", line 1066, in pydantic.main.validate_model
  File "pydantic\fields.py", line 439, in pydantic.fields.ModelField.get_default
  File "pydantic\utils.py", line 693, in pydantic.utils.smart_deepcopy
  File "C:\Program Files\Python311\Lib\copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "C:\Program Files\Python311\Lib\copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "C:\Program Files\Python311\Lib\copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "C:\Program Files\Python311\Lib\copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  [the previous four frames repeat five more times as deepcopy recurses]
  File "C:\Program Files\Python311\Lib\copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'module' object
```

dosubot[bot] commented 1 year ago

🤖

Hello,

The error you're encountering, "TypeError: cannot pickle 'module' object", typically occurs when you're trying to serialize (pickle) an object that isn't serializable. In your case, it seems like you're trying to store the cache into a MySQL database using ChatOpenAI and gptcache.
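The root cause is that `copy.deepcopy` falls back to the pickle protocol, and module objects are not picklable. A minimal, self-contained sketch (independent of LangChain and GPTCache) that reproduces the same `TypeError`:

```python
import copy
import types

# Deepcopying any object whose state contains a module object fails,
# because deepcopy falls back to pickling, and modules cannot be pickled.
class HoldsModule:
    def __init__(self):
        self.mod = types  # a module object stored in instance state

try:
    copy.deepcopy(HoldsModule())
except TypeError as exc:
    print(exc)  # e.g. "cannot pickle 'module' object"
```

This mirrors the tail of the traceback above, where `deepcopy` reaches `rv = reductor(4)` on a module and raises.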

In LangChain, the framework uses json.dumps() and json.loads() for serialization and deserialization of the Generation objects. However, if the Generation object or any of its attributes are not serializable, this could lead to the TypeError you're experiencing.
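As a rough sketch of that JSON round-trip (the dict shape below is illustrative, not LangChain's exact serialization format), note that plain text and simple metadata serialize fine, while anything holding a module or function does not:

```python
import json

# A Generation-like payload: plain strings and dicts round-trip cleanly.
gen = {"text": "Hello!", "generation_info": {"finish_reason": "stop"}}
blob = json.dumps([gen])
restored = json.loads(blob)

# A non-serializable attribute (here, a module object) fails at exactly
# this serialization step.
try:
    json.dumps({"text": "hi", "generation_info": {"mod": json}})
except TypeError as exc:
    print(exc)
```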

A general solution to this problem would be to ensure that the objects you're trying to store in the cache are serializable. If you're trying to store a Python module or a function (which are not serializable) in the cache, you'll need to refactor your code to avoid this.

If you're using a custom object, you might need to implement the __getstate__ and __setstate__ methods in your class to provide a way for Python's pickle module to serialize and deserialize the object.
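A minimal sketch of that pattern (class and attribute names here are illustrative, not taken from LangChain): drop the unpicklable attribute in `__getstate__` and rebuild it in `__setstate__`.

```python
import copy
import types

class CacheEntry:
    def __init__(self, text):
        self.text = text
        self._runtime = types  # a non-picklable helper, e.g. a module handle

    def __getstate__(self):
        # Copy the instance dict and drop the part pickle/deepcopy cannot handle.
        state = self.__dict__.copy()
        state.pop("_runtime", None)
        return state

    def __setstate__(self, state):
        # Restore the picklable state, then rebuild the dropped attribute.
        self.__dict__.update(state)
        self._runtime = types

clone = copy.deepcopy(CacheEntry("hello"))
print(clone.text)  # hello
```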

In your case, the update method in the cache.py file is storing the return_val object in the cache after serializing it with _dumps_generations(return_val). If return_val or any of its attributes are not serializable, this could be the source of the error. You should check the return_val object and its attributes to ensure they are all serializable.

You can find the relevant code in the libs/langchain/langchain/cache.py file.

As for the Generation object in LangChain, it is a class that represents a single text generation output. It has the following attributes:

  1. text: This is a string attribute that holds the generated text output.

  2. generation_info: This is an optional dictionary attribute that holds the raw response from the provider. It may include things like the reason for finishing or token log probabilities.

  3. lc_serializable: This is a property that returns a boolean indicating whether this class is LangChain serializable.

You can find the relevant code in the libs/langchain/langchain/schema/output.py file.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] commented 10 months ago

Hi, @Coditas-Vikas-Tiwari,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you were encountering a TypeError related to pickling a 'module' object while attempting to store the cache in a MySQL database using ChatOpenAI and GPTCache. Dosubot provided a detailed response explaining the error and suggesting potential solutions, including ensuring that the objects being stored in the cache are serializable and checking the attributes of the return_val object. The issue has been resolved, and you can refer to the relevant code files for further reference.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!