nalepae / pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs
https://nalepae.github.io/pandarallel
BSD 3-Clause "New" or "Revised" License

Pandarallel is failing with SSLContext error #267

Open madhvs opened 2 months ago

madhvs commented 2 months ago

I have upgraded openai from 0.28.0 to openai==1.23.5.

My parallel calls to openai through Pandarallel worked well with openai==0.28.0.

However, they fail with the error below after upgrading to openai==1.23.5:

  File "/app/imssumm/Summ_parallel.py", line 239, in call_iterative_summ_logic
    prompt_df_1["result"] = prompt_df_1.parallel_apply(lambda x: self.summarize(x["to_be_summarized"],x["token_len"]), axis=1)
  File "/usr/local/lib/python3.8/site-packages/pandarallel/core.py", line 265, in closure
    dilled_user_defined_function = dill.dumps(user_defined_function)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 263, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 235, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 394, in dump
    StockPickler.dump(self, obj)
  File "/usr/lib64/python3.8/pickle.py", line 487, in dump
    self.save(obj)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 1824, in save_function
    _save_with_postproc(pickler, (_create_function, (
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 1089, in _save_with_postproc
    pickler.save_reduce(*reduction)
  File "/usr/lib64/python3.8/pickle.py", line 692, in save_reduce
    save(args)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib64/python3.8/pickle.py", line 886, in save_tuple
    save(element)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib64/python3.8/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 1186, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib64/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib64/python3.8/pickle.py", line 997, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib64/python3.8/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 1186, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib64/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib64/python3.8/pickle.py", line 997, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib64/python3.8/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 1186, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib64/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib64/python3.8/pickle.py", line 997, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib64/python3.8/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 1186, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib64/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib64/python3.8/pickle.py", line 1002, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib64/python3.8/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 1186, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib64/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib64/python3.8/pickle.py", line 997, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib64/python3.8/pickle.py", line 578, in save
    rv = reduce(self.proto)
TypeError: cannot pickle 'SSLContext' object

shermansiu commented 2 months ago

Please include a minimal, self-contained example to reproduce your bug.

prompt_df_1["result"] = prompt_df_1.parallel_apply(lambda x: self.summarize(x["to_be_summarized"],x["token_len"]), axis=1) doesn't include a definition of self.summarize.

Is it possible to re-write your function so that it doesn't use any SSLContext objects or otherwise unpickleable objects?
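
For context, here is a minimal sketch of why this error can appear, using only the standard library. It assumes nothing about openai or pandarallel internals: the `state` dict is a hypothetical stand-in for the attributes of some client object, and `try_pickle` is a helper made up for this illustration. The point is only that an `ssl.SSLContext` cannot be pickled, and neither can anything that holds one.

```python
import pickle
import ssl

# An SSLContext wraps OpenSSL state and cannot be pickled.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)


def try_pickle(obj):
    """Return None on success, or the error message if pickling fails."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Pickling the context directly fails.
direct_error = try_pickle(context)

# Hypothetical stand-in for an object's attribute dict that holds a context
# somewhere inside it; the nested case fails the same way.
state = {"api_version": "example", "transport": {"ssl_context": context}}
nested_error = try_pickle(state)

print(direct_error)
print(nested_error)
```

If the function passed to parallel_apply captures an object like this (for example through self), serializing the function would be expected to hit the same failure.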

madhvs commented 2 months ago

My code looks like this.


import pandas as pd
from pandarallel import pandarallel
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    api_key=config_nlp.openai_api_key,
    api_version=config_nlp.openai_api_version,
    azure_endpoint=config_nlp.openai_api_base,
)

def summarize(self, prompt, max_len):
    api_start_time = datetime.now()
    response = self.openai_client.chat.completions.create(
        model=summary_model,
        messages=messages,
        temperature=0.8,
        max_tokens=int(max_len),
    )

pandarallel.initialize(progress_bar=True, nb_workers = 7)
recs_pre_summ_tkn = [("data1",23),("data2",24),("data4",123),("data5",243)]
prompt_df_1 = pd.DataFrame(recs_pre_summ_tkn,columns =["to_be_summarized","token_len"])
prompt_df_1["result"] = prompt_df_1.parallel_apply(lambda x: self.summarize(x["to_be_summarized"],x["token_len"]), axis=1)

shermansiu commented 2 months ago

Okay... instead of returning the response, could you perhaps extract the message from response.choices[0].message.content and return that instead?

(I'm assuming your original code returns the response... Your code as-is does not run and summarize is still written as a class method rather than a function)

madhvs commented 2 months ago

I tried returning response.choices[0].message.content instead of the response, but I am still facing the same error.
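
That outcome is consistent with the traceback above, which fails inside dill.dumps(user_defined_function), that is, while the function itself is being serialized and before any result is returned. One possible workaround, sketched below under stated assumptions, is to keep the client out of the pickled state and rebuild it lazily in each worker. `FakeClient` and `Summarizer` are hypothetical names standing in for the real client and the class that owns `summarize`. The sketch uses the standard pickle module and has not been run against pandarallel or openai.

```python
import pickle
import ssl


class FakeClient:
    """Hypothetical stand-in for an API client that owns an SSLContext."""

    def __init__(self):
        self.ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)


class Summarizer:
    def __init__(self):
        self._client = None  # created lazily, once per process

    @property
    def client(self):
        if self._client is None:
            self._client = FakeClient()
        return self._client

    def __getstate__(self):
        # Drop the unpicklable client; each worker rebuilds its own copy.
        state = self.__dict__.copy()
        state["_client"] = None
        return state

    def summarize(self, text, max_len):
        _ = self.client  # a real implementation would call the API here
        return text[: int(max_len)]


summarizer = Summarizer()
_ = summarizer.client  # force client creation in the parent process

# The round trip succeeds because __getstate__ leaves the client out.
restored = pickle.loads(pickle.dumps(summarizer))
print(restored.summarize("data1", 3))
```

The design choice here is that only plain configuration travels between processes, while anything holding sockets or SSL state is created where it is used. Whether this is enough in the original code depends on what else the lambda captures.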