x-tabdeveloping / topicwizard

Powerful topic model visualization in Python
https://x-tabdeveloping.github.io/topicwizard/
MIT License

Topicwizard joblib dump not working with BertopicWrapper #40

Closed: dschwarz-tripp closed this issue 2 weeks ago

dschwarz-tripp commented 1 month ago

BERTopicWrapper works perfectly for visualization, but it fails at the pickling stage when dumping with joblib.

model = BERTopicWrapper(user_topic_model)
topicwizard.visualize(corpus=docs, embeddings=embeddings, model=model)

This snippet works so far.

topic_data = model.prepare_topic_data(corpus=docs, embeddings=embeddings)
joblib.dump(topic_data, "topic_data.joblib")

This fails with the following error:

Can't pickle <function BERTopicWrapper.prepare_topic_data.<locals>.transform at 0x000001B71B36DDA0>

(full error output attached: pickling_error.json)
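
For context, the object in the error is a function defined inside prepare_topic_data (note the "<locals>" in its qualified name), and the standard pickle machinery that joblib relies on cannot serialize locally defined functions by reference. A minimal sketch with hypothetical names reproduces the same class of failure:

import pickle

def make_transform():
    # A function defined inside another function can only be reached through
    # the enclosing call, so pickle cannot look it up by module-level name.
    def transform(x):
        return x
    return transform

pickle.dumps(make_transform)        # works: a top-level function pickles by reference
try:
    pickle.dumps(make_transform())  # fails: a "<locals>" function cannot be pickled
except (AttributeError, pickle.PicklingError) as err:
    print(err)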

x-tabdeveloping commented 1 month ago

Hey @dschwarz-tripp! I'm responding to peer reviews at the moment, so I won't be able to fix this issue for at least a week, but I have an idea about what might be causing it (methods and locally defined functions can't be pickled, but top-level functions and classes can). In the meantime, if it's not too much hassle for you, you can try training a BERTopic model in Turftopic; I know for a fact that works. Here is a tutorial on how to do that: Clustering Topic Models in Turftopic

import joblib
from turftopic import ClusteringTopicModel
from sklearn.cluster import HDBSCAN
import umap

# I also included the default parameters of BERTopic so that the behaviour is as
# close as possible
bertopic_model = ClusteringTopicModel(
    dimensionality_reduction=umap.UMAP(
        n_neighbors=10,
        n_components=5,
        min_dist=0.0,
        metric="cosine",
    ),
    clustering=HDBSCAN(
        min_cluster_size=15,
        metric="euclidean",
        cluster_selection_method="eom",
    ),
    feature_importance="c-tf-idf",
    reduction_method="agglomerative"
)
topic_data = bertopic_model.prepare_topic_data(corpus=docs, embeddings=embeddings)
joblib.dump(topic_data, "topic_data.joblib")
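
Once the dump succeeds, the saved TopicData can be reloaded later and reused without refitting the model. A rough sketch of that round trip, assuming topicwizard.visualize accepts precomputed topic data (hedged; check the topicwizard docs for the exact argument name):

import joblib
import topicwizard

# Reload the TopicData produced above and hand it straight to the web app.
topic_data = joblib.load("topic_data.joblib")
topicwizard.visualize(topic_data=topic_data)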
x-tabdeveloping commented 2 weeks ago

Sorry for the delay. It should work now with version 1.1.1. Can you please test and confirm?

dschwarz-tripp commented 2 weeks ago

Thank you! I will update and check back in a little bit

dschwarz-tripp commented 2 weeks ago

Thanks for checking! I still get an error, but it's a different one (thread RLock). For what it's worth, I tried your Turftopic suggestion and an empty BERTopic model, and both worked. I think it's due to my choice of representation model in BERTopic, which calls GPT.

Here's how I'm building the BERTopic model; the stack trace is below.

representation_model = OpenAIRepresentation(
    client,
    model="gpt-4o-mini",
    chat=True,
    nr_docs=6,
    delay_in_seconds=3,
)

sentence_model = SentenceTransformer("all-MiniLM-L6-v2")

umap_model = UMAP(n_neighbors=2, n_components=16, metric="cosine", low_memory=False)
hdbscan_model = HDBSCAN(min_cluster_size=6, metric="euclidean", prediction_data=True)

user_topic_model = BERTopic(
    language="english",
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    embedding_model=sentence_model,
    representation_model=representation_model,
    calculate_probabilities=True,
    verbose=True,
)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[46], line 1
----> 1 joblib.dump(topic_data, "topic_data.joblib")

File c:\Users\dschw\anaconda3\envs\kokua-nlp\Lib\site-packages\joblib\numpy_pickle.py:553, in dump(value, filename, compress, protocol, cache_size)
    551 elif is_filename:
    552     with open(filename, 'wb') as f:
--> 553         NumpyPickler(f, protocol=protocol).dump(value)
    554 else:
    555     NumpyPickler(filename, protocol=protocol).dump(value)

File c:\Users\dschw\anaconda3\envs\kokua-nlp\Lib\pickle.py:487, in _Pickler.dump(self, obj)
    485 if self.proto >= 4:
    486     self.framer.start_framing()
--> 487 self.save(obj)
    488 self.write(STOP)
    489 self.framer.end_framing()

    [... skipping repeated recursive frames: NumpyPickler.save (numpy_pickle.py:355),
     _Pickler.save, _Pickler.save_dict, _Pickler._batch_setitems and
     _Pickler.save_reduce (pickle.py) descending through the nested objects in topic_data ...]

File c:\Users\dschw\anaconda3\envs\kokua-nlp\Lib\pickle.py:578, in _Pickler.save(self, obj, save_persistent_id)
    576 reduce = getattr(obj, "__reduce_ex__", None)
    577 if reduce is not None:
--> 578     rv = reduce(self.proto)
    579 else:
    580     reduce = getattr(obj, "__reduce__", None)

TypeError: cannot pickle '_thread.RLock' object
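
The failing object this time is a thread lock rather than a function; locks are inherently unpicklable, which is consistent with the suspicion above that the OpenAI-based representation model (whose client plausibly holds such locks internally) ends up inside the TopicData. A two-line illustration of the same error:

import pickle
import threading

# Any object graph that contains a lock fails the same way when dumped.
try:
    pickle.dumps(threading.RLock())
except TypeError as err:
    print(err)  # cannot pickle '_thread.RLock' object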
x-tabdeveloping commented 2 weeks ago

Yeah, that looks like quite the pickle (pun intended). I would recommend manually overriding the topic_names field of the TopicData object with the labels extracted by GPT.

sentence_model = SentenceTransformer("all-MiniLM-L6-v2")

umap_model = UMAP(n_neighbors=2, n_components=16, metric="cosine", low_memory=False)
hdbscan_model = HDBSCAN(min_cluster_size=6, metric="euclidean", prediction_data=True)

# Extract topics with BERTopic
user_topic_model = BERTopic(
    language="english",
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    embedding_model=sentence_model,
    calculate_probabilities=True,
    verbose=True,
)
wrapped = BERTopicWrapper(user_topic_model)
topic_data = wrapped.prepare_topic_data(corpus)

# Extract topic representations with GPT
representation_model = OpenAIRepresentation(
    client,
    model="gpt-4o-mini",
    chat=True,
    nr_docs=6,
    delay_in_seconds=3,
)
topic_representations = representation_model.extract_topics(
    wrapped.model, corpus, wrapped.model.c_tf_idf_, wrapped.model.topic_representations_
)
topic_labels = [
    "_".join([word[0] for word in values[:4]])
    for key, values in topic_representations.items()
]

# Set topic_names on topic_data
topic_data["topic_names"] = topic_labels
x-tabdeveloping commented 2 weeks ago

I'm also currently working on getting generative AI topic labels into Turftopic; I can ping you when that's a thing :)