x-tabdeveloping / turftopic

Robust and fast topic models with sentence-transformers.
https://x-tabdeveloping.github.io/turftopic/
MIT License

sklearn wrapper: 'function' object has no attribute 'fit' #1

Closed jankounchained closed 5 months ago

jankounchained commented 7 months ago

I have a problem fitting GMM and KeyNMF. Probably has something to do with the base.ContextualModel, which in turn has an issue with the sklearn base classes. ClusteringTopicModel works fine.

gmm = GMM(n_components=10).fit(texts) raises

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[18], line 1
----> 1 gmm = GMM(n_components=10).fit(data['text'].tolist())

File ~/Repositories/turftopic/turftopic/base.py:239, in ContextualModel.fit(self, raw_documents, y, embeddings)
    225 def fit(
    226     self, raw_documents, y=None, embeddings: Optional[np.ndarray] = None
    227 ):
    228     """Fits model on the given corpus.
    229 
    230     Parameters
   (...)
    237         Precomputed document encodings.
    238     """
--> 239     self.fit_transform(raw_documents, y, embeddings)
    240     return self

File ~/Repositories/turftopic/.venv/lib/python3.11/site-packages/sklearn/utils/_set_output.py:157, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    155 @wraps(f)
    156 def wrapped(self, X, *args, **kwargs):
--> 157     data_to_wrap = f(self, X, *args, **kwargs)
    158     if isinstance(data_to_wrap, tuple):
    159         # only wrap the first output for cross decomposition
    160         return_tuple = (
...
---> 87 document_term_matrix = self.vectorizer.fit_transform(raw_documents)
     88 console.log("Term extraction done.")
     89 status.update("Fitting mixture model.")

AttributeError: 'function' object has no attribute 'fit_transform'

keynmf = KeyNMF(n_components=10).fit(texts) raises

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[14], line 2
      1 keynmf = KeyNMF(n_components=10)
----> 2 keynmf.fit(data['text'].tolist())

File ~/Repositories/turftopic/turftopic/base.py:239, in ContextualModel.fit(self, raw_documents, y, embeddings)
    225 def fit(
    226     self, raw_documents, y=None, embeddings: Optional[np.ndarray] = None
    227 ):
    228     """Fits model on the given corpus.
    229 
    230     Parameters
   (...)
    237         Precomputed document encodings.
    238     """
--> 239     self.fit_transform(raw_documents, y, embeddings)
    240     return self

File ~/Repositories/turftopic/.venv/lib/python3.11/site-packages/sklearn/utils/_set_output.py:157, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    155 @wraps(f)
    156 def wrapped(self, X, *args, **kwargs):
--> 157     data_to_wrap = f(self, X, *args, **kwargs)
    158     if isinstance(data_to_wrap, tuple):
    159         # only wrap the first output for cross decomposition
...
    125     self.vocab_embeddings = self.encoder_.encode(
    126         self.vectorizer.get_feature_names_out()
    127     )

AttributeError: 'function' object has no attribute 'fit'

x-tabdeveloping commented 7 months ago

Huh, I haven't seen these before. I will try to figure out what's going on.
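For anyone reading along: both tracebacks fail on an attribute lookup on `self.vectorizer`, and the message says the object is a plain function. One way this message can arise in general is when a factory function is stored without being called. The sketch below only reproduces the error message; it is an assumption about a possible cause, not a diagnosis of turftopic. `TinyVectorizer` and `default_vectorizer` are hypothetical stand-ins and are not part of turftopic or scikit-learn.

```python
class TinyVectorizer:
    """Hypothetical stand-in for a scikit-learn style vectorizer."""

    def fit_transform(self, docs):
        # Build a sorted vocabulary, then count each word per document.
        vocab = sorted({word for doc in docs for word in doc.split()})
        return [[doc.split().count(word) for word in vocab] for doc in docs]


def default_vectorizer():
    """Hypothetical factory that returns a fresh vectorizer instance."""
    return TinyVectorizer()


docs = ["a b", "b c"]

# Assumed mistake: the factory itself is stored, so the attribute is a function.
broken = default_vectorizer
try:
    broken.fit_transform(docs)
except AttributeError as err:
    message = str(err)
    print(message)

# Calling the factory yields an instance that has fit_transform.
working = default_vectorizer()
counts = working.fit_transform(docs)
print(counts)
```

Printing `type(model.vectorizer)` on the fitted-to-be model in the failing environment would show whether the attribute really holds a function there.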

x-tabdeveloping commented 7 months ago

I also think some of these issues might be an artifact of me not having released a new version for a while on PyPI. I don't think the main branch has these issues frankly.

x-tabdeveloping commented 7 months ago

Can you confirm that the new version has fixed this issue?

jankounchained commented 7 months ago

Updated to turftopic==0.2.0, still having the same problem.

x-tabdeveloping commented 7 months ago

That is weird as hell. I just made a Colab notebook and it runs fine in a clean environment. What system are you running on, and which versions of Python and scikit-learn?
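A minimal sketch for collecting the requested details in one go, using only the standard library. The helper name `environment_report` and the default package list are assumptions chosen for illustration; adjust the list to whatever is relevant.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("turftopic", "scikit-learn", "sentence-transformers", "numpy")):
    """Return the OS, Python version, and installed versions of the given packages."""
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this inside the same notebook kernel that raises the error matters, since the kernel may point at a different environment than the shell.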

x-tabdeveloping commented 5 months ago

This issue has not seen any activity, and the problem was fixed, closing.