Closed wanadzhar913 closed 4 months ago
Hi Husein,
Hope you're doing well! I was experimenting with Malaya's topic modelling module and ran into the error below.
Digging into scikit-learn's documentation, I found that the `get_feature_names` method was deprecated in version 1.0 in favour of `get_feature_names_out` and removed in version 1.2. [Link](https://scikit-learn.org/1.1/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html#:~:text=document%2Dterm%20matrix.-,get_feature_names(),get_feature_names%20is%20deprecated%20in%201.0%20and%20will%20be%20removed%20in%201.2.,-get_feature_names_out(%5Binput_features%5D)) for your reference.
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-23edf621a026> in <cell line: 1>()
----> 1 lda = malaya.topic_model.decomposition.fit(
      2     stem_output_hfmodel,
      3     LatentDirichletAllocation,
      4     vectorizer = vectorizer,
      5     n_topics = 10,

/usr/local/lib/python3.10/dist-packages/malaya/topic_model/decomposition.py in fit(corpus, model, vectorizer, n_topics, cleaning, stopwords, **kwargs)
    179
    180     tf = vectorizer.fit_transform(corpus)
--> 181     tf_features = vectorizer.get_feature_names()
    182     compose = model(n_topics).fit(tf)
    183     return Topic(

AttributeError: 'SkipGramCountVectorizer' object has no attribute 'get_feature_names'
```
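A minimal sketch of how line 181 could be made version-tolerant, shown here with scikit-learn's plain `CountVectorizer` and assuming Malaya's `SkipGramCountVectorizer` inherits the same feature-name methods from it. The helper name `get_feature_names_compat` is hypothetical and not part of Malaya or scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer


def get_feature_names_compat(vectorizer):
    """Hypothetical helper: return feature names on old and new scikit-learn."""
    # scikit-learn >= 1.0 provides get_feature_names_out(), which returns a NumPy array.
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    # Releases before 1.0 only have get_feature_names(), which returns a list.
    return vectorizer.get_feature_names()


vectorizer = CountVectorizer()
tf = vectorizer.fit_transform(["beli kereta", "car insurance"])
tf_features = get_feature_names_compat(vectorizer)
print(tf_features)  # ['beli', 'car', 'insurance', 'kereta']
```

Converting the array to a list keeps the return type the same as the old `get_feature_names()`, so code downstream of line 181 would not need to change.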
Here's my code for reproducibility:
```python
import malaya
from malaya.text.vectorizer import SkipGramCountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

stopwords = malaya.text.function.get_stopwords()

documents = [
    "Emas",
    "Hm, moga jadi kahwin",
    "Naaaakk kahwin",
    "Nikah 25/2/25",
    "Universiti anak",
    "My Marriage Story",
    "Yuran Sekolah Fatimah",
    "em@s.com",
    "beli kereta",
    "car insurance",
    "my college fund",
]

# Stem documents with a Huggingface model
hfmodel_stem = malaya.stem.huggingface()
stem_output_hfmodel = []
for j in documents:
    stem_output_hfmodel.append(hfmodel_stem.stem(j))

# Load vectorizer object
vectorizer = SkipGramCountVectorizer(
    max_df = 0.95,
    min_df = 1,
    ngram_range = (1, 3),
    stop_words = stopwords,
    skip = 2,
)

# Create LDA object (error found here)
lda = malaya.topic_model.decomposition.fit(
    stem_output_hfmodel,
    LatentDirichletAllocation,
    vectorizer = vectorizer,
    n_topics = 10,
)
```
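Until a patched Malaya release is available, one possible user-side workaround is to attach a `get_feature_names` shim to the vectorizer instance before passing it to `malaya.topic_model.decomposition.fit`. This is a sketch, assuming (as the traceback suggests) that `fit` calls `get_feature_names()` on the same vectorizer object you pass in; `add_legacy_feature_names` is a hypothetical helper, demonstrated here on a plain scikit-learn `CountVectorizer`.

```python
from sklearn.feature_extraction.text import CountVectorizer


def add_legacy_feature_names(vectorizer):
    """Hypothetical helper: give a vectorizer instance a get_feature_names() shim."""
    if not hasattr(vectorizer, "get_feature_names"):
        # Delegate to the newer API and return a plain list, like the removed method did.
        # The lambda looks up the vocabulary at call time, so it works after fitting.
        vectorizer.get_feature_names = lambda: list(vectorizer.get_feature_names_out())
    return vectorizer


vec = add_legacy_feature_names(CountVectorizer())
vec.fit_transform(["beli kereta", "car insurance"])
print(vec.get_feature_names())  # ['beli', 'car', 'insurance', 'kereta']
```

In the reproduction script above, this would mean wrapping the `SkipGramCountVectorizer(...)` instance with the helper before calling `fit`. It only papers over the missing method; the proper fix is on the library side.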
Below is my `requirements.txt`:

```
dateparser==1.2.0
scikit-learn==1.2.2
requests==2.31.0
unidecode==1.3.8
numpy==1.25.2
scipy==1.11.4
ftfy==6.2.0
networkx==3.3
sentencepiece==0.1.99
tqdm==4.66.4
malaya-boilerplate==0.0.25
regex==2024.5.15
transformers==4.42.4
```
I've added the minor fix above. Let me know if it looks okay.