Closed Xenios91 closed 2 years ago
Hi, it would help if you could share the parameters you need, the error you see, a snippet of the code you are trying, etc.
This package works with any sklearn classifier or regressor, so it depends on how you are using n-grams. If you are using something like CountVectorizer plus a classification model like GaussianNB, you can define the parameters of each of these classes using a pipeline; then, in the grid parameters, you add the ones you want to search with this package, for example the ngram_range from CountVectorizer.
Here is an example of how to pass parameters using a pipeline with different steps.
I hope it helps
Hello,
Any chance you have an example of how to use ngram_range from CountVectorizer? I have tried a few ways with no luck.
Hi, here is a minimal example of mixing those objects in the package.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Categorical, Continuous
from sklearn.naive_bayes import MultinomialNB

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train', categories=categories,
                                  shuffle=True, random_state=42)
X_train = twenty_train.data

clf = MultinomialNB()
pipe = Pipeline([("vectorizer", CountVectorizer()), ("clf", clf)])

param_grid = {
    "clf__alpha": Continuous(0.01, 1, distribution='log-uniform'),
    "vectorizer__analyzer": Categorical(["word", "char"]),
}

evolved_estimator = GASearchCV(
    estimator=pipe,
    cv=3,
    scoring="accuracy",
    population_size=15,
    generations=20,
    tournament_size=3,
    param_grid=param_grid,
    n_jobs=-1,
)

evolved_estimator.fit(X_train, twenty_train.target)
print(evolved_estimator.best_params_)
Take into account that ngram_range is a tuple, so it doesn't fit this package's "space" definition, which is made of integer, continuous, and categorical variables. However, you can still tune it by using a custom class that exposes the lower and upper bounds of the ngram range as individual hyperparameters; refer to this issue for more information about how this can be done.
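A minimal sketch of that custom-class idea (the class name, min_n, and max_n are hypothetical, not part of the package): subclass CountVectorizer so the two ngram bounds become plain scalar hyperparameters, which a search space can then tune independently.

```python
from sklearn.feature_extraction.text import CountVectorizer


class NgramCountVectorizer(CountVectorizer):
    """CountVectorizer that exposes the ngram bounds as two scalar
    hyperparameters (min_n, max_n) instead of a single tuple."""

    def __init__(self, min_n=1, max_n=1):
        super().__init__()
        self.min_n = min_n
        self.max_n = max_n

    def _sync_range(self):
        # Rebuild the tuple, keeping it valid even if the search
        # proposes min_n > max_n.
        lo, hi = sorted((self.min_n, self.max_n))
        self.ngram_range = (lo, hi)

    def fit(self, raw_documents, y=None):
        self._sync_range()
        return super().fit(raw_documents, y)

    def fit_transform(self, raw_documents, y=None):
        self._sync_range()
        return super().fit_transform(raw_documents, y)


# In the pipeline above you could then search the bounds separately, e.g.
# (assuming Integer from sklearn_genetic.space):
# param_grid = {"vectorizer__min_n": Integer(1, 2),
#               "vectorizer__max_n": Integer(1, 3)}
```

Because only min_n and max_n appear in the subclass's __init__, they are what get_params/set_params (and hence the search) see, while the tuple is rebuilt at fit time.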
thanks
So I may be using it wrong, but how does one use ngrams with this tool? Is this feature not implemented?