Closed danilofreire closed 3 years ago
Hi @danilofreire, do you see any errors during the installation? Are you able to install the dtreeviz package on your machine without problems? Are you installing with pip?
pip install dtreeviz
There is sometimes a problem with the graphviz package when installing mljar-supervised, and dtreeviz depends on it.
Hi @pplonski, many thanks for your quick reply. I installed dtreeviz==1.0 and graphviz==0.9 with pip, and unfortunately the error persists when I run the AutoML() command. I've just noticed that the error appears after the following lines:
5_Default_NeuralNetwork logloss 0.092119 trained in 1.75 seconds
NameError: name '_' is not defined
Maybe I should exclude the Neural Networks from the model? Thanks again!
OK, so you are able to run AutoML, which is good. Could you send the code snippet that you are using (ideally with the dataset, or a link to it), so I can try to reproduce the bug?
For other models, you don't see this error?
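When reporting an error like this one, the full traceback usually says more than the final message, because it shows which file and line referenced the undefined name. A minimal sketch using only the standard library follows; the helper name run_with_traceback is hypothetical and not part of mljar-supervised.

```python
import traceback


def run_with_traceback(fn, *args, **kwargs):
    """Call fn and return (result, None) on success or (None, traceback text) on failure."""
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() returns the whole stack, including the file and line
        # where the exception was raised.
        return None, traceback.format_exc()


# Assumed usage inside the session where the error occurs:
#   result, tb = run_with_traceback(automl.fit, X_train, y_train)
#   if tb is not None:
#       print(tb)
```

Pasting the printed traceback into the issue should make it easier to see whether the failing line is in the package or in the calling environment.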
Yes, AutoML() does run here. I wrote some code in R and I call the AutoML() function via the reticulate R package. Here is the full code:
### R
# Install and load required packages
if (!require("reticulate")) {
  install.packages("reticulate")
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
}
## Data wrangling
# Load data and select variables
load("fl.three.RData")
fl_data <- fl.three %>%
  select(onset, warl, gdpenl, lpopl1,
         lmtnest, ncontig, Oil, nwstate, instab,
         polity2l, ethfrac, relfrac) %>%
  mutate(onset = if_else(onset >= 1, 1, onset),
         onset = as.factor(onset),
         oil = Oil) %>%
  select(-Oil)
# Independent variables, dependent variable
fl_x <- fl_data %>% select(-onset)
fl_y <- fl_data %>% select(onset)
### Python
repl_python()
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, auc, roc_auc_score
# Split data
X_train, X_test, y_train, y_test = train_test_split(r.fl_x, r.fl_y,
    train_size=0.75, test_size=0.25, stratify=r.fl_y, random_state=48924)
y_train = np.ravel(y_train)
y_test = np.ravel(y_test)
## mljar-supervised
from supervised.automl import AutoML
automl = AutoML(total_time_limit=900, random_state=48924)
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)
The complete dataset is available here. I was able to run automl models using h2o and tpot via reticulate with no errors. Thanks a lot for your help!
Could you check whether the error still occurs when you save the training data to a CSV file and load it in a pure Python script? You can send me the CSV file so I can check this. This might be a bug in the package.
If you are looking for the best ML model (in the Kaggle-competition style), please set mode="Compete".
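The CSV round trip suggested above could be sketched roughly as follows. The helper names, the file name fl_train.csv, and the choice to store the target in a column called onset (the dependent variable in the R code) are assumptions for illustration. AutoML is imported lazily, so the two CSV helpers work without mljar-supervised installed.

```python
import numpy as np
import pandas as pd

TARGET = "onset"  # assumed column name, taken from the R code in this thread


def save_training_data(X, y, path):
    """Write features and target to one CSV file (call from the reticulate session)."""
    frame = pd.DataFrame(X).copy()
    frame[TARGET] = np.ravel(y)
    frame.to_csv(path, index=False)


def load_training_data(path):
    """Read the CSV back in a plain Python process and split off the target."""
    frame = pd.read_csv(path)
    return frame.drop(columns=[TARGET]), frame[TARGET].to_numpy()


def fit_from_csv(path, **automl_kwargs):
    """Fit AutoML on the saved data; meant to run as a standalone script, not via reticulate."""
    from supervised.automl import AutoML  # lazy import: only needed for fitting

    X, y = load_training_data(path)
    automl = AutoML(**automl_kwargs)
    automl.fit(X, y)
    return automl
```

In the reticulate session one would call save_training_data(X_train, y_train, "fl_train.csv"), then run fit_from_csv("fl_train.csv", total_time_limit=900, random_state=48924) from an ordinary Python script. If the error disappears there, the problem is more likely in the calling environment than in the data.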
Hi, @pplonski! AutoML() runs without errors when I load it in pure Python, as you suggested. I think it's an issue with how reticulate handles Python variables. Many thanks!
Hi! :)
Thanks for the great software! The AutoML() function throws the following error when I try to use it on macOS Catalina (10.15.7): NameError: name '_' is not defined. I tried loading from django.utils.translation import gettext as _, which I read somewhere could help, but to no avail. I'm using Python 3.8.5, if that helps. Any information is greatly appreciated! Many thanks!