Closed Allena101 closed 8 months ago
Hi @Allena101,
I think you misunderstood what triage means: it's just a way for us to classify issues and be able to filter them. triage simply means that we have yet to assign a category to the issue.
Your code deviated from the examples in a lot of places, hence the error.
Also, it's generally better to start from a simple, working "training" script that you then wrap in optuna to look for the best hyper-parameters, rather than starting from a complex example and removing elements (go from simple to complex rather than complex to simple).
import optuna
from darts.utils.timeseries_generation import linear_timeseries
from darts.models import LightGBMModel
from darts.metrics import smape

# create a dummy series
ts = linear_timeseries(length=100)
ts_train, ts_val = ts.split_after(0.8)

def objective(trial):
    max_depth = trial.suggest_categorical("max_depth", [2, 3])
    num_leaves = trial.suggest_categorical("num_leaves", [2, 3])
    lags = trial.suggest_categorical("lags", [3])

    # model constructor does not have the `series` argument;
    # the number of steps predicted per call is set with `output_chunk_length`
    model = LightGBMModel(
        output_chunk_length=3,
        max_depth=max_depth,
        num_leaves=num_leaves,
        lags=lags,
    )

    # train the model
    model.fit(
        series=ts_train,
        val_series=ts_val,
        # num_loader_workers=num_workers,
    )

    # LightGBMModel cannot be loaded from checkpoint
    # Evaluate how good it is on the validation set, using sMAPE
    # `train` was not defined, the name of the variable is `ts_train`
    # `n` should be an integer, not a series
    preds = model.predict(series=ts_train, n=len(ts_val))
    smapes = smape(ts_val, preds, n_jobs=-1, verbose=True)

    # you need to return the metric you want to optimize
    return smapes

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=3)
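(Not part of the original snippet, just a usage note: once the study has finished, the best hyper-parameters can be read back from it, e.g. to retrain a final model on train + validation data.)

# inspect the best trial found by the study
print(study.best_params)  # e.g. {"max_depth": 2, "num_leaves": 3, "lags": 3}
print(study.best_value)   # the corresponding sMAPE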
Hi Madtoinou, thanks for taking the time to read my issue.
I get what you mean by taking a bottom-up approach rather than a top-down approach in this instance. However, I did manage to get it to work when I tried using NBEATS instead of LightGBM (so I will have to try again with LightGBM later).
One important thing I don't understand from your code example is the saving and loading of the model. You have save_checkpoints=True, and then you load the model after each trial, and the comment says:
# reload best model over course of training
model = TCNModel.load_from_checkpoint("tcn_model")
Does this mean that save_checkpoints in darts is equivalent to save_best_only=True in TensorFlow?
Also, your example is meant to be run in one session, correct? Since the study object is not saved, only the best model, as I understand it. Which means that if you train more, the study risks repeating parameter combinations that it has already tried.
NBEATS is a deep learning model whereas LightGBM is a regression model, hence the difference in available methods/approaches, especially for the saving/loading of models.
save_checkpoints in Darts saves the latest checkpoint by default. If a validation series is provided, the best one is also saved, and this is what happens in the example notebook.
Since gridsearch brute-forces all the combinations, there is no point running it again: you already covered the combinations once and the results are not supposed to change.
Furthermore, the principal takeaway from such hyper-parameter optimisation is in general the parameters themselves, because ideally you should then retrain the model on both the training and validation sets.
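(Not part of the original reply, just a minimal sketch of that checkpointing behaviour with a torch-based model such as TCNModel; the series, the hyper-parameters and the model name "tcn_demo" are made up for the illustration.)

from darts.models import TCNModel
from darts.utils.timeseries_generation import sine_timeseries

series = sine_timeseries(length=200)
train, val = series.split_after(0.8)

# save_checkpoints=True keeps the checkpoint of the last epoch; because a
# validation series is passed to fit(), the best checkpoint is kept as well
model = TCNModel(
    input_chunk_length=12,
    output_chunk_length=4,
    n_epochs=5,
    model_name="tcn_demo",  # arbitrary name, used to locate the checkpoints
    save_checkpoints=True,
)
model.fit(series=train, val_series=val)

# reload the checkpoint with the lowest validation loss ...
best_model = TCNModel.load_from_checkpoint("tcn_demo", best=True)
# ... or the one from the last epoch
last_model = TCNModel.load_from_checkpoint("tcn_demo", best=False)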
Doesn't trial.suggest_int randomize from a range of values? In your example trial.suggest_int("num_filters", 1, 5), that would mean that for each trial a value is picked between 1 and 5 (don't remember now if it's inclusive or not). Then it would not be an exhaustive study as I understand it.
Regarding save_checkpoints: does that mean that if you are training a darts model normally (without using optuna) with a validation set, and you then load the model, it will always be the best-scoring epoch and not the latest (i.e. the weights of the best-scoring epoch and not the weights from the last epoch)?
Sorry for the confusion: the code snippet you use, based on optuna, does not perform a gridsearch but uses more complex algorithms to sample the parameters (documentation), and it's indeed not exhaustive. gridsearch (as mentioned in the title), by definition, is exhaustive.
You can load either the last or the best checkpoint, depending on the value of the parameter best of the load_from_checkpoint() method.
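(Not from the original thread, just a sketch: if an exhaustive search is wanted while keeping the optuna workflow, optuna's GridSampler can be plugged into the study. The search space below reuses the parameter names suggested in the earlier objective function.)

import optuna

# brute-force every combination, like gridsearch, instead of optuna's
# default TPE sampling; every parameter suggested inside `objective`
# should appear in the search space
search_space = {"max_depth": [2, 3], "num_leaves": [2, 3], "lags": [3]}
study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.GridSampler(search_space),
)
# 2 * 2 * 1 = 4 combinations in total
study.optimize(objective, n_trials=4)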
Thank you madtoinou, I really appreciate the clarification!
If you set the save_checkpoints parameter to True, does that mean that every epoch is saved?
Being able to rerun a study at a later time is quite important for me, since I have not managed to get GPU training to work (even though my GPU is CUDA compatible). I know TensorFlow best, and with that library you have to use CUDA. I saw in your docs that you are using the PyTorch Lightning Trainer; does that mean that you don't have to install CUDA?
Hi @Allena101,
Sorry for the delay, got busy with other things.
By default, the trainer generates a checkpoint at the end of each epoch but only keeps the last one, to limit the number of files (and avoid having hundreds of useless checkpoints). If a validation series is provided, the best checkpoint (so far) is also kept.
Similarly to TensorFlow, PyTorch Lightning also requires CUDA in order to use GPU acceleration during training. If passing pl_trainer_kwargs={"accelerator": "gpu"} does not work for you, I would recommend checking their repo/documentation to solve the GPU detection problem.
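(A minimal sketch of where this argument goes, assuming a working CUDA install; the model and its parameters are arbitrary.)

from darts.models import TCNModel

# forward accelerator settings to the underlying PyTorch Lightning Trainer
model = TCNModel(
    input_chunk_length=12,
    output_chunk_length=4,
    n_epochs=5,
    pl_trainer_kwargs={"accelerator": "gpu", "devices": 1},
)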
Hey, I totally understand not having much time to respond. I still really appreciate you reading my issues and giving me valuable feedback!
Regarding darts' built-in gridsearch class method: I can't seem to figure out how to use it correctly. I run the code below, and it should only take a few seconds since there are only 2 combinations ('kernel_size': [2, 3]), but for some reason the grid search runs for hundreds of epochs.
ts_brief = ts_iSum.drop_before(pd.Timestamp("2024-01-01"))

parameters = {
    'kernel_size': [2, 3],
    'num_filters': [3],
    'input_chunk_length': [5],
    'output_chunk_length': [4],
    'n_epochs': [1],
}

TCN_brief = TCNModel.gridsearch(
    parameters=parameters,
    series=ts_brief['iSum4'],
    forecast_horizon=7,
    metric=rmse,
)
TCN_brief
Hi @Allena101,
The parameter n_epochs is correctly taken into account. However, gridsearch() calls historical_forecasts() under the hood when the "expanding window" mode is used (see detailed documentation), which trains the model iteratively on an expanding window to identify the set of parameters optimizing the metric over the "historic forecast horizons" (see comment).
If you want to train the model only once and use a simpler train/valid split approach, you just need to provide the val_series argument.
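(A sketch of that split-based variant, reusing the parameter grid from the snippet above; train and val stand for a hypothetical split of the user's series.)

from darts.models import TCNModel
from darts.metrics import rmse

# with `val_series` provided (and no `forecast_horizon`), every parameter
# combination is trained once on `train` and scored once on `val`
best_model, best_params, best_score = TCNModel.gridsearch(
    parameters=parameters,
    series=train,      # hypothetical training split
    val_series=val,    # hypothetical validation split
    metric=rmse,
)
print(best_params, best_score)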
OBS: reposting since I got the triage label and, as I understood it, that meant my post/issue was not viewable. I apologize if I misunderstood what triage meant/did.
I am following your guides on optuna and Ray Tune. With Ray Tune I keep getting a timeout error and don't know why, so I will start by asking about optuna. I want to use LightGBM (as I understand it, any model in darts should be usable). I will ask about optuna now since I did manage to get it to work some time ago with TensorFlow.
I am testing as simple a model as possible just to see if it works, and then I can make it more complex. That seems like a good approach, but I just get a constant torrent of errors either way.
The error I am getting is: TypeError: Unknown type of parameter:series, got:TimeSeries
Here is the code (again following your guide):
Here is more of the error message: