timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

TSTPlus issue with fc_dropout #826

Open jrfackler opened 1 year ago

jrfackler commented 1 year ago

Hi @oguiza,

Thanks again for the tsai library. I'm having an issue with the TSTPlus model. The dropout for the fully connected head (fc_dropout) seems to have no effect on training: I can set it to 0.0 or 0.99 and there is no difference in the training results.

For the feed-forward layers, changing the dropout parameter does have a big effect, though.

Am I doing something wrong with fc_dropout?

Also, I tried the same with InceptionTime and fc_dropout did not seem to affect anything.

Example code is below, along with the training results charts for fc_dropout=0.0 and fc_dropout=0.99.

Thanks!

!pip install tsai
from tsai.all import *
X, y, splits = get_regression_data('AppliancesEnergy', split_data=False)
tfms = [None, TSRegression()]
batch_tfms = TSStandardize(by_sample=True)
arch_config = dict(d_model=128, d_ff=256, n_layers=3,
           n_heads=8, dropout=0.3, fc_dropout=0.99)
reg = TSRegressor(X, y, splits=splits, path='models', arch="TSTPlus", tfms=tfms, batch_tfms=batch_tfms, metrics=rmse, cbs=ShowGraph(), verbose=True)
reg.fit_one_cycle(100, 3e-4)

[Training results chart with fc_dropout=0.00]

[Training results chart with fc_dropout=0.99]

MohitBurkule commented 1 year ago

You need to pass arch_config to the TSRegressor:

from tsai.all import *
X, y, splits = get_regression_data('AppliancesEnergy', split_data=False)
tfms = [None, TSRegression()]
batch_tfms = TSStandardize(by_sample=True)
arch_config = dict(dropout=0.0, fc_dropout=0.99)
# arch_config must be passed explicitly, otherwise fc_dropout is ignored
reg = TSRegressor(X, y, splits=splits, path='models', arch="TSTPlus", tfms=tfms, batch_tfms=batch_tfms, metrics=rmse, verbose=True, arch_config=arch_config)
fit = reg.fit_one_cycle(100, 3e-4)
_, _, preds = reg.get_X_preds(X, y, with_decoded=True)

You can check the last layer by printing reg.model.head. It should look like:

Sequential(
  (0): GELU()
  (1): fastai.layers.Flatten(full=False)
  (2): LinBnDrop(
    (0): Dropout(p=0.99, inplace=False)
    (1): Linear(in_features=18432, out_features=1, bias=True)
  )
)
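As a side note, p=0.99 is an extremely aggressive dropout rate. A quick self-contained check in plain PyTorch (independent of tsai) shows what this layer actually does in train vs. eval mode:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.99)
x = torch.ones(1, 1000)
drop.train()                    # training mode: dropout is active
print(drop(x).count_nonzero())  # only ~1% of the 1000 elements survive
drop.eval()                     # eval mode: dropout is the identity
print(torch.equal(drop(x), x))  # True

This also explains why the effect only shows up during training: at inference time the Dropout layer is a pass-through.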
jrfackler commented 1 year ago

Hello @MohitBurkule ,

Thank you very much for your reply. Yes, you found my mistake: I wasn't passing arch_config to the TSRegressor in my example. Doing that adds the fc_dropout layer.

My main issue is with the TSForecaster, though. If I use a multivariate dataset, an fc_dropout layer can't be added to the head. Is that correct?

Using the TSForecaster with a univariate dataset like "Sunspots" does produce an fc_dropout layer, though.
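Here is a minimal sketch of that univariate case for comparison (it mirrors the Weather code below and assumes "Sunspots" can be fetched with get_forecasting_time_series, as in the tsai tutorials):

from tsai.all import *
ts = get_forecasting_time_series("Sunspots").values
X, y = SlidingWindow(60, horizon=1)(ts)
splits = TimeSplitter(235)(y)
tfms = [None, TSForecasting()]
batch_tfms = TSStandardize()
arch_config = dict(dropout=0.0, fc_dropout=0.99)
fcst = TSForecaster(X, y, splits=splits, path='models', tfms=tfms, batch_tfms=batch_tfms, bs=512, arch="TSTPlus", metrics=mae, arch_config=arch_config)
print(fcst.model.head)  # here the head does include a Dropout(p=0.99) layer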

For example, using the multivariate forecasting dataset "Weather":

from tsai.all import *
ts = get_forecasting_time_series("Weather").values
X, y = SlidingWindow(60, horizon=1)(ts)
splits = TimeSplitter(235)(y) 
tfms = [None, TSForecasting()]
batch_tfms = TSStandardize()
arch_config = dict(dropout=0.0, fc_dropout=0.99)
fcst = TSForecaster(X, y, splits=splits, path='models', tfms=tfms, batch_tfms=batch_tfms, bs=512, arch="TSTPlus", metrics=mae, arch_config=arch_config, cbs=ShowGraph())
print(fcst.model.head)

The resulting head does not have an fc_dropout layer:

lin_nd_head(
  (0): Reshape(bs)
  (1): Linear(in_features=7680, out_features=19, bias=True)
  (2): Reshape(bs, 19)
)
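A quick programmatic way to confirm this (a sketch assuming the fcst learner defined above; nn is already in scope after from tsai.all import *):

# Walk all modules in the head and check whether any Dropout is present
has_dropout = any(isinstance(m, nn.Dropout) for m in fcst.model.head.modules())
print(has_dropout)  # False: the multivariate lin_nd_head has no fc_dropout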
MohitBurkule commented 1 year ago

Looking at the internal code, I can't see a direct way of passing fc_dropout through to the TSForecaster's head. You can, however, add it afterwards:

from tsai.all import *
ts = get_forecasting_time_series("Weather").values
X, y = SlidingWindow(60, horizon=1)(ts)
splits = TimeSplitter(235)(y) 
tfms = [None, TSForecasting()]
batch_tfms = TSStandardize()
arch_config = dict(dropout=0.0, fc_dropout=0.99)
fcst = TSForecaster(X, y, splits=splits, path='models', tfms=tfms, batch_tfms=batch_tfms, bs=512, arch="TSTPlus", metrics=mae, arch_config=arch_config, cbs=ShowGraph())
# Rebuild the head with a Dropout layer inserted between the Linear layer and the final Reshape
fcst.model.head = nn.Sequential(fcst.model.head[0], fcst.model.head[1], nn.Dropout(0.99), fcst.model.head[2])
print(fcst.model.head)
#fcst.fit_one_cycle(10, 3e-4)

resulting in:

Sequential(
  (0): Reshape(bs)
  (1): Linear(in_features=7680, out_features=19, bias=True)
  (2): Dropout(p=0.99, inplace=False)
  (3): Reshape(bs, 19)
)
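A slightly more general variant of the same patch (my own sketch, not a tsai API) inserts the Dropout just before the head's final layer, so it works regardless of how many layers the generated head has:

# Rebuild the head with a Dropout inserted before the final Reshape
layers = list(fcst.model.head.children())
layers.insert(-1, nn.Dropout(p=0.99))
fcst.model.head = nn.Sequential(*layers)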
MohitBurkule commented 1 year ago

P.S. I don't mind creating a pull request to add this feature (if @oguiza is fine with it).

jrfackler commented 1 year ago

Thank you again @MohitBurkule

Your solution of adding the dropout layer afterwards works for the standard case. But, for some reason when using a cu

jrfackler commented 1 year ago

Your solution worked great! Thank you @MohitBurkule