Open chododom opened 1 year ago
Your understanding is correct for the TemporalFusionTransformer model. You can also see that memory consumption doesn't really grow as more groups are added. Not sure about other models, though.
Thanks @chododom & @tRosenflanz for answering, I too had the same question.
I am having trouble understanding whether the time-series groups share a common network or have separate learning and predictions. I have a dataset containing weather reports from multiple weather stations, and I am trying to train a TemporalFusionTransformer network to predict multiple variables such as temperature, wind speed, etc. Is the correct procedure to have each time series in the TimeSeriesDataSet identified by the weather station id? Will the network share the learned relations in the data across all groups equally, or will there effectively be a separate network for each weather station?
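For reference, here is a minimal sketch of the setup I have in mind (column names such as `station_id`, `time_idx`, `temperature` and `wind_speed`, as well as the file path and window lengths, are just placeholders for my actual data):

```python
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

# One row per station and time step, e.g. columns
# ["station_id", "time_idx", "temperature", "wind_speed"]
df = pd.read_csv("weather_reports.csv")  # placeholder path

training = TimeSeriesDataSet(
    df,
    time_idx="time_idx",
    target="temperature",          # single target here just to keep the sketch simple
    group_ids=["station_id"],      # each weather station is one time series
    max_encoder_length=48,
    max_prediction_length=24,
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_reals=["temperature", "wind_speed"],
)
```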
If the groups do, in fact, learn together, why is the group_id a separate parameter and not just another static variable?
Thank you for any help!
EDIT: It seems to me that the behaviour depends on how the model is specifically implemented, so the TimeSeriesDataSet is designed to handle both cases (i.e. the DataLoader generates a time-series sample along with the group ID, and it is up to the model implementation to decide whether or not to use that information). Having had a look at the TemporalFusionTransformer source code, I see that it never works directly with the group_ids; it only encodes the specified categoricals and reals. So from my understanding, the answer in the TFT case would be that the network parameters are trained jointly for all groups, and the group_ids only really serve to construct contiguous training samples within each series; they are not used during training or prediction themselves. Could someone please confirm this?
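If that is correct, then I assume the way to let the shared network distinguish between stations is to additionally pass the station id as a static categorical, so it gets embedded like any other categorical, while group_ids only controls how samples are cut. A sketch, reusing the placeholder names and the `df` from above:

```python
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer

training = TimeSeriesDataSet(
    df,
    time_idx="time_idx",
    target="temperature",
    group_ids=["station_id"],            # used to build contiguous samples per station
    static_categoricals=["station_id"],  # additionally lets the model embed station identity
    max_encoder_length=48,
    max_prediction_length=24,
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_reals=["temperature", "wind_speed"],
)

# One shared TFT is trained across all stations
tft = TemporalFusionTransformer.from_dataset(training, hidden_size=16)
train_dataloader = training.to_dataloader(train=True, batch_size=64)
```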