[ENH] defining boundary between `sktime` and `pytorch-forecasting`, support for foundation models

This issue collects design discussion on the boundary between sktime and pytorch-forecasting, with a particular focus on foundation models and weight management. See also sktime issue https://github.com/sktime/sktime/issues/6177.

In sktime, we have recently started to introduce a dedicated layer for pytorch-based forecasting models, and specifically for weight management of "foundation models", that is, pre-trained deep learning models, typically of transformer architecture, which are also predominantly pytorch-based.

This raises the natural question of whether some parts of this layer, and if so, which ones precisely, would be better contained in pytorch-forecasting. For instance, one could argue that the natural scope of pytorch-forecasting is anything that has to do with torch objects and their direct interfaces, which would include the aforementioned foundation models.

One could even argue that all sktime interfaces specific to pytorch-based forecasters should be contained in pytorch-forecasting, in the form of a 2nd party interface, along the lines of the patterns discussed in https://github.com/sktime/sktime/issues/6639. That is, the sktime estimators would be present and maintained in pytorch-forecasting, together with the pytorch-facing backend, and would be specific to forecasting only. However, in cases where there are common backend concerns with, say, time series classification (FYI @fnhirwa), that might be cutting off too much.

Listing different layers and sublayers that might help to draw a delineation:

* forecasting framework layer, sktime `BaseForecaster` etc
* common pytorch framework layer objects, e.g., a hypothetical `BasePytorchNetwork` in sktime
* common pytorch framework layer objects specific to forecasting, e.g., a hypothetical `BasePytorchForecaster` in sktime
* concrete pytorch networks or layers that are specific to forecasting models, but not full forecasting models
* concrete pytorch networks or layers that are specific to time series models but shared by, say, forecasting and time series classification, see the multimodal momentfm transformer model https://github.com/sktime/sktime/issues/6542 (FYI @julian-fong)
* concrete pytorch forecasting models, e.g., nbeats forecaster
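
To make the layering above easier to discuss, here is a minimal sketch of how the layers could stack. Everything in it is an assumption for illustration: the `BaseForecaster` below is a simplified stand-in and not sktime's actual class, `BasePytorchNetwork` and `BasePytorchForecaster` are the hypothetical names from the list, and the toy network is plain Python instead of a `torch.nn.Module` so that the sketch runs without torch installed.

```python
from abc import ABC, abstractmethod


class BaseForecaster(ABC):
    """Stand-in for the forecasting framework layer (NOT sktime's real class)."""

    def fit(self, y):
        self._fit(y)
        self._is_fitted = True
        return self

    def predict(self, fh):
        if not getattr(self, "_is_fitted", False):
            raise RuntimeError("call fit before predict")
        return self._predict(fh)

    @abstractmethod
    def _fit(self, y):
        """Fit logic supplied by a lower layer."""

    @abstractmethod
    def _predict(self, fh):
        """Prediction logic supplied by a lower layer."""


class BasePytorchNetwork(ABC):
    """Hypothetical common pytorch layer, shared across learning tasks."""

    @abstractmethod
    def _build_network(self):
        """Return the network (a torch.nn.Module in a real implementation)."""


class BasePytorchForecaster(BaseForecaster, BasePytorchNetwork):
    """Hypothetical pytorch layer specific to forecasting."""

    def _fit(self, y):
        self.network_ = self._build_network()
        self.network_.update(y)

    def _predict(self, fh):
        # one network call per requested horizon step
        return [self.network_.forward(h) for h in fh]


class _LastValueNetwork:
    """Toy 'network' in plain Python; it only remembers the last observation."""

    def update(self, y):
        self.last_ = y[-1]

    def forward(self, h):
        return self.last_


class LastValueForecaster(BasePytorchForecaster):
    """Concrete forecaster layer: only has to say which network to build."""

    def _build_network(self):
        return _LastValueNetwork()


forecaster = LastValueForecaster().fit([1.0, 2.0, 3.0])
print(forecaster.predict([1, 2]))
```

In this sketch, the boundary question becomes which of these classes would live in which package: the top class clearly belongs to sktime, while the placement of the two hypothetical base classes and of the concrete network is what is up for discussion.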

Following the discussion on Discord, I am adding my opinions on this.

When it comes to neural network based forecasting, the strength of sktime is that the models are easy to use and little background knowledge is needed, i.e., sktime handles the creation of the datasets, dataloaders, etc. In contrast, when using pytorch-forecasting, the user has more direct control over the model and the neural network. That direct control requires a direct implementation of the neural network. Ease of use, on the other hand, requires that the interfaced library has a well-defined interface.

Based on this view of the respective strengths of sktime and pytorch-forecasting, I would argue that all neural network based models should be implemented in pytorch-forecasting. If we decide on that solution, we should think about whether there are ways to streamline the interfacing between sktime and pytorch-forecasting, e.g., by also having a general adapter that is applicable to all models.
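
As a rough illustration of what such a general adapter could look like, here is a minimal sketch. All names in it (`WindowModel`, `GenericForecasterAdapter`, `fit_windows`, `predict_window`, `MeanOfWindowModel`) are hypothetical and exist in neither sktime nor pytorch-forecasting. The assumption is that every backend model satisfies one small, well-defined contract, so that a single adapter can take care of the dataset creation and expose a simple fit/predict interface. The toy model is plain Python so that the sketch runs without torch.

```python
from __future__ import annotations

from typing import Protocol


class WindowModel(Protocol):
    """Hypothetical minimal contract every backend model would satisfy."""

    def fit_windows(self, inputs: list[list[float]], targets: list[float]) -> None: ...

    def predict_window(self, window: list[float]) -> float: ...


class GenericForecasterAdapter:
    """Hypothetical adapter: handles windowing, delegates learning to the model."""

    def __init__(self, model_cls, context_length=3, **model_kwargs):
        self.model_cls = model_cls
        self.context_length = context_length
        self.model_kwargs = model_kwargs

    def fit(self, y):
        y = list(y)
        c = self.context_length
        if len(y) <= c:
            raise ValueError("series must be longer than context_length")
        # dataset creation lives in the adapter, not in user code
        inputs = [y[i : i + c] for i in range(len(y) - c)]
        targets = [y[i + c] for i in range(len(y) - c)]
        self.model_ = self.model_cls(**self.model_kwargs)
        self.model_.fit_windows(inputs, targets)
        self._last_window = y[-c:]
        return self

    def predict(self, n_steps):
        # recursive multi-step forecast: feed each prediction back in
        window = list(self._last_window)
        preds = []
        for _ in range(n_steps):
            p = self.model_.predict_window(window)
            preds.append(p)
            window = window[1:] + [p]
        return preds


class MeanOfWindowModel:
    """Toy backend model: predicts the mean of the context window."""

    def fit_windows(self, inputs, targets):
        self.n_windows_seen_ = len(inputs)

    def predict_window(self, window):
        return sum(window) / len(window)


adapter = GenericForecasterAdapter(MeanOfWindowModel, context_length=2)
adapter.fit([1.0, 2.0, 3.0, 4.0])
print(adapter.predict(2))
```

The design choice in this sketch is that the adapter depends only on the contract, not on any concrete model, so adding a new model on the pytorch-forecasting side would not require a new interface class on the sktime side.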
