spmvg / nfoursid

Implementation of N4SID, Kalman filtering and state-space models
MIT License

Integration in darts #1

Closed. tomasvanpottelbergh closed this issue 2 years ago.

tomasvanpottelbergh commented 2 years ago

Thank you very much for creating this nice package, it was the first proper Python implementation of N4SID I could find online!

We are currently integrating your package in Darts, our time series library (see https://github.com/unit8co/darts/pull/692). Because of this, I have a few questions:

spmvg commented 2 years ago

Thanks for the message and best wishes for 2022! What you're working on is really cool. It got me thinking that adhering to a scikit-learn style API might have been a good idea for the nfoursid repo as well. If I understand correctly, that's essentially what will be possible after integration with darts.

I'll reply in bullet points for conciseness:

tomasvanpottelbergh commented 2 years ago

> Thanks for the message and best wishes for 2022! What you're working on is really cool. It got me thinking that adhering to a scikit-learn style API might have been a good idea for the nfoursid repo as well. If I understand correctly, that's essentially what will be possible after integration with darts.

Best wishes to you too! Exactly, Darts tries to keep close to the scikit-learn API, although there are several differences as time series are a bit special.

> I'll reply in bullet points for conciseness:
>
>   • Publishing to conda-forge
>
>     • Good idea! I haven't published to conda-forge before: I'll look into it. Focusing on darts integration: what are the benefits of also publishing to conda-forge?

The main benefit for the Darts integration would be that we can include nfoursid as a conda dependency in our conda-forge package. Otherwise we need to add an extra step to install nfoursid using pip. This is a good summary of the necessary steps, but I'm happy to help out as I've done it recently for Darts.

>   • Relaxing the requirements
>
>     • I might have been a bit strict on the requirements. I've been able to match the requirements for numpy, matplotlib and pandas of darts without issues: see dd11d71.

Thanks a lot for that! I will update the requirements to 0.0.4.

>   • Contributing
>
>     • I'm always open to PRs! You should be able to fork the repo and create a PR back to the original using the "forking workflow". I'm curious to see what your ideas are.

Great, the main improvements would be to be able to set an initial state for the Kalman filter and do the system identification on a collection of time series.

Sorry, they were not private methods, but I'm using undocumented attributes such as p_filtereds.

To keep things pragmatic, I would like to merge my PR once there is a conda-forge package. We can then discuss a potential change of API that satisfies everyone.

spmvg commented 2 years ago

> The main benefit for the Darts integration would be that we can include nfoursid as a conda dependency in our conda-forge package. Otherwise we need to add an extra step to install nfoursid using pip. This is a good summary of the necessary steps, but I'm happy to help out as I've done it recently for Darts.

See https://github.com/conda-forge/staged-recipes/pull/17401, waiting for merging. I'm not completely sure I've pinged the team correctly (maybe they will pick up the PR themselves?). I'll check back in a day.

> I will update the requirements to 0.0.4.

Please update the requirements to 0.1.0. I had to move the tests inside the source so that they are included in the tarball (for the tests of the conda-forge package).

> Great, the main improvements would be to be able to set an initial state for the Kalman filter and do the system identification on a collection of time series.

Sounds good! Let's continue with this after the PR is merged.

spmvg commented 2 years ago

@tomasvanpottelbergh thanks for the help in the recipe PR! Conda-forge has been updated: https://anaconda.org/conda-forge/nfoursid

hrzn commented 2 years ago

Very nice! Thanks a lot for your help @spmvg, and thanks for this nice package :) You can see an example here of what the results look like: https://github.com/unit8co/darts/blob/master/examples/10-Kalman-filter-examples.ipynb

It would be very cool if we could extend Kalman filter support even more in Darts through nfoursid, for instance by doing the system identification on multiple series as mentioned by @tomasvanpottelbergh (this is how several of the other ML-based forecasting models work in Darts).

spmvg commented 2 years ago

Thanks @hrzn! The resulting notebook looks absolutely pretty! I have become a fan of Darts in the meantime.

If I understand correctly, there are still three open suggestions/ideas in this thread:

  1. Setting the initial state of the Kalman filter. Not sure of the exact use-case though? Do you have an example?
  2. Prevent use of undocumented attributes (e.g. p_filtereds). This will mean either documenting them or creating an easier API. I'll take a more thorough look into the Darts implementation soon for ideas (not next week).
  3. Supporting multiple series. I read a Darts tutorial and if I understand correctly: Darts supports training on multiple time series, which do not need to share the same timestamps/intervals/lengths/scaling. Multiple time series would be like having taken n measurements of the same system.
     a. If I understand correctly, "multiple time series" does not mean multidimensional time series: those are called "multivariate series" (for the output) or "covariate series" (for the input) in Darts.
     b. I'm not sure how I would support "multiple time series" in a standard system identification problem. A single (possibly multidimensional) time series yields one fitted system. So some kind of aggregation would be necessary. How do you tackle this in Darts?
hrzn commented 2 years ago

> Thanks @hrzn! The resulting notebook looks absolutely pretty! I have become a fan of Darts in the meantime.
>
> If I understand correctly, there are still three open suggestions/ideas in this thread:
>
> 1. Setting the initial state of the Kalman filter. Not sure of the exact use-case though? Do you have an example?

I'm actually not sure anymore what was the rationale for this. I'll let @tomasvanpottelbergh comment if he has any idea.

> 2. Prevent use of undocumented attributes (e.g. p_filtereds). This will mean either documenting them or creating an easier API. I'll take a more thorough look into the Darts implementation soon for ideas (not next week).

👍

> 3. Supporting multiple series. I read a Darts tutorial and if I understand correctly: Darts supports training on multiple time series, which do not need to share the same timestamps/intervals/lengths/scaling. Multiple time series would be like having taken n measurements of the same system.
>    a. If I understand correctly, "multiple time series" does not mean multidimensional time series: those are called "multivariate series" (for the output) or "covariate series" (for the input) in Darts.

Your understanding is quite correct. A single time series can be multidimensional (or multivariate), and some models can be trained on multiple time series, each of which can be uni- or multivariate; the different series need not share the same time axes. Covariates are indeed used as model inputs, but they can themselves be multivariate :) as can the target (what we forecast, i.e. the output).

>    b. I'm not sure how I would support "multiple time series" in a standard system identification problem. A single (possibly multidimensional) time series yields one fitted system. So some kind of aggregation would be necessary. How do you tackle this in Darts?

I'm not sure either. Your understanding of seeing the N time series as N observations is correct. I think it might be useful; for instance, if we have N observed time series corresponding to N identical systems (say, some industrial machines), we would like to infer a single Kalman filter for all of them. However, I'm not familiar enough with the system identification literature to know if/how conveniently this could be achieved.

In the context of forecasting, we are doing this quite easily for machine learning based systems; these systems naturally rely on a set of training examples (input-outputs examples), and so we can simply merge all the training examples coming from each series to form a bigger training set.

spmvg commented 2 years ago

Commenting on @hrzn:

> In the context of forecasting, we are doing this quite easily for machine learning based systems; these systems naturally rely on a set of training examples (input-outputs examples), and so we can simply merge all the training examples coming from each series to form a bigger training set.

In the context of ML, I'm curious to learn how you can simply merge all the training examples coming from each series to form a bigger dataset. I'm not sure how I would do that. Take two example datasets [x0, x1, x2] and [y0, y1, y2]. You cannot simply append the datasets to each other ([x0, x1, x2], [y0, y1, y2] -> [x0, x1, x2, y0, y1, y2]) since x2 and y0 don't necessarily have time-dependence, right? Do you maybe have some literature or a code example about combining datasets?

Commenting on point 2 of my previous comment

> Prevent use of undocumented attributes (e.g. p_filtereds). This will mean either documenting them or creating an easier API. I'll take a more thorough look into the Darts implementation soon for ideas (not next week).

The function KalmanFilter.filter from Darts uses some attributes that I did not write documentation for, for example p_filtereds. After investigating the implementation in Darts, it looks like the function KalmanFilter.filter is doing things that nfoursid.kalman.Kalman also implements. My rationale is as follows:

tomasvanpottelbergh commented 2 years ago

> Setting the initial state of the Kalman filter. Not sure of the exact use-case though? Do you have an example?

Currently the initial state is set to zero, but I think there are cases where more information is available to provide a more realistic state. For example: when the (noiseless) output and state are the same we could use an observation as a guess for the initial state? There also seem to be specific methods in the literature for initialising the state of a Kalman filter.
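To illustrate the idea, here is a generic numpy sketch (not the nfoursid API; all names and values are hypothetical) of a Kalman filter whose output map is the identity, seeded with the first observation instead of a zero state:

```python
import numpy as np

# Hypothetical minimal Kalman filter loop, for illustration only;
# this is not the nfoursid API.
def kalman_filter(a, c, q, r, ys, x0, p0):
    """Filter measurements `ys` starting from initial state `x0`
    with initial state covariance `p0`."""
    x, p = x0, p0
    filtered = []
    for y in ys:
        # Predict step.
        x = a @ x
        p = a @ p @ a.T + q
        # Update step.
        s = c @ p @ c.T + r
        k = p @ c.T @ np.linalg.inv(s)
        x = x + k @ (y - c @ x)
        p = (np.eye(len(x0)) - k @ c) @ p
        filtered.append(x.copy())
    return filtered

# With an identity output map, the first (noisy) observation is a
# more realistic initial-state guess than zero.
a = np.array([[0.9]])
c = np.array([[1.0]])
q = r = np.array([[0.01]])
ys = [np.array([1.0]), np.array([0.95]), np.array([0.9])]
states = kalman_filter(a, c, q, r, ys, x0=ys[0], p0=np.array([[1.0]]))
```

Seeding from an observation mainly matters for the first few steps; the filter forgets the initial state as more measurements arrive.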

> In the context of ML, I'm curious to learn how you can simply merge all the training examples coming from each series to form a bigger dataset. I'm not sure how I would do that. Take two example datasets [x0, x1, x2] and [y0, y1, y2]. You cannot simply append the datasets to each other ([x0, x1, x2], [y0, y1, y2] -> [x0, x1, x2, y0, y1, y2]) since x2 and y0 don't necessarily have time-dependence, right? Do you maybe have some literature or a code example about combining datasets?

I think the correct way depends heavily on the model and training method, but in general concatenation is indeed a bit too simplistic. @hrzn can probably answer this better. Doing this for nfoursid should be possible, as the Matlab n4sid function seems to support it. This is an example of an algorithm I found. Do you think that would make sense?

> The function KalmanFilter.filter from Darts uses some attributes that I did not write documentation for, for example p_filtereds. After investigating the implementation in Darts, it looks like the function KalmanFilter.filter is doing things that nfoursid.kalman.Kalman also implements. My rationale is as follows:

>   • To interface with the results of nfoursid.kalman.Kalman, I had in mind that people could use the Kalman.to_dataframe() method to get the results they need. This includes the filtered and predicted outputs, and corresponding standard deviations. See for example the bottom of this notebook. Note that the columns are multi-dimensional, so you need a 3-tuple to access the Series that you want, for example: df[(OUTPUT_COL_NAME, Kalman.filtered_label, Kalman.output_label)].
>
>     • Are there things missing in the output of Kalman.to_dataframe() that KalmanFilter.filter needs? I suspect that Darts is taking a sample of a multivariate Gaussian, and then taking sample quantiles to obtain 5%-95% quantiles when plotting. If we are only interested in 5%-95% quantiles, then we don't need to take a sample: we can calculate them exactly given the covariance matrix. The quantiles of a multivariate Gaussian can be calculated from the variances, and the variances are on the diagonal of the covariance matrix. This would save some computing time.

We are indeed taking samples of a multivariate Gaussian. You are completely right that this is unnecessary for the quantiles, but the idea in Darts is to represent probabilistic time series using samples. This makes the approach more general for cases where an analytical approach is not possible.
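For illustration, the exact-quantile computation described above could look like the following sketch (the mean and covariance values are made up for the example):

```python
import numpy as np
from scipy.stats import norm

# Filtered output mean and covariance at one timestep (illustrative values).
mean = np.array([2.0, -1.0])
covariance = np.array([[0.25, 0.05],
                       [0.05, 0.16]])

# Marginal standard deviations are the square roots of the diagonal,
# so the 5%/95% quantiles follow in closed form, no sampling needed.
std = np.sqrt(np.diag(covariance))
z = norm.ppf(0.95)  # one-sided 95% z-score, about 1.645
lower = mean - z * std
upper = mean + z * std
```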

>   • Note that nfoursid already provides the standard deviation you're looking for in the output of Kalman.to_dataframe(), from which you can calculate 5%-95% quantiles for plotting.
>
>   • FYI the same adding of matrices that's going on in Darts cov_matrix = (kf.state_space.c ... + kf.r) is happening in Kalman._measurement_and_state_standard_deviation, but since to_dataframe() only provides standard deviations, I only work with variances (the diagonal) and ignore covariances.

Indeed, but we need the full covariance matrix to draw samples. I think providing this as an attribute and documenting y_filtereds would probably address my "undocumented attribute" concern. I understand the Kalman.to_dataframe() approach, but maybe it would be nice to have (documented) access to these outputs as numpy arrays via attributes for use cases like ours?
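The sampling-based representation could be sketched with numpy as follows (illustrative values; the covariance stands in for C @ P @ C.T + R computed elsewhere):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative filtered output mean and full covariance matrix.
mean = np.array([2.0, -1.0])
cov = np.array([[0.25, 0.05],
                [0.05, 0.16]])

# A probabilistic time series as a set of samples: many draws per timestep.
# This needs the full covariance, not just its diagonal, to preserve
# the correlations between output dimensions.
samples = rng.multivariate_normal(mean, cov, size=1000)

# Empirical 5%/95% quantiles are then computed from the samples.
q05, q95 = np.quantile(samples, [0.05, 0.95], axis=0)
```

The empirical quantiles converge to the exact Gaussian ones, but the sample representation also works for filters or models where no closed form exists.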

Since this thread is starting to get complicated, we should maybe split the topics into different issues if that works for you?

hrzn commented 2 years ago

Hi, just to comment on your question of how we combine several time series: the way it would work in your example is basically as follows. The two time series [x0, x1, x2] and [y0, y1, y2] would each be sliced into several "training samples". A training sample is a tuple of (input, output) subseries. For example, [x0, x1, x2] might be split into the sample ([x0, x1], [x2]) for models using a lookback window of 2 (called input_chunk_length in Darts) and a "forward window" of 1. We can then merge all these tuples (usually many tuples per series, over different series) into a single training dataset, which we can then use to train different models in a supervised way. For instance, deep learning models might be trained using a loss function that aggregates the prediction errors of the model on the "forward windows" of the training set. I hope this makes things clearer :)
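The slicing described above can be sketched as follows (a hypothetical helper, not actual Darts code):

```python
# Hypothetical helper illustrating the slicing; not actual Darts code.
def make_training_samples(series, input_chunk_length=2, forward_window=1):
    """Slice one series into (input, output) training samples."""
    samples = []
    window = input_chunk_length + forward_window
    for i in range(len(series) - window + 1):
        inputs = series[i:i + input_chunk_length]
        outputs = series[i + input_chunk_length:i + window]
        samples.append((inputs, outputs))
    return samples

# Each series contributes its own samples; merging the samples is then
# safe, because no sample spans the boundary between two series.
x = ["x0", "x1", "x2"]
y = ["y0", "y1", "y2"]
dataset = make_training_samples(x) + make_training_samples(y)
# dataset == [(["x0", "x1"], ["x2"]), (["y0", "y1"], ["y2"])]
```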

spmvg commented 2 years ago

@tomasvanpottelbergh thanks for the input and sorry for the late reply! Indeed, I have split your comments into separate issues:

I tried to word your ideas as best as possible, but if you have comments, let me know on those issues!

I suggest closing this issue and continuing with the others.