piercus / kalman-filter

Kalman filter in javascript
MIT License

Using this to predict data in the future with only partial observations available #34

Closed atsepkov closed 2 years ago

atsepkov commented 2 years ago

Sorry if this is a naive question, but let's say I have a 1D dataset of observations that has 10 observations, and I want to predict 15th datapoint without being able to measure observations 11-15, but assuming the pattern/trajectory established by the first 10 observations holds. I can't pad the array as that would skew predictions, is it possible to use your library to perform Kalman predictions into the future without the update step after the initial set of observations?

piercus commented 2 years ago

@atsepkov yes you can do this.

The main idea would be to use a function in observation.covariance and to return a huge variance when the observation is null; then you use a null observation whenever your data are not available

const dynamic = yourDynamic;
const huge = 1e15;
const baseVariance = 1;
new KalmanFilter({
  dynamic,
  observation : {
    ...
    covariance: function(o){
        if(o[0] === null){
          return [huge]
        } else {
          return [baseVariance]
        }
    }
  }
})

If you can try this and share example code (a fiddle or just a piece of Node.js code), I could help you.

I'm thinking about creating a pre-made 'sensor-nullable' observationType which would make this use case even easier, but it would help if you could share your code.

atsepkov commented 2 years ago

I think I understand: you're effectively flagging the entry as an outlier? I can describe my use case, but it will be hard for me to show my code without extracting the data into a one-off example, as this is a small subset of a larger project I've been working on for a couple of years.

I'm basically smoothing out/predicting trends based on data I've got from a few different sources (CENSUS, IRS, FBI). Let's say, for example, we're looking at crime trends for a given region based on historical FBI trends. At this point I've already averaged it into a single per-year entry (although perhaps it's possible to do something smarter by using a multi-dimensional filter that factors in correlations between different sources and/or neighboring regions - I just don't understand the full capability of Kalman to make use of these yet).

For regions with larger populations, the trends don't vary much, so my naive extrapolation logic worked ok. But for more rural regions, the data can be all over the place from year to year (i.e. no crime at all in certain years, and a single crime affects the index much more due to the small population). What I have is effectively a list of readings from the period when digitized data is available (around a 20-year timeframe for most of these sources). This data also tends to lag a few years behind by the time the agencies digitize it, and I'm trying to extrapolate it to today.

So let's say I grabbed the crime data for a given region and I'm trying to extrapolate it a few years into the future. I've tried the approach you describe above, but I seem to be having trouble due to unfamiliarity with your library. Here is what I have so far:

const {KalmanFilter} = require('kalman-filter');

let dataset = [0,0,0,0,16.1,0,0,30.9,0,0,0,0,26.1,null,null] // this is a "index" metric based on crime/population, not individual crimes
let baseVariance = 1;
let huge = 1e15;
let kf = new KalmanFilter({
  // do I need a dynamic? based on ones I see in the lib, it seems like constant-acceleration
  // would make the most sense but it requires projection to be defined?
  observation : {
    dimension: 1, // without this I was getting a dimension error
    covariance: function (o) {
        if (o[0] === null){
          return [huge]
        } else {
          return [baseVariance]
        }
    }
  }
})

const res = kf.filterAll(dataset);
console.log(dataset, res);

I get the following error when I attempt to run the above, which seems to imply that reduce isn't returning an array:

./node_modules/kalman-filter/lib/utils/check-matrix.js:4
        if (matrix.reduce((a, b) => a.concat(b)).filter(a => Number.isNaN(a)).length > 0) {
                                                 ^
TypeError: matrix.reduce(...).filter is not a function
piercus commented 2 years ago

@atsepkov

Yes, you are right: the matrix should be a 2-dimensional array, and if you are using null, observations should be 2-dimensional arrays too.

Corresponding unit tests are in https://github.com/piercus/kalman-filter/commit/a226029c97a2fb218efe953ebc2b4fec89f28ec5

Here is the corrected version:


    let dataset = [0,0,0,0,16.1,0,0,30.9,0,0,0,0,26.1,null,null].map(a => [a]) // each observation should be an array; this allows multi-dimensional observations. It is not necessary when using numbers only, but with null we are kind of using a trick, which requires us to be more explicit in observation formatting
    let baseVariance = 1;
    let huge = 1e15;
    let kf = new KalmanFilter({
      observation : {
        dimension: 1, 
        covariance: function (o) {
            if (o.observation[0][0] === null){ // each observation is a column matrix, since the kalman filter works on matrices, so [0] is formatted as [[0]]
              return [[huge]]
            } else {
              return [[baseVariance]]
            }
        }
      }
    })

    const res = kf.filterAll(dataset);
    console.log(dataset, res);

do I need a dynamic? based on ones I see in the lib, it seems like constant-acceleration

I feel that constant-acceleration is dangerous. If you do not specify any dynamic, constant-position is the default one, which means that by default value[t+1] ~ value[t]

If you want to extrapolate, constant-speed is a good choice; it basically assumes that value[t+1] ~ value[t] + (value[t] - value[t-1])

But the more complex your filter is, the harder it will be to fine-tune it and get good results, so I would suggest avoiding constant-acceleration in your situation.

although perhaps it's possible to do something smarter by using a multi-dimensional filter that factors in correlations between different sources and/or neighboring regions - I just don't understand the full capability of Kalman to make use of these yet

Yes, you could easily extend this and use multi-dimensional arrays.

Let's say you have 2 different sources; you could do something like:

    const {diag} = require('kalman-filter').linalgebra;// only available for now in the branch issue-34

    const dataset = [
        [22, null],
        [25, null],
        [4, 4],
        [4, 4],
        [22, 5],
        [null, null],
        [34, 45]
    ];

    const baseVariance = 1;
    const huge = 1e15;
    const kf = new KalmanFilter({
        observation: {
            stateProjection: [[1], [1]], // this says that each measure of the 2D input is projected onto the same 'state' axis of the dynamic. It also gives the observation dimension (2) and the dynamic dimension (1)
            covariance(o) {
                const variances = o.observation.map(a => {
                    if (a[0] === null) {
                        return huge;
                    }

                    return baseVariance;
                });
                return diag(variances);
            }
        }
    });

    const response = kf.filterAll(dataset);
piercus commented 2 years ago

:tada: This issue has been resolved in version 1.10.0 :tada:


Your semantic-release bot :package::rocket: