jonathf / chaospy

Chaospy - Toolbox for performing uncertainty quantification.
https://chaospy.readthedocs.io/
MIT License

Data-driven PCE #122

Closed rhaghi closed 5 years ago

rhaghi commented 5 years ago

I have an array of random data with 5000 members. This is the output of a model. I want to fit a PCE to the distribution; however, I don't know anything about the input. The output distribution is normal. I tried this (u is the output):

noSamples = 5000
xi1 = cp.Normal(0, 1)
xi2 = cp.Normal(0, 1)
dist = cp.J(xi1, xi2)
dataPoints = dist.sample(noSamples)
polyOrder = 2
orthPoly = cp.orth_ttr(polyOrder, dist)
approx = cp.fit_regression(orthPoly, dataPoints, u)
expected = cp.E(approx, dist)
deviation = cp.Std(approx, dist)
print(expected, deviation)

The expected value matches the average of the output very well, but the standard deviation is much smaller (almost 1/10 of the output standard deviation). My guess is that this is not a proper approach when the input is not available. I have read a couple of papers about data-driven PCE, such as [Oladyshkin, 2012] and [Torre, 2018]. Is there any way to implement data-driven PCE in Chaospy?

Thanks a lot for your help.

[Oladyshkin, 2012] Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion

[Torre, 2018] Data-driven polynomial chaos expansion for machine learning regression

jonathf commented 5 years ago

I don't have access to Oladyshkin, but I do have Torre. I've skimmed that paper, and as a general note I see lots of known UQ components used to set up a specific PCE: dependency handling, truncation schemes, and least angle regression. It should generally be a good fit for chaospy.

As for the "data driven" part, I guess that roughly translates to "We only have (X,u) pairs, no model function M". With that in mind, I am wondering how your u is related to dataPoints. Even if you don't have a model, you still need to ensure that u=M(X). In a data-driven context, this would typically mean that dataPoints isn't generated from dist.sample, but provided beforehand.
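To illustrate why the pairing matters, here is a minimal sketch using only the Python standard library. It is not chaospy code: the model u = 2 + 3*x, the helper fit_linear, and all variable names are hypothetical, and a one-variable linear least-squares fit stands in for cp.fit_regression. When the inputs are the ones that actually produced u, the fit recovers the spread of u. When they are fresh, unrelated draws from the same distribution, the slope collapses towards zero, so the surrogate keeps the mean but loses almost all of the standard deviation.

```python
import random
import statistics

random.seed(0)
n = 5000

# Hypothetical model, used only to generate outputs: u = 2 + 3*x, x ~ N(0, 1).
x_true = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [2.0 + 3.0 * x for x in x_true]

# Fresh draws from the same input distribution, NOT paired with u.
x_fresh = [random.gauss(0.0, 1.0) for _ in range(n)]


def fit_linear(x, y):
    """Least-squares fit y ~ c0 + c1 * x; returns (c0, c1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    c1 = sxy / sxx
    return my - c1 * mx, c1


c0_paired, c1_paired = fit_linear(x_true, u)
c0_fresh, c1_fresh = fit_linear(x_fresh, u)

# For x ~ N(0, 1), the surrogate c0 + c1*x has mean c0 and std |c1|.
print("sample std of u:       ", statistics.stdev(u))
print("paired fit mean, std:  ", c0_paired, abs(c1_paired))
print("unpaired fit mean, std:", c0_fresh, abs(c1_fresh))
```

The same effect would be expected with a higher-order expansion: every non-constant coefficient is estimated from the correlation between the polynomial terms and u, and that correlation vanishes when the samples are unrelated to u.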

But I might have misunderstood something. Please enlighten me, if you think so.

rhaghi commented 5 years ago

Thanks a lot for your help and quick response. For the data-driven part, I only have u (the output of the model). I don't have access to the model M or the input data x. I only know the distributions of the input (mean and std). I can take samples from those distributions, but they don't necessarily correspond to the output u. My idea was to fit a PCE to the output, without knowing anything about the input, based on sampling from standard normal distributions.

Writing this, I am realizing that what I am trying to do is not a data-driven PCE, but something that is probably nonsense or even wrong!

Shall I share the Oladyshkin paper with you?

jonathf commented 5 years ago

No worries. I just wanted to point out that my information might have been incomplete when I answered.

Unfortunately, if u is all you have, your options are a lot more limited. The best you can do is probably what you have already been doing: computing the empirical mean and variance of u.
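As a rough sketch of that fallback, the snippet below computes the empirical mean and standard deviation with the Python standard library. The list u here is synthetic stand-in data (an assumption, since the real outputs are not part of this thread), and u_resampled is a hypothetical name. The last step relies on the earlier statement that the output is normal: under that assumption, mean_u + std_u * xi with xi ~ N(0, 1) reproduces the distribution of u, but it carries no information about how u depends on the actual model inputs.

```python
import random
import statistics

random.seed(1)

# Stand-in for the 5000 model outputs; replace with the real array u.
u = [random.gauss(10.0, 2.0) for _ in range(5000)]

mean_u = statistics.fmean(u)
std_u = statistics.stdev(u)  # sample standard deviation (n - 1 denominator)

print("mean:", mean_u)
print("std: ", std_u)

# Assuming u is normal, mean_u + std_u * xi with xi ~ N(0, 1) follows the
# fitted normal distribution. This matches u in distribution only.
xi = [random.gauss(0.0, 1.0) for _ in range(5000)]
u_resampled = [mean_u + std_u * z for z in xi]
print("resampled std:", statistics.stdev(u_resampled))
```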