jonathf / chaospy

Chaospy - Toolbox for performing uncertainty quantification.
https://chaospy.readthedocs.io/
MIT License

Is the distribution used internally by PCE a standard normal distribution? #374

Closed JinfengM closed 2 years ago

JinfengM commented 2 years ago

Describe your problem

Is the distribution we specify automatically transformed to a standard normal distribution when using PCE?

Initial implementation

(1) Objective: we want to explore the approximation capability of PCE as a surrogate for a complex model.

(2) Method: first we use the following code to sample.

8 parameters are used:

```python
CANMX = chaospy.Uniform(0, 100)
CN2 = chaospy.Uniform(35, 98)
CH_N2 = chaospy.Uniform(0, 0.3)
CH_K2 = chaospy.Uniform(5.0, 130)
ALPHA_BNK = chaospy.Uniform(0, 1.0)
SOL_AWC = chaospy.Uniform(0, 1)
SOL_K = chaospy.Uniform(0, 2000)
SOL_BD = chaospy.Uniform(0.9, 2.5)
```

Define the joint distribution:

```python
joint = chaospy.J(CANMX, CN2, CH_N2, CH_K2, ALPHA_BNK, SOL_AWC, SOL_K, SOL_BD)
```

Sample:

```python
gauss_quad = chaospy.generate_quadrature(2, joint)
parameter_samples, weights = gauss_quad
```

Invoke the complex model, which returns a time series of 60 time steps, and store the results in the `evaluations` list:

```python
evaluations = []
for sample in parameter_samples.T:
    timeseriesResult = ModelExecution(sample)
    evaluations.append(timeseriesResult)
```

Define the polynomial expansion:

```python
expansion = chaospy.generate_expansion(2, joint)
```

Fit the quadrature with the pseudo-spectral projection approach:

```python
approx_solver = chaospy.fit_quadrature(expansion, parameter_samples, weights, evaluations)
```

Here we get a PCE model named `approx_solver`, and everything is fine so far.

Then the trouble begins. To verify that the PCE can substitute for the complex model, we perform 1000 Monte Carlo (MC) evaluations of the original model using Latin hypercube sampling, giving 1000 MC evaluations as "observations". Next we take 1000 PCE evaluations as "simulations", and compare the observations with the simulations in terms of mean and variance at the 60 time steps. Sadly, we found:

The mean is well matched (R² = 0.96), while the variance is not. The variance of the observations (MC) is huge, ranging from 20 to 4000, significantly larger than that of the simulations (ranging from 10 to 200).

Several previous studies report a perfect or acceptable variance when the variables follow a standard normal distribution, so we wondered whether the distribution of the variables influences the variance, and we conducted an experiment to examine this assumption.

First we assume a standard normal distribution:

```python
joint = chaospy.Iid(chaospy.Normal(0, 1), 8)
```

Next we sample with the standard normal distribution:

```python
gauss_quad = chaospy.generate_quadrature(2, joint, rule='gaussian')
parameter_samples, weights = gauss_quad
standard_parameter_samples = parameter_samples
```

Then we push the standard normal samples through the normal CDF, which maps them to the unit interval:

```python
normal = chaospy.Normal(0, 1)
parameter_samples = normal.fwd(parameter_samples)
```

Then we rescale each row from the unit interval to its parameter's range so we can perform the model evaluations:

```python
for num, item in enumerate(parameter_samples):
    if num == 0:    # CANMX
        parameter_samples[num] = MaxMinNormalization(item, 0, 100)
    elif num == 1:  # CN2
        parameter_samples[num] = MaxMinNormalization(item, 35, 98)
    elif num == 2:  # CH_N2
        parameter_samples[num] = MaxMinNormalization(item, 0, 0.3)
    elif num == 3:  # CH_K2
        parameter_samples[num] = MaxMinNormalization(item, 5, 130)
    elif num == 4:  # ALPHA_BNK
        parameter_samples[num] = MaxMinNormalization(item, 0, 1)
    elif num == 5:  # SOL_AWC
        parameter_samples[num] = MaxMinNormalization(item, 0, 1)
    elif num == 6:  # SOL_K
        parameter_samples[num] = MaxMinNormalization(item, 0, 2000)
    elif num == 7:  # SOL_BD
        parameter_samples[num] = MaxMinNormalization(item, 0.9, 2.5)
```
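`MaxMinNormalization` is not shown in the post; presumably it linearly rescales values from the unit interval to `[low, high]`. A minimal sketch under that assumption (the name and signature are the author's; the body is guessed):

```python
import numpy as np

def MaxMinNormalization(values, low, high):
    # Hypothetical implementation: map values in [0, 1] onto [low, high].
    values = np.asarray(values)
    return low + (high - low) * values
```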

Perform the model evaluations:

```python
evaluations = []
for sample in parameter_samples.T:
    timeseriesResult = ModelExecution(sample)
    evaluations.append(timeseriesResult)
```

Define the polynomial expansion:

```python
expansion = chaospy.generate_expansion(2, joint)
```

Fit the quadrature with the pseudo-spectral projection approach:

```python
approx_solver = chaospy.fit_quadrature(expansion, standard_parameter_samples, weights, evaluations)
```

Then the trouble came again. As before, we performed 1000 Monte Carlo (MC) evaluations of the original model with Latin hypercube sampling as observations, took 1000 PCE evaluations as simulations, and compared them in terms of mean and variance at the 60 time steps. Sadly, we found no difference between the PCE with and without the standard normal distribution.

We also tried the point collocation method with `fit_regression`; nothing improved at all.

So my questions are:

1. Is a standard normal transformation already imposed internally by chaospy?
2. Or is chaospy supposed to have nothing to do with the standard normal distribution?
3. If a standard normal distribution is required, is my transformation method above correct? If not, what is the right way?
4. When we get the Fourier coefficients, from either `fit_regression` or `fit_quadrature`, how can we truncate the unimportant terms?

This may be a big question, but it has confused us for a long time. We would appreciate your reply, and thank you for your help.


jonathf commented 2 years ago

No, there is no underlying assumption of a standard normal distribution for your uniform example, but you could enforce such a scheme using generalized polynomial chaos.

Looking at your problem, I see that your approximation model has only 9 terms for an 8-dimensional problem. I would perhaps switch to the regression method: reuse the samples used for Monte Carlo, raise the polynomial order to e.g. 4 or 5, and see how the approximation degrades as you lower the polynomial order. Once you have found a good trade-off between complexity and accuracy, I would try lowering the number of samples to see if the model holds.

My two cents.

jonathf commented 2 years ago

As for truncating irrelevant terms, you need the Fourier coefficients. Both `fit_regression` and `fit_quadrature` support returning them with `retall=1`. With them, you can determine which coefficients to keep based on their size:

```python
index = abs(coefficients) > threshold
expansion = expansion[index]
```

Making a new PCE on this reduced expansion is a solution to your (4).

JinfengM commented 2 years ago

Hi jonathf, thank you for your timely reply; it helps a lot. I will try what you suggested. Chaospy is great, thanks for your great work.