Computation of Predictive Uncertainty in Regression Setting

nilsleh commented 2 months ago

@yookoon Thank you for the interesting work and the code repository. We are trying to support your proposed method in our UQ-Library for Deep Learning called Lightning-UQ-Box and have a couple of questions regarding the regression framework.

To take the Regression case based on the UCI datasets for example, the MLP module has a logvar parameter that is a single value for the entire model. During the training and test phase the loss is computed with this homoscedastic logvar parameter here. Additionally, for sampling methods like BNNs or your proposed Density framework, N samples are taken in this loop, however, you only compute the average over the model mean predictions, and the sampling of the weights has no effect on the predictive uncertainty you are computing, since it is just a single learned parameter for the entire model. This seems counterintuitive as the point of BNNs is to model epistemic uncertainty which should influence the overall predictive uncertainty that I get from the model.

In Figure 1 of your paper, you show results on a Toy Regression dataset with input dependent uncertainty, however, the repository does not contain the code to generate this figure as far as I can tell. In your paper you state that based on equations 7-9 "Consequently, predictive uncertainty will be high for test inputs that are improbable in the training density and low for those that are more probable, providing intuitive and reliable predictive uncertainty." However, I fail to see how that can be the case, when you are using a single logvar parameter as your predictive uncertainty. I was therefore wondering whether you could help me out in understanding the utilized notion of predictive uncertainty in the regression case. Thanks in advance!

yookoon commented 2 months ago

Hi Nils,

Thank you for your interest in the paper. The density uncertainty layer is similar to the approximate BNN methods like Variational Dropout and Rank-1 BNNs. These methods do not sample the weights of neural networks but instead injects noise into the activations of the layers. We can show that this corresponds to actually modeling the uncertainty in weights but marginalizing them out when computing the activations of the layers. For example, the predictive distribution of Bayesian linear regression is obtained by marginalizing out the weight posterior.

Similarly, the density uncertainty layers injects noise into the layer activations, so when you run it multiple times, the prediction means will always be different. Then we can use the variance of these predictions as a measure of epistemic uncertainty. As you mentioned we didnt incorporate the uncertainty of the output variance parameter but we think the effect will be minimal as it is just a single parameter.

Hope this answers your question and let me know if you have anything else

nilsleh commented 1 month ago

@yookoon Thank you for your reply. That makes more sense, we want to include a 1D Regression Toy Example in the codebase which can be seen here.

Toy Dataset: Screenshot from 2024-10-01 10-30-25

Model Defined with:

(model): MLP(
    (model): Sequential(
      (0): DensityLinear(
        (linear): Linear(in_features=1, out_features=50, bias=True)
      )
      (1): Tanh()
      (2): Dropout(p=0.0, inplace=False)
      (3): DensityLinear(
        (linear): Linear(in_features=50, out_features=50, bias=True)
      )
      (4): Tanh()
      (5): Dropout(p=0.0, inplace=False)
      (6): DensityLinear(
        (linear): Linear(in_features=50, out_features=50, bias=True)
      )
      (7): Tanh()
      (8): Dropout(p=0.0, inplace=False)
      (9): DensityLinear(
        (linear): Linear(in_features=50, out_features=1, bias=True)
      )
    )
  )

and posterior_std_init=0.1 with 50 samples during prediction.

And Prediction Results:

Screenshot from 2024-10-01 10-35-02

Does that align with what you would expect?

yookoon commented 1 month ago

Hi @nilsleh ,

It's hard to say as I've never experimented with Tanh activations and I don't know what data distribution you are using. At least the result doesn't look too unreasonable to me.

nilsleh commented 1 month ago

@yookoon Thank you for your reply, would you have the code available to produce figure 1 from your paper? Just so I can double check the computation and make sure it's implemented correctly for the library?

yookoon commented 1 month ago

This is the code snippet I used to generate the figure although the hyperparameters may not be exactly the same. Hope this helps.

2024년 10월 7일 (월) 오후 11:43, Nils Lehmann @.***>님이 작성:

@yookoon https://github.com/yookoon Thank you for your reply, would you have the code available to produce figure 1 from your paper? Just so I can double check the computation and make sure it's implemented correctly for the library?

— Reply to this email directly, view it on GitHub https://github.com/yookoon/density_uncertainty_layers/issues/1#issuecomment-2398979786, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACRIC5LRO72RD3JNNF552LLZ2N5ILAVCNFSM6AAAAABOAVHHMSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGOJYHE3TSNZYGY . You are receiving this because you were mentioned.Message ID: @.***>

nilsleh commented 1 month ago

@yookoon Apologies, but in your answer, I don't see any code or link pointing to mentioned snippet.

yookoon commented 1 month ago

Sorry, I attached the file in email but obviously it didn't work toy_figure.zip

yookoon / density_uncertainty_layers

Computation of Predictive Uncertainty in Regression Setting #1