lhnguyen102 / cuTAGI

CUDA implementation of Tractable Approximate Gaussian Inference

Introducing OUT_GAIN parameter for Heteroscedastic Tasks #65

Closed: bhargobdeka closed this 2 months ago

bhargobdeka commented 2 months ago

Description

This pull request introduces an out_gain parameter in the backend code so that, once the new version, pyTAGI V1, has the heteroscedastic feature, it can use both gain parameters.

Changes Made

The changes were made in the following backend files, wherever noise_gain appears:

cuTAGI/src/net_init.cpp
cuTAGI/src/net_prop.cpp
cuTAGI/include/struct_var.h
cuTAGI/src/python_api_cpu.cpp
cuTAGI/src/python_api.cu
cuTAGI/test/fnn_heteros/test_fnn_heteros_cpu.cpp

And in these config files, where noise_gain is assigned:

cuTAGI/cfg/fc_heteros_mnist.txt
cuTAGI/cfg/fc_heteros_toy.txt
cuTAGI/cfg/fc_homosce_mnist.txt
cuTAGI/cfg/fc_homosce_toy.txt

Note for Reviewers

These changes cannot immediately be used in pyTAGI_V1, as the Python scripts also need to be carried over from the older version present in the branch UCI_Benchmark_Baseline. To run the benchmarks, please follow the README file in that branch.

jamesgoulet commented 2 months ago

@lhnguyen102 Can you take a look at Bhargob's suggested changes for introducing an out_gain parameter?

lhnguyen102 commented 2 months ago

Sure, I will go through it next week.

lhnguyen102 commented 2 months ago

Unfortunately, this PR won't be necessary, because out_gain is equivalent to gain_w. More specifically, gain_w allows us to define the gain factor for each layer, instead of one value for all layers as out_gain does.
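To make the distinction concrete, here is a hypothetical illustration in the cfg syntax used by this repo's config files (the values are made up for the example):

```
layers: [1, 7, 7, 1]
gain_w: [0, 1, 1, 0.5]   # gain_w: one gain factor per layer
# versus a single scalar out_gain = 0.5, which cannot vary by layer
```

The last entry of gain_w already plays the role that out_gain would.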

bhargobdeka commented 2 months ago

Hi Ha,

If that is the case, it's okay.

However, the pull request also includes a change in how the biases are initialized. For the heteroscedastic case, there is no need for noise_gain for the biases, so gaussian_param_init_ni is not required on line 1055. Please look at the attached screenshot.

Bhargob

On Apr 15, 2024, at 18:30, Luong-Ha Nguyen wrote:

@lhnguyen102 commented on this pull request.

In src/net_prop.cpp https://github.com/lhnguyen102/cuTAGI/pull/65#discussion_r1566502479:

```diff
@@ -1040,7 +1040,7 @@ Param initialize_param(Network &net) {
     if (net.noise_type.compare("heteros") == 0 && j == num_layers - 1) {
         std::tie(mw_j, Sw_j) = gaussian_param_init_ni(
-            scale, net.gain_w[j], net.noise_gain,
+            scale, net.out_gain, net.noise_gain,  // net.gain_w[j]
```

@bhargobdeka https://github.com/bhargobdeka you don't need to introduce out_gain, because net.gain_w is equivalent to net.out_gain. For example, if the net layers are defined as

```
layers: [1, 7, 7, 1]
gain_w: [0, 1, 1, 1]
```

then in the heteros case, gain_w is for the mean and noise_gain is for the std.


lhnguyen102 commented 2 months ago

Bhargob, let me think about how to handle this properly in the code, because noise_gain is only required for the weights but not the biases.

bhargobdeka commented 2 months ago

Sure!


lhnguyen102 commented 2 months ago

@bhargobdeka line 1055 should be as you implemented it, if we don't want to touch the bias gain of the std's hidden state. I will close this PR, because you would need to reverse all the changes you made for out_gain and modify just line 1055; I will incorporate this change in an upcoming PR instead. Bhargob and @jamesgoulet, I was thinking about this ad-hoc technique of parameter initialization. It seems to me it is not generalizable, since we need to introduce a different method for initializing the parameters to get good results.

jamesgoulet commented 2 months ago

> Bhargob and @jamesgoulet, I was thinking about this ad-hoc technique of parameter initialization. It seems to me it is not generalizable, since we need to introduce a different method for initializing the parameters to get good results.

@lhnguyen102 I see what you mean, and I think that, as you point out, we can keep the current setup, where net.gain_b[j] is equal to one by default anyway, and one can adjust net.gain_w[j] if needed.

However, another thing I do not understand is why we are initializing the bias parameters like the weights. In my understanding, only the weights' variance should be initialized with a scale that is a function of 1/fan_in and/or fan_out. For the biases, I think it should be `std::tie(mb_j, Sb_j) = gaussian_param_init(1, net.gain_b[j], net.num_biases[j]);` in all cases.
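As a minimal, self-contained sketch of that proposal, assuming gaussian_param_init takes (scale, gain, n) and returns the parameter means and variances (the real cuTAGI signature and internals may differ):

```cpp
#include <cmath>
#include <random>
#include <tuple>
#include <vector>

// Hypothetical stand-in for gaussian_param_init(scale, gain, n): draws n
// parameter means from N(0, gain^2 * scale) and sets each prior variance
// to gain^2 * scale. Illustrative only; not the actual cuTAGI code.
std::tuple<std::vector<float>, std::vector<float>> gaussian_param_init(
    float scale, float gain, int n) {
    std::vector<float> m(n), S(n);
    float var = gain * gain * scale;
    std::mt19937 rng(0);
    std::normal_distribution<float> dist(0.0f, std::sqrt(var));
    for (int i = 0; i < n; i++) {
        m[i] = dist(rng);  // initial mean of parameter i
        S[i] = var;        // initial variance of parameter i
    }
    return {m, S};
}

int main() {
    int fan_in = 7, num_weights = 49, num_biases = 7;
    // Weights: variance scaled by 1/fan_in, as discussed above.
    auto [mw, Sw] = gaussian_param_init(1.0f / fan_in, 1.0f, num_weights);
    // Biases: scale fixed to 1, independent of fan_in (the proposal here).
    auto [mb, Sb] = gaussian_param_init(1.0f, 1.0f, num_biases);
    return 0;
}
```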

If you think this is a possibility, I can run a comparative benchmark to see if it performs well. But from a theoretical standpoint, I think it makes more sense.

lhnguyen102 commented 2 months ago

@jamesgoulet I saw that PyTorch used the same approach as its default implementation back then. The only difference is that they use a uniform distribution while we use a Gaussian (see the PyTorch code). I adapted it a bit to match our framework, but I am not sure it is optimal. It is always worth trying.
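For reference, the two distributions can be variance-matched: a uniform draw on [-b, b] has variance b^2/3, so a Gaussian with sigma = b/sqrt(3) has the same spread. A standalone sketch (illustrative only, not cuTAGI or PyTorch code; b = 1/sqrt(fan_in) is an assumption based on the fan-in scaling discussed above):

```cpp
#include <cmath>
#include <random>

int main() {
    int fan_in = 7;  // e.g. a 7-unit hidden layer
    float b = 1.0f / std::sqrt(static_cast<float>(fan_in));  // uniform bound
    std::mt19937 rng(42);

    // PyTorch-style default: w ~ U(-b, b), which has variance b^2 / 3.
    std::uniform_real_distribution<float> uniform_init(-b, b);

    // Variance-matched Gaussian alternative: w ~ N(0, b^2 / 3).
    std::normal_distribution<float> gaussian_init(0.0f, b / std::sqrt(3.0f));

    float w_u = uniform_init(rng);
    float w_g = gaussian_init(rng);
    (void)w_u;  // draws used only to show the two initializers side by side
    (void)w_g;
    return 0;
}
```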

jamesgoulet commented 2 months ago

> @jamesgoulet I saw that PyTorch used the same approach as its default implementation back then. The only difference is that they use a uniform distribution while we use a Gaussian (see the PyTorch code). I adapted it a bit to match our framework, but I am not sure it is optimal. It is always worth trying.

@lhnguyen102 👌, I will look into it.