Closed bhargobdeka closed 2 months ago
@lhnguyen102 Can you take a look at Bhargob's suggested changes for introducing an out_gain parameter?
Sure, I will go through it next week
This PR won't be necessary, unfortunately, because out_gain is equivalent to gain_w. More specifically, gain_w allows us to define the gain factor for each layer, instead of one value for all layers as out_gain does.
Hi Ha,
If that is the case, it's okay.
However, the pull request also changes how the biases are initialized. For the heteroscedastic case, there is no need for noise_gain for the biases. Hence, gaussian_param_init_ni is not required on line 1055. Please look at the attached screenshot.
Bhargob

On Apr 15, 2024, at 18:30, Luong-Ha Nguyen @.***> wrote:
@lhnguyen102 commented on this pull request.
In src/net_prop.cpp https://github.com/lhnguyen102/cuTAGI/pull/65#discussion_r1566502479:
@@ -1040,7 +1040,7 @@ Param initialize_param(Network &net) {
    if (net.noise_type.compare("heteros") == 0 && j == num_layers - 1) {
        std::tie(mw_j, Sw_j) = gaussian_param_init_ni(
-           scale, net.gain_w[j], net.noise_gain,
+           scale, net.out_gain, net.noise_gain, //net.gain_w[j]

@bhargobdeka you don't need to introduce out_gain, because net.gain_w is equivalent to net.out_gain. For example, the net layer is defined as:

layers: [1, 7, 7, 1]
gain_w: [0, 1, 1, 1]

gain_w is for the mean and noise_gain is for the std in the heteros case.
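To make the equivalence concrete, here is a minimal, hypothetical sketch (not the actual gaussian_param_init_ni in src/net_prop.cpp, whose signature differs) of a gain-scaled Gaussian prior initializer. Because the per-layer gain multiplies the base scale, setting gain_w on the output layer alone has the same effect as one global out_gain value would:

```cpp
#include <random>
#include <tuple>
#include <vector>

// Hypothetical sketch of a gain-scaled Gaussian prior initializer.
// `gain` multiplies the base std `scale`, so a per-layer gain_w[j] applied
// to the output layer is interchangeable with a single global out_gain.
std::tuple<std::vector<float>, std::vector<float>> gaussian_param_init_sketch(
    float scale, float gain, int n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> dist(0.0f, gain * scale);
    std::vector<float> m(n), S(n);
    for (int i = 0; i < n; ++i) {
        m[i] = dist(rng);                    // sampled prior mean
        S[i] = gain * gain * scale * scale;  // prior variance = (gain * scale)^2
    }
    return {m, S};
}
```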
Bhargob, let me think about how to handle this properly in the code, because noise_gain is only required for the weights but not the biases.
Sure!
@bhargobdeka line 1055 should be as you implemented if we don't want to touch the bias gain of the std's hidden state.
I will close this PR, because you'll need to revert all the changes you made for out_gain and modify just line 1055. I will incorporate this change in the upcoming PR instead.
Bhargob and @jamesgoulet, I was thinking about this ad-hoc technique of parameter initialization. It seems to me it is not generalizable, since we need to introduce a different method for initializing the parameters to get good results.
@lhnguyen102 I see what you mean, and I think that, as you point out, we can keep the current setup where net.gain_b[j] are equal to one by default anyway, and one can adjust net.gain_w[j] if needed.
However, another thing I do not understand is why we are initializing the bias parameters like the weights. In my understanding, only the weights' variance should be initialized with a scale that is a function of 1/fan_in and/or fan_out. For the biases, I think it should be std::tie(mb_j, Sb_j) = gaussian_param_init(1, net.gain_b[j], net.num_biases[j]); in all cases.
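As a rough illustration of this proposal (using a hypothetical helper, not the exact cuTAGI gaussian_param_init signature): only the weight scale depends on fan_in, while the biases always use a fixed scale of 1, so their prior variance is simply gain_b squared:

```cpp
#include <cmath>
#include <random>
#include <tuple>
#include <vector>

// Hypothetical Gaussian initializer: returns (means, variances) with
// prior std = gain * scale. Not the actual cuTAGI gaussian_param_init.
std::tuple<std::vector<float>, std::vector<float>> init_sketch(float scale,
                                                               float gain,
                                                               int n) {
    std::mt19937 rng(0);
    std::normal_distribution<float> dist(0.0f, gain * scale);
    std::vector<float> m(n), S(n, gain * gain * scale * scale);
    for (auto &v : m) v = dist(rng);
    return {m, S};
}

// The proposal: only the weight scale is derived from fan_in; biases always
// use scale = 1, so their prior variance does not shrink with layer width.
void init_layer_sketch(int fan_in, int num_weights, int num_biases,
                       float gain_w, float gain_b) {
    float scale = 1.0f / std::sqrt(static_cast<float>(fan_in));
    auto [mw, Sw] = init_sketch(scale, gain_w, num_weights);
    auto [mb, Sb] = init_sketch(1.0f, gain_b, num_biases);  // no fan_in term
    (void)mw; (void)Sw; (void)mb; (void)Sb;
}
```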
If you think this is a possibility, I can run a comparative benchmark to see if it performs well. But from a theoretical standpoint, I think it makes more sense.
@jamesgoulet I saw that PyTorch used the same approach back then as its default implementation. The only difference is that they use a uniform distribution while we use a Gaussian (see the PyTorch code). I just transformed it a bit to match our framework, but I am not sure it is optimal. It is always worth trying.
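One plausible reading of that "transformation" (an assumption about what it might look like, not necessarily what cuTAGI actually does) is variance matching: a uniform distribution on [-b, b] has variance b^2 / 3, so the Gaussian with the same spread has std b / sqrt(3). PyTorch's default Linear initialization uses a uniform bound of 1 / sqrt(fan_in), which then maps directly to a Gaussian std:

```cpp
#include <cmath>

// Variance-matching sketch: Uniform(-b, b) has variance b^2 / 3, so a
// Gaussian with std sigma = b / sqrt(3) has the same variance.
// Illustrative only; the mapping used in cuTAGI may differ.
double matching_gaussian_std(double uniform_bound) {
    return uniform_bound / std::sqrt(3.0);
}

// PyTorch's default nn.Linear weight init draws from
// Uniform(-1/sqrt(fan_in), 1/sqrt(fan_in)); the equivalent Gaussian std
// follows by variance matching.
double gaussian_std_for_fan_in(int fan_in) {
    double bound = 1.0 / std::sqrt(static_cast<double>(fan_in));
    return matching_gaussian_std(bound);
}
```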
@lhnguyen102 👌, I will look into it.
Description
This pull request introduces an out_gain parameter in the backend code so that, when the new version (pytagi V1) gets the heteroscedastic feature, it can use both gain parameters.
Changes Made
The changes were made in these backend files wherever noise_gain was found:
And in these config files, where noise_gain was assigned:
Note for Reviewers
These changes cannot immediately be used in pyTAGI_V1, as the Python scripts also need to be carried over from the older version in the UCI_Benchmark_Baseline branch. To run the benchmarks, please follow the README file in that branch.