Closed: wpumain closed this issue 1 year ago
Today I re-read your paper "STUN: Self-Teaching Uncertainty Estimation for Place Recognition" and the STUN code, and I was once again impressed by the design of the program. Thank you for sharing such an excellent article. I have a few questions about the numerical calculations; could I ask for your help?
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L390
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L392
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L394
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L396
# ---------------------- shift sigma_sq ---------------------- #
if self.opt.loss in ['tri', 'quad']:  # empirically, shifting the distribution was found helpful for these losses
    log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
# == numerator
mu_delta = torch.norm(mu_stu - mu_tea, p=2, dim=-1, keepdim=True)  # L2 norm over the feature dim -> ([B, 1])
# == denominator
sigma_sq = torch.exp(log_sigma_sq)
# == regularizer
loss = (mu_delta / sigma_sq + log_sigma_sq).mean()  # scalar
How can I understand the mathematical logic behind these lines of code?
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L507
What is the purpose of applying the exponential function to the extracted image features? What is the mathematical basis for doing this?
Because the output of the variance branch is log(sigma^2), so taking exp() recovers sigma^2.
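A minimal illustration (hypothetical shapes; log_sigma_sq stands for the raw output of the variance branch): predicting log(sigma^2) leaves the head unconstrained, and exp() maps it back to a strictly positive variance.
import torch
# hypothetical variance-branch output: any real value is a valid prediction
log_sigma_sq = torch.randn(4, 256)
# exp() undoes the log, so the recovered variance is guaranteed positive
sigma_sq = torch.exp(log_sigma_sq)
assert (sigma_sq > 0).all()
Predicting the log of the variance is a common trick: the network never has to satisfy a positivity constraint directly, which tends to be more stable numerically.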
How can I understand the mathematical logic behind these lines of code?
log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
shifts the distribution of log_sigma_sq so that mu_delta / sigma_sq and log_sigma_sq are on a comparable scale. This operation was empirically found to make training easier.
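For intuition, a small sketch with made-up numbers (not from the repo) showing how the affine shift plus clamp puts the log-variance term on a scale comparable to mu_delta / sigma_sq:
import torch
torch.manual_seed(0)
raw = -0.05 * torch.rand(8, 1)        # hypothetical raw head outputs, clustered near zero
mu_delta = torch.rand(8, 1)           # hypothetical ||mu_stu - mu_tea|| values
shifted = torch.clamp(10 * raw + 0.2, 0, 1)   # the shift used in trainer.py
# before the shift, the log-variance term is tiny and negative, so it barely regularizes;
# after the shift it is bounded in [0, 1], the same order as mu_delta / sigma_sq
print((mu_delta / raw.exp() + raw).mean().item())
print((mu_delta / shifted.exp() + shifted).mean().item())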
Thank you for your guidance. May I ask you two more questions?
How are the values 10 and 0.2 chosen in this line of code?
log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
How should I understand that the values output when the student network's features are processed by the mean_head layer represent the logarithmic variance of the student features?
The values 10 and 0.2 were found empirically; they shift the distribution of log_sigma_sq so that mu_delta / sigma_sq and log_sigma_sq are on a comparable scale.
How should I understand that the values output when the student network's features are processed by the mean_head layer represent the logarithmic variance of the student features?
I didn’t quite catch your question. Can you break it down a bit more?
If I want to apply your paper's method to other domains, could you tell me how to find values like the 10 and 0.2 so I can transfer the approach to other projects? In different situations these values should presumably differ, right?
log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
Why do the values output when the student network's features are processed by the mean_head layer represent the logarithmic variance of the student features?
If I want to apply your paper's method to other domains, could you tell me how to find values like the 10 and 0.2 so I can transfer the approach to other projects? In different situations these values should presumably differ, right?
log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
I recommend you search for the two parameters.
Why do the values output when the student network's features are processed by the mean_head layer represent the logarithmic variance of the student features?
Because, by design, it is interpreted as the logarithmic variance.
Thank you for your help.
Because, by design, it is interpreted as the logarithmic variance.
Is there a mathematical basis for this?
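For reference, the standard justification comes from heteroscedastic regression: if the teacher feature is modeled as Gaussian-distributed around the student prediction with variance sigma^2, the negative log-likelihood is, up to constants, ||mu_tea - mu_stu||^2 / sigma^2 + log(sigma^2), which is exactly the form of the loss in trainer.py (the repo uses the unsquared L2 norm in the numerator). Whatever tensor the network feeds into the log(sigma^2) slot of that loss is therefore trained, by construction, to act as the log variance. A minimal sketch with hypothetical tensors:
import torch
mu_stu = torch.randn(4, 256)           # hypothetical student embeddings
mu_tea = torch.randn(4, 256)           # hypothetical teacher embeddings
log_sigma_sq = torch.randn(4, 1)       # raw head output, interpreted as log(sigma^2)
# Gaussian NLL per sample, up to additive constants
sq_err = (mu_tea - mu_stu).pow(2).sum(dim=-1, keepdim=True)
nll = (sq_err / log_sigma_sq.exp() + log_sigma_sq).mean()
print(nll.item())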
I recommend you search for the two parameters.
Could you please give me some guidance on how to conduct the search specifically? Thank you.
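One simple option is a grid search over the scale and offset; here is a minimal sketch (the metric stub and the grids below are placeholders, not a script shipped with the repo):
import torch
from itertools import product
def shift(log_sigma_sq, scale, offset):
    # same affine shift + clamp as in trainer.py, with tunable parameters
    return torch.clamp(scale * log_sigma_sq + offset, 0, 1)
def validation_score(scale, offset):
    # placeholder: in practice, train with this shift and return a real
    # validation metric, e.g. recall@1 or a calibration score
    torch.manual_seed(0)
    log_sigma_sq = -0.05 * torch.rand(100, 1)
    mu_delta = torch.rand(100, 1)
    s = shift(log_sigma_sq, scale, offset)
    return (mu_delta / s.exp() + s).mean().item()
best = min((validation_score(sc, off), sc, off)
           for sc, off in product([1, 5, 10, 20], [0.0, 0.1, 0.2, 0.5]))
print("best (score, scale, offset):", best)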
Thank you for your help.