Closed: wpumain closed this issue 1 year ago
Today I re-read your paper "STUN: Self-Teaching Uncertainty Estimation for Place Recognition" and the STUN code, and I was once again impressed by the design of the program. Thank you for sharing such an excellent article. I have a few questions about the numerical calculations; could I ask for your help?
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L390
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L392
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L394
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L396
# ---------------------- shift sigma_sq ---------------------- #
if self.opt.loss in ['tri', 'quad']:  # empirically, shifting the distribution was found helpful for these losses
    log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
# == numerator
mu_delta = torch.norm(mu_stu - mu_tea, p=2, dim=-1, keepdim=True)  # L2 norm over the feature dim -> ([B, 1])
# == denominator
sigma_sq = torch.exp(log_sigma_sq)
# == regularizer
loss = (mu_delta / sigma_sq + log_sigma_sq).mean()  # scalar
How can I understand the mathematical logic behind these lines of code?
https://github.com/ramdrop/stun/blob/d4fd9b0fe8f34a6f5c6a797c80467b0c72c7e88f/trainer.py#L507
What is the purpose of applying the exponential function to the extracted image features? What is the mathematical basis for doing this?
Because the output of the variance branch is log(sigma^2), so taking exp() recovers sigma^2.
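A minimal illustration (hypothetical shapes; log_sigma_sq stands for the raw output of the variance branch): predicting log(sigma^2) leaves the head unconstrained, and exp() maps it back to a strictly positive variance.
import torch
# hypothetical variance-branch output: any real value is a valid prediction
log_sigma_sq = torch.randn(4, 256)
# exp() undoes the log, so the recovered variance is guaranteed positive
sigma_sq = torch.exp(log_sigma_sq)
assert (sigma_sq > 0).all()
Predicting the log of the variance is a common trick: the network never has to satisfy a positivity constraint directly, which tends to be more stable numerically.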
How can I understand the mathematical logic behind these lines of code?
log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
shifts the distribution of log_sigma_sq so that mu_delta / sigma_sq and log_sigma_sq are on a comparable scale. This operation was empirically found to make training easier.
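For intuition, a small sketch with made-up numbers (not from the repo) showing how the affine shift plus clamp puts the log-variance term on a scale comparable to mu_delta / sigma_sq:
import torch
torch.manual_seed(0)
raw = -0.05 * torch.rand(8, 1)        # hypothetical raw head outputs, clustered near zero
mu_delta = torch.rand(8, 1)           # hypothetical ||mu_stu - mu_tea|| values
shifted = torch.clamp(10 * raw + 0.2, 0, 1)   # the shift used in trainer.py
# before the shift, the log-variance term is tiny and negative, so it barely regularizes;
# after the shift it is bounded in [0, 1], the same order as mu_delta / sigma_sq
print((mu_delta / raw.exp() + raw).mean().item())
print((mu_delta / shifted.exp() + shifted).mean().item())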
Thank you for your guidance. May I ask you two more questions?
How are the values 10 and 0.2 chosen in this line of code?
log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
How should I understand that the values output when the student network's features are processed by the mean_head layer represent the logarithmic variance of the student features?
The values 10 and 0.2 were found empirically; they shift the distribution of log_sigma_sq so that mu_delta / sigma_sq and log_sigma_sq are on a comparable scale.
How should I understand that the values output when the student network's features are processed by the mean_head layer represent the logarithmic variance of the student features?
I didn’t quite catch your question. Can you break it down a bit more?
If I want to apply your paper's method to other domains, could you tell me how to find values like the 10 and 0.2 so I can transfer the approach to other projects? In different situations these values should presumably differ, right?
log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
Why do the values output when the student network's features are processed by the mean_head layer represent the logarithmic variance of the student features?
If I want to apply your paper's method to other domains, could you tell me how to find values like the 10 and 0.2 so I can transfer the approach to other projects? In different situations these values should presumably differ, right?
log_sigma_sq = torch.clamp(10 * log_sigma_sq + 0.2, 0, 1)
I recommend you search for the two parameters.
Why do the values output when the student network's features are processed by the mean_head layer represent the logarithmic variance of the student features?
Because, by design, it is interpreted as the logarithmic variance.
Thank you for your help.
Because, by design, it is interpreted as the logarithmic variance.
Is there a mathematical basis for this?
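For reference, the standard justification comes from heteroscedastic regression: if the teacher feature is modeled as Gaussian-distributed around the student prediction with variance sigma^2, the negative log-likelihood is, up to constants, ||mu_tea - mu_stu||^2 / sigma^2 + log(sigma^2), which is exactly the form of the loss in trainer.py (the repo uses the unsquared L2 norm in the numerator). Whatever tensor the network feeds into the log(sigma^2) slot of that loss is therefore trained, by construction, to act as the log variance. A minimal sketch with hypothetical tensors:
import torch
mu_stu = torch.randn(4, 256)           # hypothetical student embeddings
mu_tea = torch.randn(4, 256)           # hypothetical teacher embeddings
log_sigma_sq = torch.randn(4, 1)       # raw head output, interpreted as log(sigma^2)
# Gaussian NLL per sample, up to additive constants
sq_err = (mu_tea - mu_stu).pow(2).sum(dim=-1, keepdim=True)
nll = (sq_err / log_sigma_sq.exp() + log_sigma_sq).mean()
print(nll.item())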
I recommend you search for the two parameters.
Could you please give me some guidance on how to conduct the search specifically? Thank you.
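One simple option is a grid search over the scale and offset; here is a minimal sketch (the metric stub and the grids below are placeholders, not a script shipped with the repo):
import torch
from itertools import product
def shift(log_sigma_sq, scale, offset):
    # same affine shift + clamp as in trainer.py, with tunable parameters
    return torch.clamp(scale * log_sigma_sq + offset, 0, 1)
def validation_score(scale, offset):
    # placeholder: in practice, train with this shift and return a real
    # validation metric, e.g. recall@1 or a calibration score
    torch.manual_seed(0)
    log_sigma_sq = -0.05 * torch.rand(100, 1)
    mu_delta = torch.rand(100, 1)
    s = shift(log_sigma_sq, scale, offset)
    return (mu_delta / s.exp() + s).mean().item()
best = min((validation_score(sc, off), sc, off)
           for sc, off in product([1, 5, 10, 20], [0.0, 0.1, 0.2, 0.5]))
print("best (score, scale, offset):", best)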
Thank you for your help.