uzh-rpg / svit

Official implementation of "SViT: Revisiting Token Pruning for Object Detection and Instance Segmentation"
Apache License 2.0

Different strategies during training and inference #8

Open King4819 opened 3 months ago

King4819 commented 3 months ago

It seems that the code uses F.gumbel_softmax during training but torch.argmin during inference.

I want to ask why argmin is used instead of argmax. I would expect a True mask entry to correspond to the larger probability, so shouldn't it be argmax?
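For concreteness, here is a toy sketch of what I mean (the channel layout and tensor shapes are my own assumption, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

# Hypothetical per-token logits: channel 0 = "drop", channel 1 = "keep".
logits = torch.randn(2, 16, 2)  # (batch, num_tokens, 2)

# Training: differentiable hard sample via straight-through Gumbel-Softmax.
keep_mask_train = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 1]

# Inference: taking the larger logit (argmax) is the deterministic
# counterpart of the Gumbel-Softmax mode.
keep_mask_argmax = logits.argmax(dim=-1)  # 1 where the "keep" logit wins

# With only two channels, argmin simply selects the opposite channel, so it
# only reproduces the argmax mask if the channel ordering is flipped:
keep_mask_argmin = 1 - logits.argmin(dim=-1)
print(torch.equal(keep_mask_argmax, keep_mask_argmin))  # True (barring ties)
```

So argmin on its own would only be correct under the opposite channel convention, which is why the choice confuses me.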

Also, I have found that your code actually uses Gumbel-Softmax at both the training and inference stages, since the "self.inference" argument is False in both stages.
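In other words, because self.inference never becomes True, the control flow seems to reduce to something like this (a simplified sketch of mine, not the repo's actual module):

```python
import torch
import torch.nn.functional as F

class TokenSelector(torch.nn.Module):
    """Hypothetical stand-in for the pruning module, just to show the flow."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = torch.nn.Linear(dim, 2)  # per-token [drop, keep] logits
        self.inference = False  # never flipped to True, so the Gumbel branch
                                # runs during evaluation as well

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        logits = self.score(tokens)
        if not self.inference:
            # Taken in *both* train and eval if self.inference stays False;
            # F.gumbel_softmax samples noise regardless of self.training.
            return F.gumbel_softmax(logits, hard=True)[..., 1]
        # Deterministic path that apparently is never reached.
        return logits.argmax(dim=-1).float()

# Even after .eval(), the stochastic Gumbel path is still the one executed:
mask = TokenSelector(dim=256).eval()(torch.randn(1, 16, 256))
```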

I also discovered that when I modify the code to use argmin at the inference stage, the performance drops drastically.

Hope to get your response, thanks!