In your paper, the mask of each layer is generated by a sigmoid, so its initial value is 0.5. In this implementation, however, the mask is doubled, which gives an initial value of 1.0. Could you explain this mismatch between the code and the paper? Alternatively, could you release results for both the 1x mask and the 2x mask? Thank you.
My guess is that the default mask value for every sampling point should be 1. However, the sigmoid function evaluates to 0.5 at zero, so its output has to be doubled to reach 1.
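To make the arithmetic behind this guess concrete, here is a minimal sketch in plain Python. It is not taken from the repository: the names `mask_value`, `scale`, and `initial_logit` are hypothetical, and it assumes the mask logit starts at zero, which is what an initial sigmoid output of 0.5 implies.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def mask_value(logit: float, scale: float = 1.0) -> float:
    """Hypothetical helper: modulation mask as a scaled sigmoid of the logit."""
    return scale * sigmoid(logit)


# Assumption: the mask logit is zero at initialization.
initial_logit = 0.0

mask_1x = mask_value(initial_logit, scale=1.0)  # variant described in the paper
mask_2x = mask_value(initial_logit, scale=2.0)  # doubled variant, as in the code

print(mask_1x, mask_2x)  # prints: 0.5 1.0
```

With the 2x scaling the mask lies in (0, 2) rather than (0, 1), so it can amplify a sampling point as well as attenuate it. Whether that changes the final accuracy is exactly what the requested 1x versus 2x comparison would show.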