mperezortiz / PBB

PAC-Bayes with Backprop - Tighter risk certificates for neural networks

Library version Requirements document #1

Closed tyliu22 closed 3 years ago

tyliu22 commented 4 years ago

Hi Maria, thanks for your excellent work, which has really inspired me. Could you please release a document listing the required library versions? My Python version is 3.7 and my Torch version is 1.7.0. When I try to run your code, I get a RuntimeError: one of the variables ("bound_l", a [torch.FloatTensor [10]]) needed for gradient computation has been modified by an in-place operation; it is at version 1 when version 0 was expected. The error suggests enabling anomaly detection with torch.autograd.set_detect_anomaly(True) to find the operation that failed to compute its gradient. Thanks in advance.
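For reference, this is roughly how I followed the hint in the error message; a minimal, self-contained sketch with a toy tensor (the torch.exp example only reproduces the same class of error, it is not code from this repository):

import torch

# Anomaly detection makes the backward error include a traceback of the
# forward operation whose result was later modified in place.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(10, requires_grad=True)
y = torch.exp(x)   # exp's backward needs its own output...
y.add_(1.0)        # ...so an in-place edit of that output breaks backward
try:
    y.sum().backward()
except RuntimeError as err:
    print(err)     # "one of the variables needed for gradient computation ..."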

mperezortiz commented 4 years ago

Hi Tianyu! Thanks for your comments and sorry for the slow reply. The code has been tested with Python 3.7 and Torch 1.3. Are you running the example as it is, or have you changed something? Let me know if it gets fixed with Torch 1.3!
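For reference, a minimal environment sketch matching the versions above; treat it as a starting point rather than an official requirements file, since the extra packages are only assumed from typical usage of the code:

# requirements sketch (not an official file from this repository)
# Python 3.7 interpreter assumed; listed for reference, not installable via pip
torch==1.3.0
numpy        # assumed dependency
torchvision  # assumed dependency (dataset loading)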

tyliu22 commented 3 years ago

Hi Maria, thank you, and sorry for the late reply (I thought no one would respond to my issue). I have tested your code with Python 3.7 and Torch 1.3. In models.py -> class Linear, the network parameter self.weight is initialised with nn.init.trunc_normal_(). However, that function is not available in Torch 1.3, so I defined a new function, truncated_normal_(), which does the same thing as nn.init.trunc_normal_(). The source code is shown below. One can simply replace

self.weight = nn.Parameter(nn.init.trunc_normal_(torch.Tensor(out_features, in_features), 0, sigma_weights, -2*sigma_weights, 2*sigma_weights), requires_grad=True)

with

self.weight = nn.Parameter(truncated_normal_(torch.Tensor(out_features, in_features), 0, sigma_weights, -2*sigma_weights, 2*sigma_weights), requires_grad=True)

Then the code runs well. I am not sure whether you have encountered the same issue before.


import math
import warnings


def truncated_normal_(tensor, mean, std, a, b):

    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf

    def norm_cdf(x):
        # Computes standard normal cumulative distribution function
        return (1. + math.erf(x / math.sqrt(2.))) / 2.

    if (mean < a - 2 * std) or (mean > b + 2 * std):
        warnings.warn("mean is more than 2 std from [a, b] in nn.init.trunc_normal_. "
                      "The distribution of values may be incorrect.",
                      stacklevel=2)

    # Values are generated by using a truncated uniform distribution and
    # then using the inverse CDF for the normal distribution.
    # Get upper and lower cdf values
    l = norm_cdf((a - mean) / std)
    u = norm_cdf((b - mean) / std)

    # Uniformly fill tensor with values from [l, u], then translate to
    # [2l-1, 2u-1].
    tensor.uniform_(2 * l - 1, 2 * u - 1)

    # Use inverse cdf transform for normal distribution to get truncated
    # standard normal
    tensor.erfinv_()

    # Transform to proper mean, std
    tensor.mul_(std * math.sqrt(2.))
    tensor.add_(mean)

    # Clamp to ensure it's in the proper range
    tensor.clamp_(min=a, max=b)

    return tensor
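With that in place, the replacement line inside models.py looks roughly like this (a sketch of the relevant lines only; in_features, out_features and sigma_weights are placeholders for whatever the Linear constructor already receives):

import torch
import torch.nn as nn

# Placeholder values; in the repository these come from the Linear constructor.
in_features, out_features = 784, 10
sigma_weights = 0.1

# Uses the truncated_normal_ function defined above instead of nn.init.trunc_normal_.
weight = nn.Parameter(
    truncated_normal_(torch.Tensor(out_features, in_features),
                      0, sigma_weights, -2 * sigma_weights, 2 * sigma_weights),
    requires_grad=True)

print(float(weight.min()) >= -2 * sigma_weights)  # True: values stay inside the cutoffs
print(float(weight.max()) <= 2 * sigma_weights)   # True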
mperezortiz commented 3 years ago

Hi Tianyu, thanks for checking this and coming back to me. You are right, the weight initialisation function was missing in the repository. I have added it now (also included below in case it's useful). Sorry for the inconvenience!

def trunc_normal_(tensor, mean=0., std=1., a=-2., b=2.):
    # type: (Tensor, float, float, float, float) -> Tensor
    r"""Fills the input Tensor with values drawn from a truncated
    normal distribution. The values are effectively drawn from the
    normal distribution :math:`\mathcal{N}(\text{mean}, \text{std}^2)`
    with values outside :math:`[a, b]` redrawn until they are within
    the bounds. The method used works best if :math:`\text{mean}` is
    near the center of the interval.
    Args:
        tensor: an n-dimensional `torch.Tensor`
        mean: the mean of the normal distribution
        std: the standard deviation of the normal distribution
        a: the minimum cutoff value
        b: the maximum cutoff value
    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.trunc_normal_(w)
    """
    return _no_grad_trunc_normal_(tensor, mean, std, a, b)


def _no_grad_trunc_normal_(tensor, mean, std, a, b):
    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf

    def norm_cdf(x):
        # Computes standard normal cumulative distribution function
        return (1. + math.erf(x / math.sqrt(2.))) / 2.

    with torch.no_grad():
        # Get upper and lower cdf values
        l = norm_cdf((a - mean) / std)
        u = norm_cdf((b - mean) / std)

        # Fill tensor with uniform values from [l, u]
        tensor.uniform_(l, u)

        # Use inverse cdf transform from normal distribution
        tensor.mul_(2)
        tensor.sub_(1)

        # Ensure that the values are strictly between -1 and 1 for erfinv
        eps = torch.finfo(tensor.dtype).eps
        tensor.clamp_(min=-(1. - eps), max=(1. - eps))
        tensor.erfinv_()

        # Transform to proper mean, std
        tensor.mul_(std * math.sqrt(2.))
        tensor.add_(mean)

        # Clamp one last time to ensure it's still in the proper range
        tensor.clamp_(min=a, max=b)
        return tensor
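A quick sanity check of the initialiser above (just a sketch; the shape, std and cutoffs are arbitrary):

import torch

w = torch.empty(1000, 1000)
trunc_normal_(w, mean=0., std=0.1, a=-0.2, b=0.2)

assert float(w.min()) >= -0.2 and float(w.max()) <= 0.2  # all values inside [a, b]
print(float(w.mean()), float(w.std()))  # roughly 0 and slightly below 0.1 due to truncation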
tyliu22 commented 3 years ago

Hi Maria. Thank you for your reply. The code now runs well with Python 3.7 and Torch 1.3. I really appreciate the work you have done and the code you have released; it's very helpful.

UltimateJupiter commented 3 years ago

> My Python version is 3.7 and the Torch version is 1.7.0. When I try to run your code, I get a RuntimeError: one of the variables ("bound_l") needed for gradient computation has been modified by an in-place operation [...]

To run the code with a newer version of PyTorch, you can change the sigma method of the distribution classes (e.g. class Gaussian in models.py) to the explicit computation:

def sigma(self):
    # Computation of standard deviation:
    # We use rho instead of sigma so that sigma is always positive during
    # the optimisation. Specifically, we use sigma = log(exp(rho)+1)
    return torch.log(1 + torch.exp(self.rho))

The PyTorch implementation of softplus changed after v1.3.
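A minimal sketch of what this looks like in context, assuming a distribution class that stores the free parameter rho (the names follow the thread; the real class in models.py also handles priors, KL computation, etc.):

import torch
import torch.nn as nn


class Gaussian(nn.Module):
    # Minimal stand-in for the distribution class discussed above.
    def __init__(self, mu, rho):
        super().__init__()
        self.mu = nn.Parameter(mu)
        self.rho = nn.Parameter(rho)

    def sigma(self):
        # Explicit softplus: sigma = log(1 + exp(rho)), so sigma stays positive
        # while rho is optimised without constraints.
        return torch.log(1 + torch.exp(self.rho))

    def sample(self):
        # Reparameterised sample mu + sigma * eps, differentiable w.r.t. mu and rho.
        eps = torch.randn_like(self.rho)
        return self.mu + self.sigma() * eps


# Quick check that gradients flow through the explicit sigma computation.
dist = Gaussian(torch.zeros(10), torch.full((10,), -3.0))
loss = dist.sample().pow(2).sum()
loss.backward()
print(dist.rho.grad is not None)  # True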