shenweichen / DeepCTR-Torch

【PyTorch】Easy-to-use, modular and extendible package of deep-learning based CTR models.
https://deepctr-torch.readthedocs.io/en/latest/index.html
Apache License 2.0

ZeroDivisionError: float division by zero #36

Closed civilman628 closed 4 years ago

civilman628 commented 4 years ago

I get this error when running the example below on Windows 10. The model is the default FiBiNET.

python .\examples\run_classification_criteo.py

Traceback (most recent call last):
  File ".\examples\run_classification_criteo.py", line 54, in <module>
    l2_reg_embedding=1e-5, device=device)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\deepctr_torch\models\fibinet.py", line 53, in __init__
    self.SE = SENETLayer(self.filed_size, reduction_ratio, seed, device)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\deepctr_torch\layers\interaction.py", line 81, in __init__
    nn.Linear(self.filed_size, self.reduction_size, bias=False),
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\linear.py", line 77, in __init__
    self.reset_parameters()
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\linear.py", line 80, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\init.py", line 316, in kaiming_uniform_
    std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero

But I just tried DeepFM, and it seems to have no float-division-by-zero issue.

wutongzhang commented 4 years ago

Please provide your initialization parameter settings, for example:

    model = FiBiNET(linear_feature_columns=linear_feature_columns, dnn_feature_columns=dnn_feature_columns,
                    reduction_ratio=3, task='binary', l2_reg_embedding=1e-5, device=device)

The parameter reduction_ratio is an integer in [1, inf): the reduction ratio used in the SENET layer. Perhaps this parameter is not set correctly in your case. We will add parameter range checks and warnings in subsequent versions.
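
For illustration, here is a minimal sketch (my own, not library code, and assuming a SENETLayer version without the max(1, ...) clamp quoted later in this thread) of how an out-of-range reduction_ratio can produce a zero-sized layer and exactly this error:

    import torch.nn as nn

    filed_size = 2        # number of feature fields (hypothetical value)
    reduction_ratio = 3   # larger than filed_size
    reduction_size = filed_size // reduction_ratio  # == 0 without the max(1, ...) clamp

    # On the PyTorch version in the traceback, a Linear whose in_features is 0
    # gets a weight with fan_in == 0, so Kaiming init computes
    # std = gain / math.sqrt(0) and raises ZeroDivisionError.
    excitation = nn.Sequential(
        nn.Linear(filed_size, reduction_size, bias=False),  # weight (0, 2): initializes fine
        nn.ReLU(),
        nn.Linear(reduction_size, filed_size, bias=False),  # weight (2, 0): fan_in == 0, crashes
        nn.ReLU()
    )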

civilman628 commented 4 years ago

I did not change anything, just the default code. But I use the CPU version, not the GPU one.

    def __init__(self, filed_size, reduction_ratio=3, seed=1024, device='cpu'):
        super(SENETLayer, self).__init__()
        self.seed = seed
        self.filed_size = filed_size
        # reduction_size is clamped to at least 1, so it cannot be the zero dimension
        self.reduction_size = max(1, filed_size // reduction_ratio)
        # squeeze-and-excitation MLP over the feature fields
        self.excitation = nn.Sequential(
            nn.Linear(self.filed_size, self.reduction_size, bias=False),
            nn.ReLU(),
            nn.Linear(self.reduction_size, self.filed_size, bias=False),
            nn.ReLU()
        )
        self.to(device)
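
If this is the exact code that ran, reduction_size is clamped to at least 1, so the only dimension that can be zero is filed_size itself. A minimal sketch (my own, not library code) reproducing the traceback on this PyTorch version:

    import math
    import torch
    import torch.nn as nn

    # Weight of a hypothetical nn.Linear(in_features=0, out_features=1):
    w = torch.empty(1, 0)
    # fan_in == w.size(1) == 0, so std = gain / math.sqrt(0) divides by zero
    # (newer PyTorch versions skip zero-element tensors with a warning instead).
    nn.init.kaiming_uniform_(w, a=math.sqrt(5))  # ZeroDivisionError
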
chenkkkk commented 4 years ago

fixed

shayben commented 4 years ago

@chenkkkk FYI, I have the same bug and no idea what you did to resolve it...

epureanudiana commented 3 years ago

Can someone please explain what the solution is?

zanshuxun commented 3 years ago

> Can someone please explain what the solution is?

Do you still get errors on the current version? Could you provide more details for reproducibility?

epureanudiana commented 3 years ago

I found my mistake in the meantime!

mavarick commented 3 years ago

I get this error too. Does anyone know the reason and how to solve it?

epureanudiana commented 3 years ago

In my case, on a closer look, I found that I actually did have a division by zero.

HiccupFL commented 1 year ago

The error comes from the fan computation in Kaiming initialization (kaiming_uniform_ in the traceback computes std the same way as kaiming_normal_ below):

    def kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
        r"""Fills the input `Tensor` with values according to the method
        described in `Delving deep into rectifiers: Surpassing human-level
        performance on ImageNet classification` - He, K. et al. (2015), using a
        normal distribution. The resulting tensor will have values sampled from
        :math:`\mathcal{N}(0, \text{std}^2)` where

        .. math::
            \text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}

        Also known as He initialization.

        Args:
            tensor: an n-dimensional `torch.Tensor`
            a: the negative slope of the rectifier used after this layer (only
                used with ``'leaky_relu'``)
            mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``
                preserves the magnitude of the variance of the weights in the
                forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the
                backwards pass.
            nonlinearity: the non-linear function (`nn.functional` name),
                recommended to use only with ``'relu'`` or ``'leaky_relu'`` (default).

        Examples:
            >>> w = torch.empty(3, 5)
            >>> nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
        """
        fan = _calculate_correct_fan(tensor, mode)
        gain = calculate_gain(nonlinearity, a)
        std = gain / math.sqrt(fan)
        with torch.no_grad():
            return tensor.normal_(0, std)

And fan = _calculate_correct_fan(tensor, mode) is:

    def _calculate_correct_fan(tensor, mode):
        mode = mode.lower()
        valid_modes = ['fan_in', 'fan_out']
        if mode not in valid_modes:
            raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes))

        fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
        return fan_in if mode == 'fan_in' else fan_out

and _calculate_fan_in_and_fan_out(tensor) is:

    def _calculate_fan_in_and_fan_out(tensor):
        dimensions = tensor.dim()
        if dimensions < 2:
            raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")

        num_input_fmaps = tensor.size(1)
        num_output_fmaps = tensor.size(0)
        receptive_field_size = 1
        if tensor.dim() > 2:
            receptive_field_size = tensor[0][0].numel()
        fan_in = num_input_fmaps * receptive_field_size
        fan_out = num_output_fmaps * receptive_field_size

        return fan_in, fan_out

So fan_in and fan_out depend only on the shape of the tensor: std = gain / math.sqrt(fan) divides by zero exactly when the selected fan dimension is 0. You can solve the problem by checking the shape of the variable you want to initialize.
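
As a sketch of that shape check (the helper name is mine, not part of PyTorch or DeepCTR-Torch), you can validate a planned Linear shape before constructing the layer, so a zero fan is caught with a clear message instead of a ZeroDivisionError deep inside init:

    def check_linear_fans(in_features, out_features):
        # For a 2-D nn.Linear weight of shape (out_features, in_features),
        # fan_in == in_features and fan_out == out_features.
        if in_features == 0 or out_features == 0:
            raise ValueError(
                "Linear({}, {}) would have fan_in={} / fan_out={}; "
                "Kaiming init would divide by zero".format(
                    in_features, out_features, in_features, out_features))

    # e.g., for the SENET layer quoted earlier in this thread:
    # check_linear_fans(filed_size, reduction_size)
    # check_linear_fans(reduction_size, filed_size)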

HiccupFL commented 1 year ago

[image attachment]