When using TransformerBasedClassification with class_weight='balanced', the computed class weights can become NaN. This does not always happen, but only when the label distribution in the current labeled set is so skewed that all labels belong to the same class.
For a multi-label problem, the encountered error is the following:
<...>
File "/path/to/site-packages/small_text/integrations/transformers/classifiers/classification.py", line 591, in _train_single_batch
loss.backward()
File "/path/to/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/path/to/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Function 'BinaryCrossEntropyWithLogitsBackward0' returned nan values in its 0th output.
Steps to reproduce
Use TransformerBasedClassification with class_weight='balanced'.
All labels of the current labeled set need to come from the same class. (Initialize active learning with such a set to quickly encounter the error.)
The error occurs during the first backpropagation.
Expected behavior
No class weight is NaN.
Environment:
small-text v1.3.0
Additional information
The problem is here and is caused by the scaling operation.
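The mechanism can be illustrated without small-text. The snippet below is a hypothetical NumPy sketch (the function name and exact formula are mine, not the library's code): inverse-frequency "balanced" weighting gives a class with zero occurrences an infinite weight, and a subsequent rescaling step that divides by the (now infinite) sum turns that weight into inf/inf = NaN.

```python
import numpy as np

def balanced_class_weights(y, num_classes):
    """Hypothetical sketch of 'balanced' weighting followed by a scaling step."""
    counts = np.bincount(y, minlength=num_classes).astype(float)
    with np.errstate(divide='ignore', invalid='ignore'):
        # Inverse-frequency weights: a class absent from y has count 0,
        # so its weight becomes inf after the division.
        weights = len(y) / (num_classes * counts)
        # Scaling step: the sum is inf, so the inf entry becomes inf/inf = nan
        # and every finite entry collapses to 0.
        weights = weights / weights.sum() * num_classes
    return weights

# All labels come from class 1 -> the missing class (index 0) gets NaN
print(balanced_class_weights(np.array([1, 1, 1, 1]), num_classes=2))
```

A NaN weight then propagates through BCEWithLogitsLoss into the gradients, which is consistent with the BinaryCrossEntropyWithLogitsBackward0 error in the traceback above. Guarding the scaling step against non-finite weights (or clamping zero counts) would avoid the NaN.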