k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

random seed #220

Open danpovey opened 2 years ago

danpovey commented 2 years ago

Guys, I notice we do fix_random_seed(42) at the very beginning of training, but shouldn't we be trying to fix the random seed at the start of each epoch as well, to try to ensure that training is repeatable if you restart from an intermediate epoch? Obviously with GPUs we won't easily be able to guarantee replicable behavior, but I think we should at least try.
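A minimal sketch of the idea, assuming a `fix_random_seed` helper like the one mentioned above (the stand-in implementation, `train_one_epoch`, and `run` are illustrative, not icefall's actual code). Re-seeding with `seed + epoch` at the top of each epoch means a run restarted from epoch k replays the same RNG stream as the original run's epoch k:

```python
import random

import torch


def fix_random_seed(seed: int) -> None:
    # Stand-in for the fix_random_seed helper mentioned above;
    # it seeds the Python and PyTorch RNGs.
    random.seed(seed)
    torch.manual_seed(seed)


def train_one_epoch() -> torch.Tensor:
    # Placeholder "training" step that consumes randomness.
    return torch.rand(4)


def run(start_epoch: int, num_epochs: int, seed: int = 42):
    outputs = {}
    for epoch in range(start_epoch, num_epochs):
        # The proposal in this thread: reset the seed at the start of
        # every epoch, offset by the epoch number so epochs differ.
        fix_random_seed(seed + epoch)
        outputs[epoch] = train_one_epoch()
    return outputs


full = run(0, 3)     # original run over epochs 0..2
resumed = run(1, 3)  # restart from the checkpoint at epoch 1

# The resumed run reproduces the original run's epochs 1 and 2.
assert torch.equal(full[1], resumed[1])
assert torch.equal(full[2], resumed[2])
```

On CPU this reproduces exactly; on GPU, as noted above, nondeterministic kernels can still make results differ even with identical seeds.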

csukuangfj commented 2 years ago

Apart from setting the seed, there are also other flags that need to be set in order to be reproducible.

>>> import torch
>>> torch.are_deterministic_algorithms_enabled()
False

See https://github.com/pytorch/pytorch/blob/master/torch/__init__.py#L496

def are_deterministic_algorithms_enabled():
    r"""Returns True if the global deterministic flag is turned on. Refer to
    :func:`torch.use_deterministic_algorithms` documentation for more details.
    """
    return _C._get_deterministic_algorithms()
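For reference, a small sketch of toggling this global flag (note that with it enabled, some CUDA ops raise an error at call time unless an environment variable such as `CUBLAS_WORKSPACE_CONFIG` is also set; on CPU the toggle itself is harmless):

```python
import torch

# The flag is off by default; torch.use_deterministic_algorithms
# flips it globally, and the getter above reflects the change.
torch.use_deterministic_algorithms(True)
assert torch.are_deterministic_algorithms_enabled()

torch.use_deterministic_algorithms(False)
assert not torch.are_deterministic_algorithms_enabled()
```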

https://github.com/espnet/espnet/blob/master/espnet/utils/deterministic_utils.py#L8

def set_deterministic_pytorch(args):
    """Ensures pytorch produces deterministic results depending on the program arguments
    :param Namespace args: The program arguments
    """
    # seed setting
    torch.manual_seed(args.seed)

    # debug mode setting
    # 0 would be fastest, but 1 seems to be reasonable
    # considering reproducibility
    # remove type check
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = (
        False  # https://github.com/pytorch/pytorch/issues/6351
    )
    if args.debugmode < 2:
        chainer.config.type_check = False
        logging.info("torch type check is disabled")
    # use deterministic computation or not
    if args.debugmode < 1:
        torch.backends.cudnn.deterministic = False
        torch.backends.cudnn.benchmark = True
        logging.info("torch cudnn deterministic is disabled")

danpovey commented 2 years ago

Mm. My feeling is we don't need to go that far, at least in normal cases, because it will probably affect speed, but I think we should at least be attempting to set the seed to a fixed value.

csukuangfj commented 2 years ago

Ok, will make a PR to reset the seed at the beginning of each epoch.