huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

BERT (sentence classification) output is non-deterministic (I have checked previous issues and set model.eval()) #13346

Closed ValMystletainn closed 3 years ago

ValMystletainn commented 3 years ago

Who can help

@LysandreJik

Information

Model I am using (Bert, XLNet ...): Bert

The task I am working on is:

I'm using a Chinese BERT to match similar tags and reduce the size of a database. I use some manually merged tags as the dataset, training BERT to take two tags as input and output the probability that they are similar. It did well after training when called from the test() function I wrote (with model.eval(), of course). But when I save the model to a .pth file and load it in another script, the output is non-deterministic.

To reproduce

The whole test script is too long, but here is a short snippet that should cover the core of this issue.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")
model = AutoModelForSequenceClassification.from_pretrained("hfl/chinese-bert-wwm-ext")
model.state_dict(torch.load('./weights/best_bert.pth', map_location='cpu'))
# model.cuda()
for i in range(100):  # dummy loop used to vary when model.eval() is called
    foo = 1

# model = model.eval()
model.eval()
with torch.no_grad():
    srcText = '春天' # 'spring'
    tgtText = '春季' # 'spring time'
    predict = model(
        **tokenizer(text=srcText, text_pair=tgtText,
                    truncation=True, return_tensors='pt', max_length=256)
    )
    # NON DETERMINISTIC
    print(torch.softmax(predict.logits, dim=1))

Steps to reproduce the behavior:

  1. run the script above
  2. change the number of iterations of the foo = 1 loop, or just do nothing
  3. run again
  4. get different output logits and probabilities

Expected behavior

Identical outputs in steps 1 and 3.

Additional information

I have read issue #4769 and some other similar issues, but I checked again and confirmed that I do call eval().

LysandreJik commented 3 years ago

You should get the following warning when you instantiate your AutoModelForSequenceClassification model:

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at hfl/chinese-bert-wwm-ext and are newly initialized: ['classifier.bias', 'classifier.weight']

This tells you that the sequence classifier head is not in the checkpoint you're loading: it will be initialized randomly every time you re-instantiate the model.
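
If the goal is to reuse a fine-tuned classifier, one option is to save the whole model, head included, and reload it from the local path. A minimal sketch; the directory name here is just an example:

# Save the fine-tuned model together with its classification head,
# so reloading does not trigger random re-initialization.
model.save_pretrained('my_finetuned_bert')
tokenizer.save_pretrained('my_finetuned_bert')

# Later, in another script:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('my_finetuned_bert')
model = AutoModelForSequenceClassification.from_pretrained('my_finetuned_bert')
# No "newly initialized" warning should appear now, and eval-mode
# outputs are repeatable across runs.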

ValMystletainn commented 3 years ago

OK, I see. But I have trained my model, and I load it with

model.state_dict(torch.load('./weights/best_bert.pth', map_location='cpu'))

So the PyTorch function torch.save(model.state_dict()) does not save the model.classifier weight and bias that I trained, right?
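
For what it's worth, torch.save(model.state_dict(), path) does include the classifier parameters; every weight in the model is part of the state dict. The more likely culprit in the snippet above is the load call itself: model.state_dict(...) only returns parameters and never assigns them, so the randomly initialized head is left untouched. A minimal round trip (path as in the snippet) would be:

import torch

# Saving: the state dict contains every parameter,
# including classifier.weight and classifier.bias.
torch.save(model.state_dict(), './weights/best_bert.pth')

# Loading: note load_state_dict(), not state_dict() --
# state_dict() does not load anything into the model.
state = torch.load('./weights/best_bert.pth', map_location='cpu')
model.load_state_dict(state)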

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

MartinoMensio commented 2 years ago

I finally managed to get deterministic results. If you are still struggling, see https://discuss.huggingface.co/t/initializing-the-weights-of-the-final-layer-of-e-g-bertfortokenclassification-with-a-manual-seed/1377/3

import os
import random
from typing import Optional

import numpy as np
import torch


def set_seed(seed: Optional[int] = None):
    """Set all seeds to make results reproducible (deterministic mode).
    When seed is None, deterministic mode is not enabled.

    :param seed: an integer of your choosing
    """
    if seed is not None:
        torch.manual_seed(seed)                    # CPU RNG
        torch.cuda.manual_seed_all(seed)           # all GPU RNGs
        torch.backends.cudnn.deterministic = True  # deterministic cuDNN kernels
        torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
        np.random.seed(seed)
        random.seed(seed)
        os.environ['PYTHONHASHSEED'] = str(seed)
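
Calling it once at the top of the script, before the model is instantiated, makes the randomly initialized head identical on every run, for example:

set_seed(42)  # fix all RNGs before from_pretrained draws the new head
model = AutoModelForSequenceClassification.from_pretrained("hfl/chinese-bert-wwm-ext")

Saving and reloading the fine-tuned weights is still the more robust fix, since seeding only makes the random head repeatable, not trained.
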
austinday commented 1 year ago

You should get the following warning when you instantiate your AutoModelForSequenceClassification model:

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at hfl/chinese-bert-wwm-ext and are newly initialized: ['classifier.bias', 'classifier.weight']

This tells you that the sequence classifier head is not in the checkpoint you're loading: it will be initialized randomly every time you re-instantiate the model.

Thank you, I was struggling with trying to figure out why this was happening. I assumed that "random initialization" just meant it was randomly initialized once when the model was instantiated, not every time it's called. Do you know why it has that behavior? Why wouldn't it just be initialized randomly once? What tells it to stop being random? (A round of training? A flag?)
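
For anyone landing here later: the head is drawn once per from_pretrained call, not on every forward pass, so within a single process eval-mode outputs are repeatable; what varies is each new instantiation, e.g. each run of a script. Nothing "stops" the randomness except loading real weights for the head, either by fine-tuning and then saving the whole model, or by loading a state dict that contains the head. A quick check, reusing the model name from the snippet above:

import torch
from transformers import AutoModelForSequenceClassification

name = "hfl/chinese-bert-wwm-ext"
m1 = AutoModelForSequenceClassification.from_pretrained(name)
m2 = AutoModelForSequenceClassification.from_pretrained(name)

# Two instantiations draw two different random classification heads ...
print(torch.equal(m1.classifier.weight, m2.classifier.weight))  # expected: False

# ... but a single instance keeps its head until other weights are loaded,
# so repeated eval-mode forward passes of m1 agree with each other.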