donfour10 commented 4 years ago

Describe the bug After the update to Ludwig version 0.3 I'm struggeling to define a similar model as is version 0.2.1. I am trying to do a multi-label classification on text with the bert encoder.

Additionally I think that the jaccard calculation of ludwig is incorrect. When I calculate it manually I get a completely smaller numbers than the numbers calculated by ludwig. (We rebuild from github after seeing https://github.com/uber/ludwig/issues/973 hoping the issue would resolve but it seemingly did not)

To Reproduce First I want to show the model definition I described in version 0.2.1: model_definition = { 'input_features': [ {'name': 'translated_text', 'type': 'text', 'encoder': 'bert', 'config_path': '/analyst/ludwig_experiments/uncased_L-12_H-768_A-12 (1)/bert_config.json', 'checkpoint_path': '/analyst/ludwig_experiments/uncased_L-12_H-768_A-12 (1)/bert_model.ckpt', 'preprocessing': {'word_tokenizer': 'bert', 'word_vocab_file': '/analyst/ludwig_experiments/uncased_L-12_H-768_A-12 (1)/vocab.txt', 'padding_symbol': '[PAD]', 'unknown_symbol': '[UNK]'} } ], 'output_features': [ {'name': 'target_category', 'type': 'set', 'threshold': 0.30} ], 'training': {'batch_size': 8, 'learning_rate': 0.00002} }

I tried several different things to reproduce this model in version 0.3. The big problems I have are, that I don't know how to use the checkpoint_path. Even the config-path throws an error when I use it as 'pretrained_model_name_or_path' (To fix that I added "model_type": "bert" in the config.json of the downloaded pretrained-model).

Model definition in version 0.3: I commented some code-lines becasue i tried it with them and without them. But nothing really changed. `model_definition = { 'input_features': [ {'name': 'translated_text', 'type': 'text', 'encoder': 'bert', 'pretrained_model_name_or_path': 'bert-large-uncased',

'trainable': True,

    'preprocessing': {
        # 'word_vocab_file': '/analyst/ludwig_experiments/uncased_L-12_H-768_A-12 (1)/vocab.txt',
        'word_tokenizer': 'bert',
        'padding_symbol': '[PAD]',
        'unknown_symbol': '[UNK]'
    }
    }
],
'output_features': [
    {'name': 'target_category',
    'type': 'set',
    'threshold': 0.30}
],
'training': {'batch_size': 8, 'learning_rate': 0.00002}

}`

Expected behavior A result which is similiar to the result wich produced in v0.2.1. F1-Score has fallen from 48 to 11. Jaccard-Similarity incresed from 32 to 52 but thats wrong. I manually calculated the jaccard and it was 8 instead of 52!

Environment (please complete the following information):

OS: Ubuntu
Version 19.10
- Python version 3.7.5
- Ludwig version 0.3 (before 0.2.1)

Additional context So I have several questions or assumptions: Maybe the model validate on the incorrect jaccard numbers? (I don't think so but it would explain why the actual result can't get similar my result with v0.2.1) How can I assign my downloaded model from https://github.com/google-research/bert without using the huggingface path and model? (Even when it should be the same. I can assign the config.json (but only when i add something) and the vocab.txt but not the .ckpt-file) Is it even possible?

I would be very thankful for some hints how I can ensure a similar configuration from v0.2.1 (tf1) to v0.3 (tf2). I am definitely a little lost here and don't want to downgrade ludwig and work with old version in the future.

Example from training_statistics.json v0.3: The jaccard value is stuck directly after the first epoch. Even though the number is not correcrt.

Old traning_statistics.json v0.2.1: Here the Jaccard values were correct and the training went through properly.

w4nderlust commented 4 years ago

@ donfour10 we are going to release a v0.3.1 that fixes some of thsoe issues. In particular we already fixed the calculation of the jaccard score, and we improved tokenization and default parameters for the text encoders. You can try it already by installing from master. Regarding the changes, now we only use the huggingface models. You don't need to specify config_path, tokenizer, vocabulary etc, it's all done automatically under the hood. Let me know if with the code on master you cn actually reproduce your previous results.