ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Performance consistency between consecutive training runs on the same data not achieved even if random_seed is constant #1291

Closed: plantroots closed this issue 2 years ago

plantroots commented 3 years ago

summary: I have trained a model on the same dataset with the same random_seed (the default, 42) and I always get a different final model (text classifier with negative, positive, and neutral categories). I might be doing something totally wrong, or the random_seed parameter isn't controlling all of the variability.

env: OS: Ubuntu 20.04.2 LTS, Python: 3.8.10, ludwig_version: '0.3.3', tf_version: '2.4.3'

text sample (Romanian, translated): A weak phone for its "premium" status. I returned it. When I say weak I mean the following downsides: it is only SuperAMOLED Plus; it has only 393 ppi on a 6.7" screen (a plain Huawei P30 has 422 ppi on a 6.1" screen). I played the same 4K film on both and the difference is huge: the P30 has vivid colors and very good clarity, while the Note 20 has somewhat washed-out colors in many frames and considerably lower brightness; the resolution is only 1080 x 2340 pixels on such a large screen! (the P30 has 1080 x 2400); the battery drains fairly quickly under average use (100% in the morning, around 6-7 PM only 15% is left); the camera did not impress me (the old P30 takes better photos); the fingerprint reader recognizes my fingerprint on the first try only 70-80% of the time (the P30 recognizes it on the first try 100% of the time). Conclusion: if you want something premium, get the Note 10 Plus (Dynamic AMOLED, 1440 x 3040 pixels, 498 ppi) or the Note 20 Ultra (Dynamic AMOLED 2X, 1440 x 3088, 496 ppi); if you don't watch films and want 256 GB of storage, then get the Note 20. But if you want good resolution, a good battery, and excellent photos, go for Huawei.

command:
/usr/local/bin/ludwig experiment --gpus -1 --output_directory /tmp/training_arena/08347ec2-6764-4427-b6f1-8f4f11dc2686_classifier_131 --dataset /tmp/training_arena/08347ec2-6764-4427-b6f1-8f4f11dc2686_classifier_131/dataset.csv --config_file /tmp/training_arena/08347ec2-6764-4427-b6f1-8f4f11dc2686_classifier_131/model_config.yaml
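For context, that CLI call is roughly equivalent to the following Python API sketch, which is how the two consecutive runs are compared (assuming the Ludwig 0.3.x LudwigModel API; the config and dataset paths are shortened placeholders):

```python
from ludwig.api import LudwigModel

def train_once(seed):
    # Fresh model built from the same config file, trained on the same CSV.
    model = LudwigModel("model_config.yaml")
    # train() returns the training statistics (the exact return shape may
    # vary between Ludwig versions), so keep the whole result for comparison.
    results = model.train(
        dataset="dataset.csv",
        output_directory="results",
        random_seed=seed,  # 42 is also the default seed
    )
    return results

# Two consecutive runs with an identical seed: ideally these would produce
# identical final models and metrics, but in practice they differ.
run_a = train_once(42)
run_b = train_once(42)
```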

config:
{ 'combiner': {'type': 'concat'},
  'input_features': [ { 'column': 'text', 'encoder': 'parallel_cnn', 'level': 'word', 'name': 'text',
                        'preprocessing': { 'lowercase': True, 'word_tokenizer': 'romanian_tokenize_punctuation'},
                        'proc_column': 'text_i4HJUa', 'tied': None, 'type': 'text'}],
  'output_features': [ { 'column': 'class', 'dependencies': [],
                         'loss': { 'class_similarities_temperature': 0, 'class_weights': 1, 'confidence_penalty': 0,
                                   'labels_smoothing': 0, 'robust_lambda': 0, 'type': 'softmax_cross_entropy', 'weight': 1},
                         'name': 'class', 'proc_column': 'class_mZFLky', 'reduce_dependencies': 'sum',
                         'reduce_input': 'sum', 'top_k': 3, 'type': 'category'}],
  'preprocessing': { 'audio': { 'audio_feature': {'type': 'raw'}, 'audio_file_length_limit_in_s': 7.5, 'in_memory': True,
                                'missing_value_strategy': 'backfill', 'norm': None, 'padding_value': 0},
                     'bag': { 'fill_value': '', 'lowercase': False, 'missing_value_strategy': 'fill_with_const',
                              'most_common': 10000, 'tokenizer': 'space'},
                     'binary': { 'fill_value': 0, 'missing_value_strategy': 'fill_with_const'},
                     'category': { 'fill_value': '', 'lowercase': False, 'missing_value_strategy': 'fill_with_const',
                                   'most_common': 10000},
                     'date': { 'datetime_format': None, 'fill_value': '', 'missing_value_strategy': 'fill_with_const'},
                     'force_split': False,
                     'h3': { 'fill_value': 576495936675512319, 'missing_value_strategy': 'fill_with_const'},
                     'image': { 'in_memory': True, 'missing_value_strategy': 'backfill', 'num_processes': 1,
                                'resize_method': 'interpolate', 'scaling': 'pixel_normalization'},
                     'numerical': { 'fill_value': 0, 'missing_value_strategy': 'fill_with_const', 'normalization': None},
                     'sequence': { 'fill_value': '', 'lowercase': False, 'missing_value_strategy': 'fill_with_const',
                                   'most_common': 20000, 'padding': 'right', 'padding_symbol': '',
                                   'sequence_length_limit': 256, 'tokenizer': 'space', 'unknown_symbol': '',
                                   'vocab_file': None},
                     'set': { 'fill_value': '', 'lowercase': False, 'missing_value_strategy': 'fill_with_const',
                              'most_common': 10000, 'tokenizer': 'space'},
                     'split_probabilities': (0.7, 0.1, 0.2),
                     'stratify': None,
                     'text': { 'char_most_common': 70, 'char_sequence_length_limit': 1024, 'char_tokenizer': 'characters',
                               'char_vocab_file': None, 'fill_value': '', 'lowercase': True,
                               'missing_value_strategy': 'fill_with_const', 'padding': 'right', 'padding_symbol': '',
                               'pretrained_model_name_or_path': None, 'unknown_symbol': '', 'word_most_common': 20000,
                               'word_sequence_length_limit': 256, 'word_tokenizer': 'romanian_tokenize_punctuation',
                               'word_vocab_file': None},
                     'timeseries': { 'fill_value': '', 'missing_value_strategy': 'fill_with_const', 'padding': 'right',
                                     'padding_value': 0, 'timeseries_length_limit': 256, 'tokenizer': 'space'},
                     'vector': { 'fill_value': '', 'missing_value_strategy': 'fill_with_const'}},
  'training': { 'batch_size': 32, 'bucketing_field': None, 'decay': False, 'decay_rate': 0.96, 'decay_steps': 10000,
                'early_stop': 5, 'epochs': 100, 'eval_batch_size': 0, 'gradient_clipping': None,
                'increase_batch_size_on_plateau': 0, 'increase_batch_size_on_plateau_max': 512,
                'increase_batch_size_on_plateau_patience': 5, 'increase_batch_size_on_plateau_rate': 2,
                'learning_rate': 0.001, 'learning_rate_warmup_epochs': 1,
                'optimizer': { 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'type': 'adam'},
                'reduce_learning_rate_on_plateau': 0, 'reduce_learning_rate_on_plateau_patience': 5,
                'reduce_learning_rate_on_plateau_rate': 0.5, 'regularization_lambda': 0, 'regularizer': 'l2',
                'staircase': False, 'validation_field': 'combined', 'validation_metric': 'loss'}}
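Most of the values in the rendered config above are Ludwig defaults. The portion actually specified in model_config.yaml is probably close to the following sketch (a hypothetical reconstruction expressed as the equivalent Python config dict, not the original file):

```python
# Hypothetical reconstruction of the non-default part of the config;
# everything omitted here is filled in by Ludwig 0.3.x defaults.
config = {
    "input_features": [
        {
            "name": "text",
            "type": "text",
            "level": "word",
            "encoder": "parallel_cnn",
            "preprocessing": {
                "lowercase": True,
                "word_tokenizer": "romanian_tokenize_punctuation",
            },
        }
    ],
    "output_features": [
        {"name": "class", "type": "category"}
    ],
    # batch_size 32 differs from the usual Ludwig default, so it was
    # presumably set explicitly; the other training values look like defaults.
    "training": {"batch_size": 32},
}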

jimthompson5802 commented 2 years ago

@plantroots Try this option when training the model

skip_save_processed_input=True

The reason for the recommendation is that the mini-batches created from the cached preprocessed input are not generated in a reproducible manner.
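As a sketch of how to apply that option (assuming the 0.3.x Python API, where LudwigModel.train() accepts skip_save_processed_input; the corresponding CLI flag should be --skip_save_processed_input, and the paths below are placeholders):

```python
from ludwig.api import LudwigModel

model = LudwigModel("model_config.yaml")  # placeholder path
# Skipping the saved preprocessed input avoids creating/reusing the cached
# copy of the dataset, which is the part reported above as producing
# non-reproducible mini-batches; every run re-preprocesses the raw CSV.
model.train(
    dataset="dataset.csv",  # placeholder path
    skip_save_processed_input=True,
    random_seed=42,
)
```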