microsoft / BioGPT

MIT License

Cannot load the model TransformerLanguageModelPrompt.from_pretrained #59

Open yuhangjiang22 opened 1 year ago

yuhangjiang22 commented 1 year ago

Hi there,

I cannot load the RE-BC5CDR-BioGPT model by running

import torch
from src.transformer_lm_prompt import TransformerLanguageModelPrompt
m = TransformerLanguageModelPrompt.from_pretrained(
        "checkpoints/RE-BC5CDR-BioGPT", 
        "checkpoint_avg.pt", 
        "data/BC5CDR/relis-bin",
        tokenizer='moses', 
        bpe='fastbpe', 
        bpe_codes="data/bpecodes",
        max_len_b=1024,
        beam=1)

and got the error

2023-02-14 15:08:06 | INFO | fairseq.file_utils | loading archive file checkpoints/RE-BC5CDR-BioGPT
2023-02-14 15:08:06 | INFO | fairseq.file_utils | loading archive file data/BC5CDR/relis-bin
2023-02-14 15:08:10 | INFO | src.language_modeling_prompt | dictionary: 42384 types
2023-02-14 15:08:13 | INFO | fairseq.models.fairseq_model | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': '../../src', 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'distributed_num_procs': 1}, 'dataset': {'_name': None, 'num_workers': 1, 
'skip_invalid_size_inputs_valid_test': True, 'max_tokens': 1024, 'batch_size': None, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': 1024, 'batch_size_valid': None, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 100, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [32], 'lr': [1e-05], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': '../../checkpoints/RE-BC5CDR-BioGPT', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 1, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 1024, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 
'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'transformer_lm_prompt_biogpt', 'activation_fn': 'gelu', 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'relu_dropout': 0.0, 'decoder_embed_dim': 1024, 'decoder_output_dim': 1024, 'decoder_input_dim': 1024, 'decoder_ffn_embed_dim': 4096, 'decoder_layers': 24, 'decoder_attention_heads': 16, 'decoder_normalize_before': True, 'no_decoder_final_norm': False, 'adaptive_softmax_cutoff': None, 'adaptive_softmax_dropout': 0.0, 'adaptive_softmax_factor': 4.0, 'no_token_positional_embeddings': False, 'share_decoder_input_output_embed': True, 'character_embeddings': False, 'character_filters': '[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]', 'character_embedding_dim': 4, 'char_embedder_highway_layers': 2, 'adaptive_input': False, 'adaptive_input_factor': 4.0, 'adaptive_input_cutoff': None, 'tie_adaptive_weights': False, 'tie_adaptive_proj': False, 'decoder_learned_pos': True, 'layernorm_embedding': False, 'no_scale_embedding': False, 'checkpoint_activations': False, 'offload_activations': False, 'decoder_layerdrop': 0.0, 
'decoder_layers_to_keep': None, 'quant_noise_pq': 0.0, 'quant_noise_pq_block_size': 8, 'quant_noise_scalar': 0.0, 'min_params_to_wrap': 100000000, 'base_layers': 0, 'base_sublayers': 1, 'base_shuffle': 0, 'add_bos_token': False, 'tokens_per_sample': 1024, 'max_target_positions': 1024, 'tpu': False, 'scale_fc': False, 'scale_attn': False, 'scale_heads': False, 'scale_resids': False}, 'task': {'_name': 'language_modeling_prompt', 'data': 'data/BC5CDR/relis-bin', 'sample_break_mode': 'none', 'tokens_per_sample': 1024, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': False, 'max_target_positions': 1024, 'shorten_method': 'none', 'shorten_data_split_list': '', 'pad_to_fixed_length': False, 'pad_to_fixed_bsz': False, 'seed': 1, 'batch_size': None, 'batch_size_valid': None, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma', 'source_lang': None, 'target_lang': None, 'max_source_positions': 640, 'manual_prompt': None, 'learned_prompt': 9, 'learned_prompt_pattern': 'learned', 'prefix': False, 'sep_token': '<seqsep>'}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': False}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9, 0.98)', 'adam_eps': 1e-08, 'weight_decay': 0.01, 'use_old_adam': False, 'tpu': False, 'lr': [1e-05]}, 'lr_scheduler': {'_name': 'inverse_sqrt', 'warmup_updates': 100, 'warmup_init_lr': 1e-07, 'lr': [1e-05]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': {'_name': 'fastbpe', 'bpe_codes': 'data/bpecodes'}, 'tokenizer': {'_name': 'moses', 'source_lang': 'en', 'target_lang': 'en', 'moses_no_dash_splits': False, 'moses_no_escape': False}}
Traceback (most recent call last):
  File "/Users/kaka/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-25-639ebf247e35>", line 1, in <module>
    m = TransformerLanguageModelPrompt.from_pretrained(
  File "/Users/kaka/lib/python3.8/site-packages/fairseq/models/fairseq_model.py", line 275, in from_pretrained
    return hub_utils.GeneratorHubInterface(x["args"], x["task"], x["models"])
  File "/Users/kaka/lib/python3.8/site-packages/fairseq/hub_utils.py", line 108, in __init__
    self.bpe = encoders.build_bpe(cfg.bpe)
  File "/Users/kaka/lib/python3.8/site-packages/fairseq/registry.py", line 61, in build_x
    return builder(cfg, *extra_args, **extra_kwargs)
  File "/Users/kaka/lib/python3.8/site-packages/fairseq/data/encoders/fastbpe.py", line 27, in __init__
    self.bpe = fastBPE.fastBPE(codes)
AttributeError: module 'fastBPE' has no attribute 'fastBPE'

Does anyone have an idea why this is happening?

Husseinfadhel commented 1 year ago

You need to install fastBPE with a g++ compiler: https://github.com/glample/fastBPE.git. fastBPE probably doesn't work on Windows.
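To check whether the `fastBPE` your interpreter imports is actually the compiled extension, a small diagnostic like the one below can help. It is a sketch, and one of its checks rests on an assumption not confirmed in this thread: a plain `fastBPE` source directory sitting on `sys.path` (for example, a fresh clone in the current working directory) gets imported as a namespace package that has no `fastBPE.fastBPE` attribute, which would produce the same `AttributeError`. `diagnose_extension` is a hypothetical helper name.

```python
import importlib
import importlib.util


def diagnose_extension(module_name: str, attr: str) -> str:
    """Explain why `module_name.attr` may be unavailable (hypothetical helper)."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name} is not installed"
    # Assumption: a bare source directory on sys.path is imported as a
    # namespace package. Such a spec has no origin file (None on Python 3.7+,
    # "namespace" on 3.6) and none of the compiled symbols.
    if spec.origin is None or spec.origin == "namespace":
        locations = list(spec.submodule_search_locations or [])
        return (f"{module_name} resolves to a namespace package at {locations}; "
                "a source directory may be shadowing the compiled extension")
    module = importlib.import_module(module_name)
    if not hasattr(module, attr):
        return f"{module_name} loaded from {spec.origin} but has no attribute {attr!r}"
    return f"ok: {module_name}.{attr} loaded from {spec.origin}"


if __name__ == "__main__":
    print(diagnose_extension("fastBPE", "fastBPE"))
```

Run it from the same directory and interpreter you use for `from_pretrained`, since the result depends on the current `sys.path`.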

tushar20121907 commented 1 year ago

@Husseinfadhel, does fastBPE work with WSL?

bmazeraski commented 1 year ago

fastBPE works fine under WSL/Ubuntu, but additional dependencies are either missing or need to be generated using other scripts. For example, data/BC5CDR/relis-bin doesn't exist in the repo and I'm not sure how to create it. It would be nice if the authors provided instructions for building everything needed to run the examples successfully.
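Because several inputs have to exist locally before `from_pretrained` can succeed, a quick pre-flight check can list what is missing. The paths below are copied from the `from_pretrained` call in the original post; `dict.txt` is an assumption based on fairseq's usual convention of reading the dictionary from `<data_dir>/dict.txt`, and `missing_paths` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

# Paths from the original from_pretrained call. "dict.txt" is an assumed
# file name (fairseq convention), not something confirmed in this thread.
REQUIRED_PATHS = [
    "checkpoints/RE-BC5CDR-BioGPT/checkpoint_avg.pt",
    "data/BC5CDR/relis-bin/dict.txt",
    "data/bpecodes",
]


def missing_paths(paths: Iterable[str], root: str = ".") -> List[str]:
    """Return the entries of `paths` that do not exist under `root`."""
    base = Path(root)
    return [p for p in paths if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(REQUIRED_PATHS)
    if missing:
        print("Missing before from_pretrained can succeed:")
        for p in missing:
            print("  ", p)
    else:
        print("All expected paths are present")
```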

bmazeraski commented 1 year ago

I'm running PyCharm on Windows 10 with a conda environment configured under WSL/Ubuntu, and this example from the BioGPT GitHub page works fine for me. I haven't been able to run the second example, though.

import torch
from fairseq.models.transformer_lm import TransformerLanguageModel
m = TransformerLanguageModel.from_pretrained(
        "checkpoints/Pre-trained-BioGPT",
        "checkpoint.pt",
        "data",
        tokenizer='moses',
        bpe='fastbpe',
        bpe_codes="data/bpecodes",
        min_len=100,
        max_len_b=1024)
m.cuda()
src_tokens = m.encode("COVID-19 is")
generate = m.generate([src_tokens], beam=5)[0]
output = m.decode(generate[0]["tokens"])
print(output)

panamantis commented 1 year ago

It looks like relis-bin is the preprocessed (binarized) training data and may need to be regenerated via preprocess.sh, which rebuilds the data files from the train/test/validation sets. I'm stuck on the FASTBPE/fast requirement (Linux only, I believe).
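The `FASTBPE/fast` requirement refers to the compiled fastBPE command-line binary, located through a `FASTBPE` environment variable. The sketch below checks for that binary and builds an `applybpe` command, assuming fastBPE's documented CLI form `fast applybpe <output> <input> <codes>`. The helper names and the example file names are hypothetical; this is not the content of preprocess.sh itself.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def locate_fast(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return ${FASTBPE}/fast if it exists and is executable, else None."""
    root = env.get("FASTBPE")
    if not root:
        return None
    binary = Path(root) / "fast"
    if binary.is_file() and os.access(binary, os.X_OK):
        return binary
    return None


def applybpe_command(fast: Path, output: str, input_: str, codes: str) -> List[str]:
    """Build the argument list for: fast applybpe <output> <input> <codes>."""
    return [str(fast), "applybpe", output, input_, codes]


if __name__ == "__main__":
    fast = locate_fast()
    if fast is None:
        print("FASTBPE is unset or $FASTBPE/fast has not been compiled")
    else:
        # Hypothetical file names, for illustration only.
        cmd = applybpe_command(fast, "train.tok.bpe", "train.tok", "data/bpecodes")
        print(" ".join(cmd))
```

If `locate_fast()` returns `None` on Linux or WSL, the binary most likely still needs to be compiled with g++ inside the fastBPE checkout.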

bmazeraski commented 1 year ago

Yes, it appears that fastBPE needs to run under Linux; my attempts to build it on Windows were not successful. My workaround was to install the Windows Subsystem for Linux (WSL) on my Windows 10 machine and create a BioGPT virtual environment under WSL/Ubuntu, with all of the BioGPT dependencies installed in that environment. I develop all of my Python code on Windows but run it under WSL/Ubuntu by setting that virtual environment as my active Python interpreter.
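With this setup it is easy to end up running the Windows interpreter by mistake. A quick sanity check, sketched below, reports which interpreter is active and whether `fastBPE.fastBPE` is importable from it. WSL detection here is a heuristic (an assumption, not an official API): WSL kernels typically include "microsoft" in their release string.

```python
import platform
import sys
from typing import Optional


def running_under_wsl(release: Optional[str] = None) -> bool:
    """Heuristic: WSL kernel release strings usually contain 'microsoft'."""
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()


def interpreter_report() -> str:
    where = "WSL" if running_under_wsl() else platform.system()
    return f"Python {platform.python_version()} at {sys.executable} ({where})"


if __name__ == "__main__":
    print(interpreter_report())
    try:
        import fastBPE
        print("fastBPE.fastBPE available:", hasattr(fastBPE, "fastBPE"))
    except ImportError:
        print("fastBPE is not importable from this interpreter")
```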