santhoshkolloju / Abstractive-Summarization-With-Transfer-Learning

Abstractive summarisation using Bert as encoder and Transformer Decoder
407 stars 98 forks

Implement NER fine-tuned BERT model #3

Closed CapitalZe closed 5 years ago

CapitalZe commented 5 years ago

I really like what you've done here.

I have a BERT model fine-tuned for NER and would like to implement it using your architecture here.

My intention is to bypass the fine-tuning section where you use the stories and use my fine-tuned model directly in its place.

Do you have any tips?

santhoshkolloju commented 5 years ago

You can change the init checkpoint to point to your fine-tuned BERT model.
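For example, a minimal sketch of that change (the directory name below is hypothetical; point it at wherever your NER checkpoint files actually live):

# Minimal sketch: point the checkpoint loader at a fine-tuned BERT model.
# 'bert_ner_finetune_dir' is a hypothetical path; replace it with the folder
# that holds your NER checkpoint files (bert_model.ckpt.*).
import os

bert_ner_finetune_dir = './bert_ner_finetuned/NER3'
init_checkpoint = os.path.join(bert_ner_finetune_dir, 'bert_model.ckpt')
model_utils.init_bert_checkpoint(init_checkpoint)  # same helper the repo uses below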

CapitalZe commented 5 years ago

Thanks, that worked out.

As for the processor:

processor = CNNDailymail()
train_dataset = get_dataset(processor, tokenizer, "./", max_seq_length_src, max_seq_length_tgt, 4, 'train', "./")
eval_dataset = get_dataset(processor, tokenizer, "./", max_seq_length_src, max_seq_length_tgt, 4, 'eval', "./")
test_dataset = get_dataset(processor, tokenizer, "./", max_seq_length_src, max_seq_length_tgt, 4, 'test', "./")

Should I change this to the NER processor?

santhoshkolloju commented 5 years ago

Yes, if you have separate preprocessing code for NER.
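For illustration only, a rough sketch of that swap (NERProcessor is a hypothetical class name; it would need to expose the same interface get_dataset expects from CNNDailymail, so check the repo's preprocessing code for the exact contract):

# Hypothetical: replace the CNN/DailyMail processor with your own NER processor.
# NERProcessor must provide whatever methods get_dataset() calls on CNNDailymail
# (e.g. reading source/target pairs) -- verify against the repo's data code.
processor = NERProcessor()  # hypothetical, stands in for CNNDailymail()
train_dataset = get_dataset(processor, tokenizer, "./",
                            max_seq_length_src, max_seq_length_tgt,
                            4, 'train', "./")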

CapitalZe commented 5 years ago

Everything has worked well since changing the preprocessor to NER.

However, I am getting stuck on the following issue in the BERT encoder graph:

ValueError                                Traceback (most recent call last)
      3 embedder = tx.modules.WordEmbedder(
      4     vocab_size=bert_config.vocab_size,
----> 5     hparams=bert_config.embed)
      6 word_embeds = embedder(src_input_ids)

Is this to do with the values in the JSON config file, or am I mistaken?

santhoshkolloju commented 5 years ago

Yes, check whether the config file is loaded properly.
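As a quick sanity check (the path below is an assumption about where the notebook keeps the fine-tuned model), you can confirm that the bert_config.json actually being loaded matches the checkpoint you are restoring:

# Sanity check: make sure bert_config.json exists in the model directory and
# that its sizes match what the encoder cell passes to WordEmbedder.
# 'bert_pretrain_dir' is assumed to be the directory holding the model files.
import json
import os

config_path = os.path.join(bert_pretrain_dir, 'bert_config.json')
assert os.path.exists(config_path), "missing " + config_path
with open(config_path) as f:
    raw_config = json.load(f)
print(raw_config['vocab_size'], raw_config['hidden_size'])
# These should agree with the vocab_size and embedding hparams passed to
# WordEmbedder in the encoder cell; a mismatch typically causes the ValueError.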

santhoshkolloju commented 5 years ago

post the code you are using here for encoder and config

CapitalZe commented 5 years ago

The encoder is below:

#encoder Bert model
print("Intializing the Bert Encoder Graph")
with tf.variable_scope('bert'):
        embedder = tx.modules.WordEmbedder(
            vocab_size=bert_config.vocab_size,
            hparams=bert_config.embed)
        word_embeds = embedder(src_input_ids)

        # Creates segment embeddings for each type of tokens.
        segment_embedder = tx.modules.WordEmbedder(
            vocab_size=bert_config.type_vocab_size,
            hparams=bert_config.segment_embed)
        segment_embeds = segment_embedder(src_segment_ids)

        input_embeds = word_embeds + segment_embeds

        # The BERT model (a TransformerEncoder)
        encoder = tx.modules.TransformerEncoder(hparams=bert_config.encoder)
        encoder_output = encoder(input_embeds, src_input_length)

        # Builds layers for downstream classification, which is also initialized
        # with BERT pre-trained checkpoint.
        with tf.variable_scope("pooler"):
            # Uses the projection of the 1st-step hidden vector of BERT output
            # as the representation of the sentence
            bert_sent_hidden = tf.squeeze(encoder_output[:, 0:1, :], axis=1)
            bert_sent_output = tf.layers.dense(
                bert_sent_hidden, config_downstream.hidden_dim,
                activation=tf.tanh)
            output = tf.layers.dropout(
                bert_sent_output, rate=0.1, training=tx.global_mode_train())

print("loading the bert pretrained weights")
# Loads pretrained BERT model parameters
init_checkpoint = os.path.join(bert_finetuned_models+model, 'bert_model.ckpt')
#init_checkpoint = "gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12/bert_model.ckpt"
model_utils.init_bert_checkpoint(init_checkpoint)

All I have done here is create a new directory called bert_pretrained_models for my existing BERT-NER model, host both the fine-tuned and pretrained files there, and run the script pointing at that directory instead.

You will see this addition at the end of the config cell:

#config

dcoder_config = {
    'dim': 768,
    'num_blocks': 6,
    'multihead_attention': {
        'num_heads': 8,
        'output_dim': 768
        # See documentation for more optional hyperparameters
    },
    'position_embedder_hparams': {
        'dim': 768
    },
    'initializer': {
        'type': 'variance_scaling_initializer',
        'kwargs': {
            'scale': 1.0,
            'mode': 'fan_avg',
            'distribution': 'uniform',
        },
    },
    'poswise_feedforward': tx.modules.default_transformer_poswise_net_hparams(
        output_dim=768)
}

loss_label_confidence = 0.9

random_seed = 1234
beam_width = 5
alpha = 0.6
hidden_dim = 768

opt = {
    'optimizer': {
        'type': 'AdamOptimizer',
        'kwargs': {
            'beta1': 0.9,
            'beta2': 0.997,
            'epsilon': 1e-9
        }
    }
}

lr = {
    'learning_rate_schedule': 'constant.linear_warmup.rsqrt_decay.rsqrt_depth',
    'lr_constant': 2 * (hidden_dim ** -0.5),
    'static_lr': 1e-3,
    'warmup_steps': 2000,
}

bos_token_id = 101
eos_token_id = 102

model_dir= "./models"
run_mode= "train_and_evaluate"
batch_size = 32
test_batch_size = 32

max_train_epoch = 20
display_steps = 100
eval_steps = 100000

max_decoding_length = 400

max_seq_length_src = 512
max_seq_length_tgt = 400

bert_pretrain_dir = '/content/bert_pretrained_models/NER3/'
bert_finetune_dir = 'bert_finetuned_models/NER3'

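For reference, a minimal sketch of how directory settings like these could feed the checkpoint path built in the encoder cell (both names below are assumptions about this notebook, not part of the repo's fixed API; they must be defined in the running session before the encoder cell executes):

# Assumed wiring between the config cell and the encoder cell's init_checkpoint.
bert_finetuned_models = '/content/bert_finetuned_models/'  # hypothetical base dir
model = 'NER3'                                             # hypothetical sub-folder
init_checkpoint = os.path.join(bert_finetuned_models + model, 'bert_model.ckpt')
# -> /content/bert_finetuned_models/NER3/bert_model.ckpt
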
CapitalZe commented 5 years ago

Update:

By modifying how I download my BERT-NER fine-tuned model and the pretrained model (I used BERT-Large, Cased), I was able to get past the troublesome cell. However, the encoder cell now throws this error:

Initializing the Bert Encoder Graph
WARNING:tensorflow:From texar_repo/texar/modules/encoders/transformer_encoders.py:340: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From <ipython-input-16-08b32230cd7b>:28: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
loading the bert pretrained weights
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-16-08b32230cd7b> in <module>()
     33 print("loading the bert pretrained weights")
     34 # Loads pretrained BERT model parameters
---> 35 init_checkpoint = os.path.join(bert_pretrained_models+model, 'bert_model.ckpt')
     36 #init_checkpoint = "gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12/bert_model.ckpt"
     37 model_utils.init_bert_checkpoint(init_checkpoint)

NameError: name 'bert_pretrained_models' is not defined

I do not understand why 'bert_pretrained_models' is reported as not defined, when it was defined in previous cells and files were successfully saved to and loaded from that directory.
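One quick way to check (assuming a Colab/Jupyter session) is to test whether the name still exists in the running kernel; restarting the runtime or re-running cells out of order clears variables that earlier cells defined:

# Quick check from any later cell: is the name actually defined in this session?
# (The variable name mirrors the one in the traceback above.)
print('bert_pretrained_models' in globals())  # False means the defining cell
                                              # has not run in the current kernel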