Open astariul opened 5 years ago
I'm running your code on the CNN/Dailymail dataset.
However, training never ends, displaying:
Batch #X
with X growing larger and larger. I waited a long time, then killed the process.
But now, when I run the inference code, the produced summary is very bad. Example:
the two - year - year - year - old cate - old cat was found in the animal .
What did I do wrong? Has anyone in the same situation succeeded in fixing the code? (@Vibha111094)
I ran the inference code, but I don't know how to produce the summary.
Should I post the original story through Postman, so it will give back a summary?
Set your warmup steps to 10 percent of the total number of iterations required. In my case 15,000 helped, but please check.
Where exactly can I set that?
In config.py you would have:

lr = {
    'learning_rate_schedule': 'constant.linear_warmup.rsqrt_decay.rsqrt_depth',
    'lr_constant': 2 * (hidden_dim ** -0.5),
    'static_lr': 1e-3,
    'warmup_steps': 10000,
}

You could increase warmup_steps to around 15,000-20,000.
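For context, this style of schedule is usually computed roughly as below. This is a minimal sketch assuming the standard linear-warmup / inverse-sqrt-decay formula; the function name and exact form are illustrative, not necessarily this repo's implementation:

import math

def effective_lr(step, lr_constant=2 * (768 ** -0.5), warmup_steps=10000):
    # Linear ramp from 0 to the peak over the first `warmup_steps` steps,
    # then decay proportional to 1/sqrt(step) afterwards.
    warmup = min(1.0, step / warmup_steps)
    decay = 1.0 / math.sqrt(max(step, warmup_steps))
    return lr_constant * warmup * decay

Under this formula the learning rate peaks at step == warmup_steps and decays from there, which is why warmup_steps is tuned relative to the total number of training iterations (the 10 percent rule of thumb above).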
When I set low numbers (steps = 10, warmup steps = 10, max eval = 10), the iteration count still goes past 150 for epoch 0. Could you help clarify how those numbers are interlinked?
Set your warmup steps to 10 percent of the total number of iterations required. In my case 15,000 helped, but please check. Also, please make sure you are sending the delimiter [SEP] as an indicator to stop decoding, i.e. while creating the TF record:

labels_tgt = input_ids_tgt[1:]
input_ids_tgt = input_ids_tgt[:-1]
input_mask_src = [1] * len(input_ids_src)
input_mask_tgt = [1] * len(input_ids_tgt)
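For concreteness, a minimal sketch of how these shifted ids might be written into a TF record. The feature names here (input_ids_src, labels_tgt, etc.) are taken from the snippet above, but the exact feature spec is an assumption; adapt it to the repo's preprocessing code:

import tensorflow as tf

def _int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def make_example(input_ids_src, input_ids_tgt):
    # Shift the target sequence: the decoder input drops the last token and
    # the labels drop the first, so the final label is [SEP] and the model
    # learns to emit it as the stop signal.
    labels_tgt = input_ids_tgt[1:]
    input_ids_tgt = input_ids_tgt[:-1]
    features = {
        'input_ids_src': _int64_feature(input_ids_src),
        'input_mask_src': _int64_feature([1] * len(input_ids_src)),
        'input_ids_tgt': _int64_feature(input_ids_tgt),
        'input_mask_tgt': _int64_feature([1] * len(input_ids_tgt)),
        'labels_tgt': _int64_feature(labels_tgt),
    }
    return tf.train.Example(features=tf.train.Features(feature=features))

Without the [SEP] label at the end of the target, decoding has no learned stop condition, which matches the runaway "Batch #X" behavior described above.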
Hello, I adopted the default settings and obtained ROUGE-1/2/L: 39.29/17.30/27.10. The ROUGE-L result in particular is terrible. I trained on 1 GPU for 3 days, 170,000 steps in total with batch size = 32. Could you provide your results on the CNN/Dailymail dataset, or do you know what is wrong? Many thanks! @Vibha111094
I am following the default settings, but after the second epoch training is taking too long. Does anyone else face the same problem?