microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

How to reproduce Long Text Generation examples from paper? #51

Closed GenTxt closed 4 years ago

GenTxt commented 4 years ago

I would appreciate it if another user could share the sample commands that reproduce the interesting results included in the arXiv paper. I've had no success so far with the original MS repo, and get only limited, repetitious output from the current huggingface unilm fork with the same prompt.

Cheers

Seed line from 1984 used in original paper:

Winston sat back. A sense of complete helplessness had descended upon him.

Published output:

He was now helpless and unmoving. What would that mean? To become a monk under her mother’s rule. What would it mean for him to be put at the center of the most dangerous of all of demons, the demon known as Azrael? He would never know exactly what that was, or even how it would end, unless he could sense the strength of Azrael’s mind and soul. It wasn’t only the existence of his mother, it was all he knew, and every moment. He was trapped within her embrace, a trapped, torn, battered, crippling presence he only knew it would mean was eternity. And in the instant that he was lost his life changed forever, when Azraelis’ spirit touched his mind, he was engulfed in this dark and mystical, so enchanting and beautiful force.

donglixp commented 4 years ago

Hi @GenTxt ,

I ran a sampling using the same seed prompt. The results are as follows:

Winston sat back . A sense of complete helplessness had descended upon him . A deep breath of fear filled him . He was too late . In fact the last thing he had thought was the thought that he was doomed to this world , and no matter how soon , he would get to this . To escape this life , he was the only way to survive in this life . He wanted to be home , to find the others and return safely to earth , so that he could find the family of his great . Winston felt nothing wrong with his situation . He would be in no position to tell the other passengers at the boarding platforms that they would be rescued , and he could make certain that he never got to the ship . He would never take it to the safety of the vessel and not return . Once he had started his flight , he would be allowed to board the passenger compartment . Winston knew that if the ship got stuck in a mud lake , he would likely die there , which his life would be spared . He knew that he would not be able to escape the wreckage of ship without even getting through it or getting in the air currents of the mud . All of his thoughts had gone straight back to the day he had first dreamed up a life in this earth and so he would not ever regret this decision again . It was all just an accident and he didn ’ t need anything . So he was in a nominative position for the entire voyage on this ship but no one dared try anything in such a dire situation . He was just the one being held in his own , no matter how hard he tried . He had to be the one of the few people left that ever would see him go through this journey and get along with everyone . There would never be more sorrow on his shoulders to be brought back to life . Winston was still very angry that he had been sent over into the ocean for just this little time . He now felt that he should do nothing less to the fate of his children and siblings . Winston himself was not the kind of person to let anyone see how he felt . 
He was a young man , and he was a young orphan . Winston knew that he wasn ’ t the kind of person to put it off . It was just an accident and he was the type of person to take chances and make a mistake . He was too young for any of this to be his fault . There was no reason to blame his mistakes and he wanted to be on the ship , but he just couldn ’ t wait to start to be

We didn't implement the sampling function in the current huggingface unilm fork. Note that the sampled text will differ between checkpoints. We use the top-k (k=40) sampling algorithm to generate outputs. The flag --new_segment_ids is necessary to initialize the correct segment embedding (index=2) for the L2R LM. We also need to sample the distribution using the masked-LM way.
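For readers unfamiliar with the terms above, here is a minimal sketch of top-k sampling combined with one reading of "the masked-LM way": append a [MASK] token after the current text, take the model's distribution at that position, sample from the k most likely tokens, and repeat. This is not the repository's implementation. `top_k_sample`, `generate`, `next_token_logits`, and `mask_id` are hypothetical names introduced only for illustration, and the model call is left as a callable you would have to supply.

```python
import math
import random


def top_k_sample(logits, k=40, temperature=1.0, rng=random):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Softmax over the surviving logits, shifted by the max for stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


def generate(next_token_logits, prompt_ids, mask_id, steps, k=40, rng=random):
    """Left-to-right generation that predicts a trailing [MASK] at each step.

    `next_token_logits` is a hypothetical callable (an assumption, not a
    UniLM API): it receives the token ids ending in `mask_id` and returns
    the logits at the mask position.
    """
    ids = list(prompt_ids)
    for _ in range(steps):
        logits = next_token_logits(ids + [mask_id])
        ids.append(top_k_sample(logits, k=k, rng=rng))
    return ids
```

With k=1 this reduces to greedy decoding. Larger k gives more varied text, which is one reason two runs on the same prompt produce different outputs.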

GenTxt commented 4 years ago

Hello again.

I'm still having trouble generating output comparable to your example.

I'm on Ubuntu 18.04, using the default 'unilmv1-large-cased.bin' from the GitHub link with the following command:

python3 decode_seq2seq.py --bert_model bert-large-cased --model_recover_path storage/unilmv1-large-cased.bin --new_segment_ids --mode l2r --input_file test.txt --max_seq_length 256 --max_tgt_length 128 --batch_size 16 --forbid_duplicate_ngrams --temperature 1.0 --length_penalty 0 --min_len 500 --top_k 40 --output_file test_unilm_mode-l2r-topk-min_length500.txt

test.txt is the single line from the paper:

Winston sat back. A sense of complete helplessness had descended upon him.

Result: “ We ’ re going to be in the world ! ” He began to recount what he had learned . “ The world is a wreck , but here you stand in it , there are so many people who will not know it . And you are a damn coward , not to mention you are a little frightened , you are still scavenged . You are just a little unattracworthy that I ’ m going to take care of you . ” Worried that you were misunderstand , he shook his head . “ I mean , I have never taught you to learn the hard way , but if I

I've followed your instructions but can't find an option to "... sample the distribution using the masked-LM way."

Also, I cannot find an option to increase the output length to match your sample. --min_len 500 has no effect.

I assume I'm missing options in the above terminal commands?

I would appreciate it if you could provide the missing options.

Cheers