ictnlp / DialoFlow

Code for ACL 2021 main conference paper "Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances".

Can't load the model! #11

Open Evraa opened 2 years ago

Evraa commented 2 years ago

Greetings,

Actually, I'm surprised that such an error came up. My problem lies with this line:

model = torch.load("models/DialoFlow_large/model.bin")

model.bin is placed appropriately, and the EC2 instance runs CUDA 11.2 and PyTorch 1.9.

Where would the problem come from?

Thanks in advance

Evraa commented 2 years ago

Error:

```
Traceback (most recent call last):
  File "generate.py", line 283, in <module>
    model = torch.load("models/DialoFlow_large/model.bin")
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/serialization.py", line 787, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named 'transformers.modeling_gpt2'
```

lizekang commented 2 years ago

You can try lowering the version of transformers, e.g. transformers==3.1.0.
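For context on why the version matters (a hedged note, not from the repo): the checkpoint appears to be a fully pickled model object, and torch.load() re-imports the module paths recorded at save time, here transformers.modeling_gpt2, which only exists in older transformers releases. A minimal sketch for checking the environment before loading:

```python
# Minimal sketch (not from the repo): confirm the installed versions before
# loading. The pickle references transformers.modeling_gpt2, so a 3.x
# transformers release is needed; map_location="cpu" just avoids needing a GPU.
import torch
import transformers

print("transformers:", transformers.__version__)  # expect something like 3.1.0
print("torch:", torch.__version__)

model = torch.load("models/DialoFlow_large/model.bin", map_location="cpu")
```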

Evraa commented 2 years ago

Thank you for your fast reply.

It worked, but I got stuck on another problem, also version-related, I guess.

Current versions: torch = 1.7.0, transformers = 3.1.0, pickle = 4.0, regex = 2.5.103

Error:

```
Traceback (most recent call last):
  File "generate.py", line 283, in <module>
    model = torch.load("model.bin")
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/serialization.py", line 774, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named '_regex'
```

Evraa commented 2 years ago

Please don't refer me to issue #2. It didn't work for me, and it's in Chinese, a language I'm not familiar with :'D.

thank you

lizekang commented 2 years ago

You can try regex==2018.1.10. It should work.

Evraa commented 2 years ago

Thank you very much.

Could you please tell me how to structure test.refs.txt? Is it just sentences separated by '\n'? And what is the minimum/maximum number of utterances allowed?

thank you in advance

lizekang commented 2 years ago

The structure is like the following:

utterance1 EOS utterance2 EOS utterance3 \t reference1 \t reference2 \t reference3 \t\n

There is no specific constraint on the minimum/maximum number of utterances.
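A minimal sketch of writing one such line, assuming the literal separators described above (" EOS " between context utterances, a tab before and between the references, and a trailing tab plus newline); the utterance and reference strings here are placeholders:

```python
# Sketch assembled from the format described above, not taken from the repo.
context = ["utterance1", "utterance2", "utterance3"]
references = ["reference1", "reference2", "reference3"]

# Produces: "utterance1 EOS utterance2 EOS utterance3\treference1\treference2\treference3\t\n"
line = " EOS ".join(context) + "\t" + "\t".join(references) + "\t\n"

with open("test.refs.txt", "w", encoding="utf-8") as f:
    f.write(line)
```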

Evraa commented 2 years ago

I'm sorry, what are "references"? Could you provide an example?

Evraa commented 2 years ago

Also, this error came up!

```
Traceback (most recent call last):
  File "generate.py", line 298, in <module>
    hypstr = beam_search(history, tokenizer, model, args)
  File "generate.py", line 215, in beam_search
    delta = work_delta(model, conv_seq, sentence_idx, token_type_seq)
  File "generate.py", line 95, in work_delta
    conv_hidden_state = model.speak_model(conv_seq, token_type_ids=token_type_seq)[0]
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/modeling_gpt2.py", line 527, in forward
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/configuration_utils.py", line 219, in use_return_dict
    return self.return_dict and not self.torchscript
AttributeError: 'GPT2Config' object has no attribute 'return_dict'
```

lizekang commented 2 years ago

In the dialogue dataset, there are many possible responses. The responses collected in advance are the references. For example:

utterance1: What's your hobby?
reference1: I like basketball.
reference2: Reading. What about you?
reference3: Tell me yours first.

lizekang commented 2 years ago

I've never met this error. Maybe you can try downgrading transformers to 3.0.0 or 2.7.0. I'm not sure.
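If downgrading is not an option, a purely hypothetical workaround (untested, not from the repo) would be to patch the pickled configs after loading, since the old GPT2Config object simply lacks the return_dict attribute that newer transformers code reads:

```python
# Hypothetical workaround sketch, not from the repo and untested: the pickled
# GPT2Config predates the return_dict attribute, so newer code paths fail when
# they read config.use_return_dict. Patching the loaded configs may avoid the
# AttributeError without downgrading transformers.
import torch
from torch import nn

model = torch.load("models/DialoFlow_large/model.bin", map_location="cpu")
if isinstance(model, nn.Module):
    for module in model.modules():
        config = getattr(module, "config", None)
        if config is not None and not hasattr(config, "return_dict"):
            config.return_dict = False  # use_return_dict then evaluates to False
```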

Evraa commented 2 years ago

It worked with transformers 3.0.0, but not with 2.7.0 or 3.1.0.

One last question: what exactly does the generate.py script produce? Given a dialogue and some reference responses, what exactly is the output hypstr[0]?

thanks in advance

Evraa commented 2 years ago

Follow-up question: what is the purpose of the variable "responses" in generate.py, line 293?

dongqian0206 commented 2 years ago

Hi Zekang,

Thanks for providing the source code.

I just followed this post. Suppose I don't want to use an older version of transformers due to Python environment issues; pre-training DialoFlow based on GPT-2 on DailyDialog should still be possible, although the results would be worse than the reported ones, right? That is, the effectiveness of DialoFlow is independent of whether it was pre-trained on the Reddit dataset.

Thanks in advance.

Best,

Dong

dongqian0206 commented 2 years ago

> Follow-up question: what is the purpose of the variable "responses" in generate.py, line 293?

Hi Evraa,

Based on my understanding, the evaluation task is to generate a response given a context. For example,

context = [utterance1], response = [[utterance2], [utterance3], [utterance4], ....].

I have another question regarding the generate.py script, if you are willing to answer it. Given speaker1's utterance, how do we know which utterances correspond to speaker2 and which to speaker1's next response? As it is multi-turn dialogue generation, I suppose the output should contain more than one utterance.

Please see the following example, where a context, ground-truth responses, and generated responses are shown. For the generated responses, the correspondence is not clear to me.

["We've managed to reduce our energy consumption in our factory by about 15 per cent in the last two years ."]
["That's excellent . How have you managed that ?", "Mainly because we've invested in a heat recovery system .", 'What does that mean exactly ?', 'Well , we use the exhaust gases from our printing presses to provide energy to heat our dryers .', 'What other sources of energy do you use ?', "We don't use any fossil fuels . Most of our power comes from hydro-electric plants . We're hoping to use even more energy from alternative sources in the future - perhaps even wind power ."]
["Does that mean that we can't afford to pay for more? We can't afford to pay for more than we can afford. Why not? We can't afford to pay for more. Why can't we? We can't afford to pay for more."]

As the input indices are [speaker1, text1, eos, empty, speaker2, text2, eos, empty], one potential way is to comment out the following code in the generate.py script. But I am not sure if it is correct or not.

# if o in [eos, empty, speaker1, speaker2]:
#     continue

Looking forward to hearing from you @Evraa, as well as @lizekang.

Best,

Dong

lizekang commented 2 years ago

> Hi Zekang,
>
> Thanks for providing the source code.
>
> I just followed this post. Suppose I don't want to use an older version of transformers due to Python environment issues; pre-training DialoFlow based on GPT-2 on DailyDialog should still be possible, although the results would be worse than the reported ones, right? That is, the effectiveness of DialoFlow is independent of whether it was pre-trained on the Reddit dataset.
>
> Thanks in advance.
>
> Best,
>
> Dong

Hi, sorry for the late response. The effectiveness of DialoFlow is indeed independent of the pre-training data; it should also be effective without pre-training on the Reddit dataset.

dongqian0206 commented 2 years ago

> > I just followed this post. Suppose I don't want to use an older version of transformers due to Python environment issues; pre-training DialoFlow based on GPT-2 on DailyDialog should still be possible, although the results would be worse than the reported ones, right? That is, the effectiveness of DialoFlow is independent of whether it was pre-trained on the Reddit dataset.
>
> Hi, sorry for the late response. The effectiveness of DialoFlow is indeed independent of the pre-training data; it should also be effective without pre-training on the Reddit dataset.

OK. Thanks a lot!

lizekang commented 2 years ago

> > Follow-up question: what is the purpose of the variable "responses" in generate.py, line 293?
>
> Hi Evraa,
>
> Based on my understanding, the evaluation task is to generate a response given a context. For example,
>
> context = [utterance1], response = [[utterance2], [utterance3], [utterance4], ....].
>
> I have another question regarding the generate.py script, if you are willing to answer it. Given speaker1's utterance, how do we know which utterances correspond to speaker2 and which to speaker1's next response? As it is multi-turn dialogue generation, I suppose the output should contain more than one utterance.
>
> Please see the example above, where a context, ground-truth responses, and generated responses are shown. For the generated responses, the correspondence is not clear to me.
>
> As the input indices are [speaker1, text1, eos, empty, speaker2, text2, eos, empty], one potential way is to comment out the following code in the generate.py script. But I am not sure if it is correct or not.
>
> # if o in [eos, empty, speaker1, speaker2]:
> #     continue
>
> Looking forward to hearing from you @Evraa, as well as @lizekang.

For the correspondence to different speakers, there are two ways: 1) we insert special tokens like speaker1 and speaker2; 2) we use different segment embeddings for different speakers (see the function build_input_from_input in generate.py).

For the commented-out code: during generation, we don't want the model to generate special tokens inside the response, so those tokens are skipped.

# if o in [eos, empty, speaker1, speaker2]:
#     continue
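
A rough illustration of those two mechanisms (a sketch with made-up token names, not the repo's actual build_input_from_input): speaker tokens are interleaved with the utterances, a parallel segment/token_type sequence carries 0 for speaker1 and 1 for speaker2, and the same special-token set is what the check above filters out of the decoded response.

```python
# Illustrative sketch only; token names and whitespace tokenization are
# placeholders, not the repo's exact implementation.
SPEAKER1, SPEAKER2, EOS, EMPTY = "[speaker1]", "[speaker2]", "[EOS]", "[empty]"
SPECIAL_TOKENS = {SPEAKER1, SPEAKER2, EOS, EMPTY}

def build_input(utterances):
    """Interleave speaker tokens with utterances and build matching segment ids."""
    tokens, segments = [], []
    for i, utterance in enumerate(utterances):
        speaker = SPEAKER1 if i % 2 == 0 else SPEAKER2
        turn = [speaker] + utterance.split() + [EOS, EMPTY]
        tokens += turn
        segments += [i % 2] * len(turn)  # 0 for speaker1, 1 for speaker2
    return tokens, segments

def strip_special(generated_tokens):
    """Drop special tokens from a decoded response, like the check quoted above."""
    return [t for t in generated_tokens if t not in SPECIAL_TOKENS]
```
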
dongqian0206 commented 2 years ago

> > Given speaker1's utterance, how do we know which utterances correspond to speaker2 and which to speaker1's next response? As it is multi-turn dialogue generation, I suppose the output should contain more than one utterance.
>
> For the correspondence to different speakers, there are two ways: 1) we insert special tokens like speaker1 and speaker2; 2) we use different segment embeddings for different speakers (see the function build_input_from_input in generate.py).
>
> For the commented-out code: during generation, we don't want the model to generate special tokens inside the response, so those tokens are skipped.

Thank you for your prompt reply. Let me see if I can explain my confusion more clearly.

> For the correspondence to different speakers, there are two ways: 1) we insert special tokens like speaker1 and speaker2; 2) we use different segment embeddings for different speakers (see the function build_input_from_input in generate.py).

I agree on this part. During training, the input index is defined as [speaker1, text1, eos, empty, speaker2, text2, eos, empty], which shows the correspondence.

But it might be different during generation. For example, given only the first utterance from speaker1, the input index [speaker1, text1, eos, empty] is provided, as well as the segment index.

Please see the segment result from your code, where 0 corresponds to speaker1 and 1 corresponds to speaker2. Longer outputs exhibit a similar pattern (just more 1's).

```
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
```

My confusion: as this task is multi-turn dialogue generation, the output is essentially a one-round dialogue. The model doesn't know how to switch the speaker identity any further; it only knows the switch 0 --> 1, because that information is provided by the user:

```python
if len(conv) % 2 == 1:
    current_output = [speaker2]
else:
    current_output = [speaker1]
```

During training, ground-truth sequences are provided, while during generation, sequences are generated in an autoregressive manner.
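
For illustration only (a hedged sketch, not from the repo): if one wanted to roll the generation out over several turns, the same parity rule could be reapplied after appending each generated utterance, so the speaker token flips on every turn; generate_response() below is just a placeholder for the repo's actual beam_search().

```python
# Hypothetical multi-turn rollout built on the parity rule quoted above.
def rollout(conv, generate_response, speaker1, speaker2, num_turns=3):
    for _ in range(num_turns):
        # Same rule as in generate.py: an odd-length history means speaker2 is next.
        next_speaker = speaker2 if len(conv) % 2 == 1 else speaker1
        response = generate_response(conv, next_speaker)
        conv.append(response)  # the new utterance becomes part of the history
    return conv
```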