salesforce / simpletod

Official repository for "SimpleTOD: A Simple Language Model for Task-Oriented Dialogue"
https://arxiv.org/abs/2005.00796
BSD 3-Clause "New" or "Revised" License

Issues on DST #5

Status: Open (smartyfh opened this issue 3 years ago)

smartyfh commented 3 years ago

Hi,

When evaluating joint goal accuracy (JGA) for dialogue state tracking (DST), did you remove both the none slots and the dontcare slots?
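
To make the question concrete, here is a minimal sketch of JGA with a configurable set of ignored slot values. It assumes belief states have already been parsed into "domain-slot" to value dicts; filter_belief and joint_goal_accuracy are hypothetical helpers for illustration, not SimpleTOD's evaluation script.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical representation: "domain-slot" -> value, e.g. {"hotel-area": "north"}.
BeliefState = Dict[str, str]


def filter_belief(belief: BeliefState, ignore: Set[str]) -> BeliefState:
    """Drop every slot whose value is in `ignore` (e.g. {"none", "dontcare"})."""
    return {slot: value for slot, value in belief.items() if value not in ignore}


def joint_goal_accuracy(
    predictions: List[BeliefState],
    references: List[BeliefState],
    ignore: Iterable[str] = ("none",),
) -> float:
    """Fraction of turns whose filtered prediction exactly matches the filtered reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    ignore_set = set(ignore)
    correct = sum(
        filter_belief(pred, ignore_set) == filter_belief(ref, ignore_set)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Toy data: the model misses a dontcare slot in the first turn.
refs = [
    {"hotel-area": "north", "hotel-parking": "dontcare"},
    {"taxi-leaveat": "none", "taxi-destination": "cambridge"},
]
preds = [
    {"hotel-area": "north"},
    {"taxi-destination": "cambridge"},
]
print(joint_goal_accuracy(preds, refs))                                # 0.5
print(joint_goal_accuracy(preds, refs, ignore=("none", "dontcare")))  # 1.0
```

Because the same filter is applied to both prediction and reference, adding a value to the ignore set can never lower JGA: states that were equal stay equal after filtering, and some mismatches (like the missed dontcare above) become matches.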

When I ran dialogue_generation.py, it seems that the generated belief states in the MODEL_OUTPUT file are always empty. Could you please provide more details about how the model is trained for DST?

Thanks!

ehosseiniasl commented 3 years ago

Hi,

We only remove none slots; this is now fixed. The empty belief states from dialogue_generation.py appear to be a parsing issue. We will fix it.
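
Since the empty beliefs may be a parsing problem rather than a generation problem, one way to check is to parse the raw generated text directly. The sketch below assumes the SimpleTOD-style delimiters <|belief|> and <|endofbelief|> and comma-separated "domain slot value" entries with single-word domain and slot names; parse_belief is a hypothetical helper, not the repository's parser.

```python
from typing import Dict

# Assumed delimiters; check them against the special tokens your checkpoint was trained with.
BELIEF_START = "<|belief|>"
BELIEF_END = "<|endofbelief|>"


def parse_belief(generated: str) -> Dict[str, str]:
    """Extract 'domain slot value' entries between the belief delimiters.

    Returns an empty dict if the start token is missing. If the end token is
    missing (e.g. generation was truncated), parses to the end of the string.
    Assumes domain and slot are single words; everything after them is the value.
    """
    start = generated.find(BELIEF_START)
    if start == -1:
        return {}
    start += len(BELIEF_START)
    end = generated.find(BELIEF_END, start)
    span = generated[start:] if end == -1 else generated[start:end]

    belief: Dict[str, str] = {}
    for chunk in span.split(","):
        parts = chunk.split()
        if len(parts) < 3:
            continue  # skip empty or malformed entries
        domain, slot, value = parts[0], parts[1], " ".join(parts[2:])
        belief[f"{domain}-{slot}"] = value
    return belief


example = (
    "<|context|> ... <|endofcontext|> "
    "<|belief|> hotel area north, hotel stars 4 <|endofbelief|> <|action|> ..."
)
print(parse_belief(example))  # {'hotel-area': 'north', 'hotel-stars': '4'}
```

If this returns entries for a MODEL_OUTPUT line that the evaluation reports as empty, the issue is in parsing; if the raw text has nothing between the delimiters, the model itself generated an empty belief.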

fasterbuild commented 3 years ago

@smartyfh Regarding the empty belief issue: in generate_dialogue.py, add text=text.strip() before tokenizer.encode(text); otherwise there is always a trailing space at the end of the input. However, I could not reproduce the joint accuracy reported in the paper. Can you?
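
A minimal sketch of the suggested fix, with a plausible explanation of why it matters: GPT-2's byte-level BPE attaches a space to the following word, so a trailing space has nothing to attach to and is encoded as a standalone space token at the end of the prompt. prepare_for_encoding and the sample input are hypothetical; the special tokens in the sample are assumed for illustration.

```python
def prepare_for_encoding(text: str) -> str:
    """Strip surrounding whitespace before the text is passed to tokenizer.encode()."""
    # GPT-2's BPE merges a leading space into the next word's token (" north" is one
    # token), so a dangling trailing space would become its own token at the very
    # end of the prompt, right where the model is expected to start generating.
    return text.strip()


# Illustrative input with the trailing space described in the thread.
text = "<|context|> i need a hotel in the north <|endofcontext|> "
clean = prepare_for_encoding(text)
print(text.endswith(" "), clean.endswith(" "))  # True False

# With Hugging Face transformers installed, the fix would be applied as, e.g.:
#   from transformers import GPT2Tokenizer
#   tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # placeholder; load from your checkpoint
#   input_ids = tokenizer.encode(prepare_for_encoding(text))
```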

smartyfh commented 3 years ago

@fasterbuild Did you evaluate all the checkpoints or just one? If you ignore both the none and dontcare slots, the results should be reproducible. If you keep the dontcare slots, however, accuracy drops by several points. The authors would need to confirm this, though.

gungui98 commented 3 years ago

@smartyfh I am struggling to reproduce the results. Could you please share the hyperparameters you used for training?

ShaneTian commented 3 years ago

@smartyfh If you keep the dontcare slots, what JGA do you get?

HuangLK commented 3 years ago

@ShaneTian The JGA is 50.32% if I keep the dontcare slots. After ignoring both none and dontcare slots, I achieve 55.45% JGA.

libing125 commented 3 years ago

I got 50.46% joint accuracy when keeping the dontcare slots and applying the default cleaning.