thu-coai / CrossWOZ

A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
Apache License 2.0
645 stars, 114 forks

why not add span info for slots? #4

Closed lexmen318 closed 4 years ago

lexmen318 commented 4 years ago

Hi,

I wonder why span info isn't provided for slots, as in MultiWOZ?

Thx a lot!

zqwerty commented 4 years ago

Because sometimes the value of the slot doesn't appear in the utterance:

  1. Request, NoOffer, Select, and General intents
  2. the hotel facilities slot for the Inform intent
  3. other cases, such as reference issues or value normalization

Also, a simple string matching algorithm can help you get the span info easily. By the way, the original MultiWOZ doesn't have span info.
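
As a rough illustration of the string-matching idea, here is a minimal sketch. It is not code from the CrossWOZ repository; the function name `find_value_span` and the example utterance are made up for illustration. It returns character offsets for an exact match and `None` when the value does not appear in the utterance, which is the situation described in the list above.

```python
from typing import Optional, Tuple


def find_value_span(utterance: str, value: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first exact match.

    `end` is exclusive. Returns None when the value is empty or is not
    found verbatim in the utterance (hypothetical helper, for illustration).
    """
    if not value:
        return None  # e.g. Request acts carry no value to locate
    start = utterance.find(value)
    if start == -1:
        return None  # value was normalized or never mentioned literally
    return start, start + len(value)


utterance = "我想去故宫玩"
print(find_value_span(utterance, "故宫"))  # (3, 5)
print(find_value_span(utterance, "免费"))  # None
```

Exact matching only recovers spans for values that occur literally; normalized or implied values still come back as `None` and would need separate handling.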

zqwerty commented 4 years ago

To get the span automatically, you can refer to https://github.com/thu-coai/CrossWOZ/blob/master/convlab2/nlu/jointBERT/crosswoz/preprocess.py
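
The linked script is the reference to follow. Purely as an independent sketch (not copied from that file), matched spans could be turned into character-level BIO tags roughly like this. The `[intent, domain, slot, value]` act layout, the `bio_tags` name, and the `B+intent+domain+slot` label format are assumptions made for this example.

```python
from typing import List, Sequence


def bio_tags(utterance: str, acts: Sequence[Sequence[str]]) -> List[str]:
    """Build one BIO tag per character from exact value matches.

    Each act is assumed to be [intent, domain, slot, value]. Acts whose
    value is empty or absent from the utterance contribute no tags.
    """
    tags = ["O"] * len(utterance)
    for intent, domain, slot, value in acts:
        start = utterance.find(value) if value else -1
        if start == -1:
            continue  # no literal span for this act
        end = start + len(value)
        if any(tag != "O" for tag in tags[start:end]):
            continue  # keep the earlier span rather than overwrite it
        label = f"{intent}+{domain}+{slot}"
        tags[start] = "B+" + label
        for i in range(start + 1, end):
            tags[i] = "I+" + label
    return tags


acts = [
    ["Inform", "景点", "名称", "故宫"],
    ["Request", "景点", "门票", ""],  # no value, so no span
]
for char, tag in zip("我想去故宫", bio_tags("我想去故宫", acts)):
    print(char, tag)
```

Acts without a literal span, such as the Request act here, are simply skipped, so they would have to be predicted some other way, for example as utterance-level labels.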

lexmen318 commented 4 years ago

OK. Thanks a lot for your timely reply! However, I would like to share my understanding: 1) span info is very helpful for the NLU module to attend to the slots; 2) for slots whose values do not exactly match how they appear in the utterance, span info is even more helpful, much like teaching a language learner what a word really implies.

failable commented 3 years ago

Even though some entities may not occur in the utterance, I agree with @lexmen318 that span info is even more important in that case.