thu-coai / KdConv

KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation
Apache License 2.0

Didn't find several files #8

Closed youngornever closed 3 years ago

youngornever commented 3 years ago

When I run the BERT benchmark model (`membertret`), several required files cannot be found.

1. The files `bert_config.json`, `vocab.txt`, and `pytorch_model.bin` under `/home/zhengchujie/bert_torch/chinese_wwm_pytorch/` are missing. As a workaround, I downloaded an alternative checkpoint from https://github.com/ymcui/Chinese-BERT-wwm. However, there are several warnings:

```
INFO - pytorch_transformers.tokenization_utils - Model name 'KdConv/benchmark/_bert_chinese_wwm_pytorch/vocab.txt' not found in model shortcut name list
INFO - pytorch_transformers.tokenization_utils - Didn't find file /KdConv/benchmark/_bert_chinese_wwm_pytorch/added_tokens.json (nor special_tokens_map.json or tokenizer_config.json). We won't load it.
```

2. `FileNotFoundError: [Errno 2] No such file or directory: '../data/resources/chinese_stop_words.txt'`. As a workaround, I cloned https://github.com/goto456/stopwords and ran `mv cn_stopwords.txt chinese_stop_words.txt`.
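Before rerunning, the downloaded checkpoint directory can be sanity-checked for the files from point 1. A minimal sketch (the expected file names are taken from the error messages above; the directory path is whatever you pass in):

```python
import os

# Files the benchmark expects in the BERT checkpoint directory,
# per the missing-file errors reported above.
EXPECTED = ["bert_config.json", "vocab.txt", "pytorch_model.bin"]

def missing_checkpoint_files(bert_dir):
    """Return the expected checkpoint files that are absent from bert_dir."""
    return [name for name in EXPECTED
            if not os.path.isfile(os.path.join(bert_dir, name))]
```

Note that the `added_tokens.json` / `special_tokens_map.json` / `tokenizer_config.json` messages are INFO-level: those files are optional tokenizer extras, and the tokenizer typically falls back to defaults when they are absent.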

Could you please provide the corresponding URLs for those files?

Thanks

chujiezheng commented 3 years ago

Q1: I am not sure which version of transformers you are using. The version used in our code is an early one, pytorch_pretrained_bert. Also, what are your args settings? That is, what are the values of --bert_config_file, --vocab_file, and --init_checkpoint?
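For reference, pointing those three args at the downloaded checkpoint might look like the following. This is only a sketch: the script name `run_BERTretrieval.py` and the directory layout are assumptions, not the repo's documented entry point.

```sh
# Hypothetical invocation -- script name and checkpoint directory are assumptions.
BERT_DIR=./chinese_wwm_pytorch
python run_BERTretrieval.py \
    --bert_config_file "$BERT_DIR/bert_config.json" \
    --vocab_file "$BERT_DIR/vocab.txt" \
    --init_checkpoint "$BERT_DIR/pytorch_model.bin"
```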

Q2: It has been a long time since our experiments were conducted. I am sorry, but I am not sure which stop-word list was used in our experiments. You can use any publicly available stop-word vocabulary you like.
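Since any stop-word list will do, the substituted `chinese_stop_words.txt` only needs to match the format the evaluation code reads. A minimal loader, assuming the common one-word-per-line format (which is also what goto456/stopwords uses):

```python
def load_stop_words(path):
    """Load a stop-word file with one word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}
```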