lancopku / Graph-to-seq-comment-generation

Code for the paper ``Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model''
174 stars 38 forks

question about the tokenizer and keywords extractor tool #13

Open jiangliqin opened 2 years ago

jiangliqin commented 2 years ago

Hi, I use the default jieba tokenizer and the gensim/jieba keyword extractors to preprocess the corpus, but my results are not as good as yours. For example:
mine: ['杨清', '孩子', '网友', '母亲', '小孩', '失望透顶', '父母', '发消息']
yours: ["王乐乐", "杨清柠", "奶粉", "外孙", "分手", "孩子"]

Could you explain in more detail which tokenizer and keyword extraction tool you use?

yahiko-l commented 2 years ago

Could the difference come from stop words?
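To illustrate why stop words might matter here, the following is a minimal sketch of how filtering them can change a frequency-ranked keyword list. It is not the authors' preprocessing: the stop-word set, the toy token list, and the helper `top_keywords` are all hypothetical. It ranks by plain frequency rather than by the TF-IDF or TextRank scoring that jieba and gensim provide. In a real pipeline the tokens would come from a tokenizer such as `jieba.lcut(text)`.

```python
from collections import Counter

# Hypothetical stop-word list; the list the authors used is not stated in this thread.
STOP_WORDS = {"网友", "发消息"}


def top_keywords(tokens, stop_words=frozenset(), top_k=3, min_len=2):
    """Rank tokens by raw frequency after dropping stop words and very short tokens."""
    kept = [t for t in tokens if t not in stop_words and len(t) >= min_len]
    return [word for word, _ in Counter(kept).most_common(top_k)]


# Toy pre-tokenized input, made up for illustration only.
tokens = ["网友", "孩子", "网友", "的", "奶粉", "孩子", "网友",
          "发消息", "奶粉", "孩子", "外孙", "网友"]

without_filter = top_keywords(tokens)
with_filter = top_keywords(tokens, STOP_WORDS)

print(without_filter)  # ['网友', '孩子', '奶粉']
print(with_filter)     # ['孩子', '奶粉', '外孙']
```

With the generic word 网友 filtered out, a content word such as 外孙 moves into the top results. The same effect could explain part of the gap between the two keyword lists above, though the thread does not confirm it.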