Question: keyword extraction with type='keywords', setting bylines=TRUE has no effect

qinwf / jiebaR

Chinese text segmentation with R (documentation updated 🎉: https://qinwenfeng.com/jiebaR/ )


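    library(jiebaR)
    # Keyword-extraction worker with a custom user dictionary and stop word list
    cutter <- worker(type = 'keywords',
                     user = 'D:/R/soft/library/jiebaRD/dict/usrdic_20161102.utf8',
                     stop_word = 'D:/R/soft/library/jiebaRD/dict/stop_words.utf8',
                     bylines = TRUE)

Output:

    563.482 518.433 208.951 199.566 190.731
      "360"  "手机" "数据线"  "差评"  "客服"

(The keywords are: 360, mobile phone, data cable, negative review, customer service.)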

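The result is the keywords for the whole document. How should I configure it to extract keywords for each line? Also, if I set type='mix', how do I filter out stop words? Below is the code I tried for filtering, but it does not seem to have any effect. Please help me fix it, thanks.

    # Keep only the words that are not in the stop word list
    removewords <- function(target_words, stop_words) {
      target_words <- target_words[!(target_words %in% stop_words)]
      return(target_words)
    }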

    stopwd <- readLines('D:/R/soft/library/jiebaRD/dict/stop_words.utf8', encoding = 'UTF-8')
    class(stopwd)
    [1] "character"
    content3 <- sapply(content2, FUN = removewords, stopwd)
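Both questions can be approached with a per-line loop. The following is only a rough sketch, not the maintainer's answer. It assumes that `content2` (not shown above) is a list with one character vector of segmented words per line, and that stray whitespace in the stop word file might be one reason the filter appears to do nothing. The names `remove_stop_words`, `filter_lines`, `keywords_by_line`, and the sample data are hypothetical. The jiebaR part uses `worker()` and `keywords()` and runs only if the package is installed.

```r
# Drop stop words from one character vector of tokens (base R only).
remove_stop_words <- function(tokens, stop_words) {
  tokens[!(tokens %in% stop_words)]
}

# Filter every line. lapply() always returns a list with one element per
# line, whereas sapply() may simplify the result into a vector or matrix.
filter_lines <- function(token_lines, stop_words) {
  # Assumption: trimming whitespace helps if the stop word file has
  # trailing spaces that would otherwise prevent exact matches.
  stop_words <- trimws(stop_words)
  lapply(token_lines, remove_stop_words, stop_words = stop_words)
}

# Hypothetical toy data standing in for segmented lines and a stop word list.
token_lines <- list(c("the", "phone", "is", "good"),
                    c("the", "cable", "broke"))
stop_words <- c("the ", "is")
filtered <- filter_lines(token_lines, stop_words)

# Per-line keywords: call the keywords worker once for each line
# instead of relying on bylines.
keywords_by_line <- function(lines, keyword_worker) {
  lapply(lines, function(line) jiebaR::keywords(line, keyword_worker))
}

# Runs only when jiebaR is available; the two sentences are made-up samples.
if (requireNamespace("jiebaR", quietly = TRUE)) {
  kw <- jiebaR::worker(type = "keywords", topn = 5)
  sample_lines <- c("这款手机的数据线质量很差", "客服态度很好")
  print(keywords_by_line(sample_lines, kw))
}
```

If the text lives in a file, `readLines()` with `encoding = 'UTF-8'` would give the character vector to pass as `lines`. jiebaR also ships a `filter_segment()` helper for removing words from segmentation results, which may be worth trying in place of the hand-written filter.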


Question: keyword extraction with type='keywords', setting bylines=TRUE has no effect #44