Closed pkugyf closed 10 months ago
Hello, and thanks for using Data-Juicer. We mainly follow BigScience's pipeline for processing OSCAR.
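For readers wondering how a perplexity value comes out of a KenLM model: KenLM is an n-gram language model toolkit, and its Python binding's `Model.score(sentence)` returns a log10 probability. The sketch below shows one common way to turn per-line log10 scores into a perplexity. It is an illustration under assumptions, not the exact Data-Juicer code: `corpus_perplexity` and `toy_score_log10` are hypothetical names, the toy scorer stands in for a real `kenlm.Model(...).score`, and counting one extra end-of-sentence token per line is an assumed convention.

```python
from typing import Callable


def corpus_perplexity(text: str, score_log10: Callable[[str], float]) -> float:
    """Perplexity of `text` given a scorer that returns log10 P(line).

    Hypothetical helper: with the real library, `score_log10` could be
    the `score` method of a loaded `kenlm.Model`.
    """
    total_log10 = 0.0
    total_tokens = 0
    for line in text.splitlines():
        total_log10 += score_log10(line)
        # Assumed convention: whitespace tokens plus one end-of-sentence token.
        total_tokens += len(line.split()) + 1
    if total_tokens == 0:
        raise ValueError("text contains no lines to score")
    # Scores are log10, so perplexity is 10 ** (-average log10 probability).
    return 10.0 ** (-total_log10 / total_tokens)


def toy_score_log10(line: str) -> float:
    """Stand-in scorer (not KenLM): every token, including the
    end-of-sentence token, is assigned probability 0.1."""
    return -1.0 * (len(line.split()) + 1)


if __name__ == "__main__":
    print(corpus_perplexity("a b c\nd e", toy_score_log10))
```

With the toy scorer every token has probability 0.1, so the perplexity is 10 regardless of the text, which makes a quick sanity check for the formula.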
This issue is marked as stale because there has been no activity for 21 days. Remove the stale label or add a new comment, or this issue will be closed in 3 days.
Close this stale issue.
Before Asking
[X] I have read the README carefully.
[X] I have pulled the latest code of main branch to run again and the problem still existed.
Search before asking
Question
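Operator: perplexity_filter. From the source code, the language model this operator uses is downloaded from https://huggingface.co/edugp/kenlm. However, that model's card only says it was trained on data such as Wikipedia. It does not say which model was used for training, and only mentions that one use case used a Spanish BERT model. So my question is: which model is used to compute PPL for Chinese and English?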
Additional
No response