hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI
Apache License 2.0

wikiextractor raise BdbQuit #108

Closed: RenyunLi0116 closed this issue 2 years ago

RenyunLi0116 commented 2 years ago

🐛 Describe the bug

Hi all, when I run the extract step of the language/BERT example, wikiextractor --json enwiki-latest-pages-articles.xml.bz2, it fails with raise BdbQuit. This seems to have been solved here by switching wikiextractor to version 3.0.4, but after that the example code no longer works because 3.0.4 does not support --json.

Environment

No response

FrankLeeeee commented 2 years ago

Hi, I have not met this issue before. Can you provide the versions of wikiextractor you tried?

RenyunLi0116 commented 2 years ago

Hi, the original (default) version is 3.0.6, which raises BdbQuit.

With 3.0.4 the error goes away, but I can't use --json because 3.0.4 doesn't support that option.
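One possible stopgap with 3.0.4 is to run wikiextractor without --json and convert its default output afterwards. The sketch below is a minimal, hypothetical converter, not part of ColossalAI-Examples. It assumes the default output wraps each article in a `<doc id="..." url="..." title="...">` ... `</doc>` block, and that the downstream BERT preprocessing wants one JSON object per line with `id`, `url`, `title` and `text` keys. Both are assumptions not confirmed in this thread, and the helper names `iter_docs` and `convert` are illustrative only.

```python
import io
import json
import re
from typing import Iterable, Iterator, TextIO

# Assumed opening tag of wikiextractor's default (non-JSON) output, e.g.
# <doc id="12" url="https://en.wikipedia.org/wiki?curid=12" title="Anarchism">
# Titles containing a literal double quote would not match this pattern.
DOC_OPEN = re.compile(
    r'<doc id="(?P<id>[^"]*)" url="(?P<url>[^"]*)" title="(?P<title>[^"]*)">'
)
DOC_CLOSE = "</doc>"


def iter_docs(lines: Iterable[str]) -> Iterator[dict]:
    """Hypothetical helper: yield one dict per <doc>...</doc> block."""
    current = None
    body = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = DOC_OPEN.match(line)
        if match:
            # Start a new document; keep the attributes from the tag.
            current = match.groupdict()
            body = []
        elif line == DOC_CLOSE and current is not None:
            # End of document: join the collected body lines as its text.
            current["text"] = "\n".join(body).strip()
            yield current
            current = None
        elif current is not None:
            body.append(line)


def convert(src: TextIO, dst: TextIO) -> int:
    """Hypothetical helper: write one JSON object per line, return the count."""
    count = 0
    for doc in iter_docs(src):
        dst.write(json.dumps(doc, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = (
        '<doc id="1" url="https://example.org/?curid=1" title="Example">\n'
        "Example\n\nSome text.\n</doc>\n"
    )
    out = io.StringIO()
    convert(io.StringIO(sample), out)
    print(out.getvalue(), end="")
```

In practice it would be applied to each output file wikiextractor writes when run without --json. Any mismatch with the keys the example's later preprocessing expects would need adjusting by hand.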

binmakeswell commented 2 years ago

Hi @saleelirenyun, we have updated the preprocessing; please try it. Thanks.