Open eckolemon opened 4 years ago
@jeffheaton
@eckolemon
I ran this script over the English corpus dump of 2019/01/01 and got these results:
Total pages: 19,096,287
Template pages: 639,391
Article pages: 8,788,731
Redirect pages: 9,668,165
Elapsed time: 0:37:17.66
Hi, I wonder how many articles you extracted using this Python file?