jeffheaton / article-code

the number of articles #2

Open eckolemon opened 4 years ago

eckolemon commented 4 years ago

Hi, how many articles did you extract using this Python file?

eckolemon commented 4 years ago

@jeffheaton

ianomad commented 4 years ago

@eckolemon

I ran this script over the English corpus dump dated 2019/01/01 and got these results:

Total pages: 19,096,287
Template pages: 639,391
Article pages: 8,788,731
Redirect pages: 9,668,165
Elapsed time: 0:37:17.66
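
As a quick sanity check on these numbers, here is a minimal sketch that confirms the three page categories add up to the reported total and works out each category's share plus a rough throughput. The variable names are my own and are not taken from the script; the figures are copied from the results above.

```python
# Counts copied from the reported results (names here are illustrative only).
counts = {
    "Template pages": 639_391,
    "Article pages": 8_788_731,
    "Redirect pages": 9_668_165,
}
total_pages = 19_096_287

# The three categories should account for every page in the dump.
assert sum(counts.values()) == total_pages

# Share of the total for each category.
for name, n in counts.items():
    print(f"{name}: {n:,} ({n / total_pages:.1%})")

# Elapsed time 0:37:17.66 converted to seconds.
elapsed_seconds = 37 * 60 + 17.66
print(f"Throughput: {total_pages / elapsed_seconds:,.0f} pages/second")
```

So the answer to the original question is the "Article pages" line: roughly 8.8 million articles, a bit under half of all pages, with redirects making up most of the rest.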