mattnmorgan / ECU-19-Redis

NSF Grant project for the CSCI department chair

Tabulate English Corpora Documents #3

Closed mattnmorgan closed 5 years ago

mattnmorgan commented 5 years ago

The Gutenberg corpora initially comprised 51,475 files occupying 18.5 GB of disk space. After removing foreign-language files (including files that contain English alongside other languages), 47,487 files remained, occupying 16.9 GB. That is a reduction of 3,988 files and 1.6 GB.

These are the figures that were requested, so this issue is complete.
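
For anyone wanting to reproduce a tally like this, here is a minimal sketch of counting files and summing their sizes under a directory. It is not the script used for this issue. The function name `tabulate` and the command-line usage are assumptions made for illustration, and it sums apparent file sizes in bytes, which may differ slightly from what a file manager or `du` reports as disk usage.

```python
import os
import sys


def tabulate(root):
    """Return (file_count, total_bytes) for all regular files under root.

    Hypothetical helper: walks the tree, skips symlinks, and sums the
    apparent size of each file (not the blocks allocated on disk).
    """
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                count += 1
                total += os.path.getsize(path)
    return count, total


if __name__ == "__main__":
    # Usage (assumed): python tabulate.py /path/to/corpus
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    files, size = tabulate(root)
    # Report size in decimal gigabytes (10**9 bytes).
    print(f"{files:,} files, {size / 10**9:.1f} GB")
```

Running it once on the full corpus directory and once on the English-only directory would give the before and after figures to compare.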