Open jeff1evesque opened 6 years ago
This issue is partially invalid, since the corresponding articles will likely fail to download when requested through the wikipedia package. Additionally, it makes more sense to remove the corresponding entries from the monthly top 1000 listing when cleaning the data, rather than preventing the entries from being downloaded and extracted.
We need to prevent invalid article names from being collected. One approach is to integrate the concepts of invalid_articles.csv in a more pythonic implementation.
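A minimal sketch of what that could look like, assuming invalid_articles.csv holds one article name per row in its first column (the actual layout is not described in this issue). The function names `load_invalid_articles` and `filter_articles`, the case-insensitive matching, and the sample article names are all hypothetical and used only for illustration:

```python
import csv
from typing import Iterable, List, Set


def load_invalid_articles(lines: Iterable[str]) -> Set[str]:
    """Collect invalid article names from CSV rows (first column only).

    Assumption: one article name per row; blank rows are skipped.
    Names are lowercased so matching is case-insensitive.
    """
    reader = csv.reader(lines)
    return {row[0].strip().lower() for row in reader if row and row[0].strip()}


def filter_articles(names: Iterable[str], invalid: Set[str]) -> List[str]:
    """Return the names that are not in the invalid set, preserving order."""
    return [name for name in names if name.strip().lower() not in invalid]


# With a real file this would be (path is hypothetical):
#     with open("invalid_articles.csv", newline="") as fh:
#         invalid = load_invalid_articles(fh)
# Here the rows are given inline so the sketch runs on its own.
invalid = load_invalid_articles(["Main_Page", "Special:Search"])
articles = filter_articles(
    ["Main_Page", "Python_(programming_language)"], invalid
)
print(articles)
```

The same filter could be applied either before downloading or later, during the data cleaning step, since it only operates on a list of names.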