jeff1evesque / ist-652

Syracuse IST-652 Final Project

Prevent invalid wikipedia article aggregation #20

Open jeff1evesque opened 6 years ago

jeff1evesque commented 6 years ago

We need to prevent invalid article names from being collected. One approach is to build on the idea behind invalid_articles.csv, implemented in a more Pythonic way.

jeff1evesque commented 6 years ago

This issue is partially invalid, since the corresponding articles will likely fail to download when requested via the wikipedia package. Additionally, it makes more sense to remove the corresponding entries from the monthly top 1000 listing when cleaning the data, rather than preventing them from being downloaded and extracted.
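
For reference, a rough sketch of what the cleaning-stage filter could look like. This is not the project's code: it assumes invalid_articles.csv holds one article name per row with no header, and the function names, the "article"/"views" keys, and the sample rows are all hypothetical placeholders.

```python
import csv
import io


def load_invalid_names(csv_file):
    """Collect the first column of each non-empty row into a set.

    Assumes one article name per row and no header line.
    """
    return {
        row[0].strip()
        for row in csv.reader(csv_file)
        if row and row[0].strip()
    }


def drop_invalid(entries, invalid_names):
    """Return only the entries whose 'article' value is not listed as invalid."""
    return [entry for entry in entries if entry["article"] not in invalid_names]


# Hypothetical stand-in for the contents of invalid_articles.csv.
invalid_csv = io.StringIO("Main_Page\nSpecial:Search\n")

# Hypothetical stand-in for rows of a monthly top 1000 listing.
top_listing = [
    {"article": "Main_Page", "views": 1000},
    {"article": "Python_(programming_language)", "views": 500},
    {"article": "Special:Search", "views": 400},
]

invalid = load_invalid_names(invalid_csv)
cleaned = drop_invalid(top_listing, invalid)
print([entry["article"] for entry in cleaned])
```

Using a set keeps each membership check cheap, and filtering after download means the raw listing stays untouched if the invalid list changes later.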