Media Cloud Assignment
Devansh Shah
IIIT Hyderabad.
Project: An experiment assessing candidate Python libraries for author extraction from online articles (Media Cloud assignment)
Aim: Given a dataset of extracted articles with labeled authors, assess the viability of two Python 3 libraries, Newspaper and Goose, and provide a recommendation backed by baseline results.
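The aim calls for baseline results against labeled authors, but the scoring rule is not stated. The sketch below shows one possible way to score it, not the method used in this report. It assumes that each article's extracted and labeled authors are both available as lists of name strings. It normalizes names (lowercase, collapsed whitespace), then reports the exact-match rate per article along with micro-averaged precision, recall and F1 over author names. `normalize_name`, `score_authors` and `AuthorScores` are hypothetical helpers. They are not part of Newspaper or Goose.

```python
from dataclasses import dataclass


def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so 'Jane  Doe' matches 'jane doe'."""
    return " ".join(name.lower().split())


@dataclass
class AuthorScores:
    exact_match: float  # share of articles whose author set equals the label set
    precision: float    # correct names / all extracted names (micro-averaged)
    recall: float       # correct names / all labeled names (micro-averaged)
    f1: float


def score_authors(predicted: list[list[str]], labeled: list[list[str]]) -> AuthorScores:
    """Hypothetical scorer: compare extracted author lists with labeled ones, article by article."""
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled must cover the same articles")
    exact = tp = n_pred = n_gold = 0
    for pred, gold in zip(predicted, labeled):
        pred_set = {normalize_name(n) for n in pred if n.strip()}
        gold_set = {normalize_name(n) for n in gold if n.strip()}
        exact += pred_set == gold_set          # bool counts as 0 or 1
        tp += len(pred_set & gold_set)         # names both extracted and labeled
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    exact_match = exact / len(labeled) if labeled else 0.0
    return AuthorScores(exact_match, precision, recall, f1)


if __name__ == "__main__":
    # Made-up inputs: an empty extraction, and a commenter name picked up as an author.
    predicted = [["Jane Doe"], [], ["John Smith", "Commenter42"]]
    labeled = [["jane doe"], ["Alex Roe"], ["John Smith"]]
    print(score_authors(predicted, labeled))
    # exact_match is 1/3; precision, recall and F1 are each 2/3
```

In this example, the empty extraction lowers only recall. The spurious commenter name lowers only precision. Because of this, the two failure modes seen in the MVP would show up in different metrics.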
Experiment - 1 (MVP): Install both Newspaper and Goose in a Python 3 virtualenv and parse a sample article to extract its title and author.
Results (MVP):
Both libraries successfully fetch and parse the article.
Both libraries correctly extract the title and article text.
Goose is unable to parse the author.
Newspaper extracts the authors incorrectly, picking up names from the comments section.
Attached image from MVP
Experiment - 2 (Sample size: 186)
Method:
Evaluation:
Results:
Conclusion: