paul-tqh-nguyen / arxiv_as_a_newspaper

arxiv.org portrayed as if it were a newspaper.

Robustify against "Article 1906.08163 doesn't exist" pages and the like #9

Closed by paul-tqh-nguyen 5 years ago

paul-tqh-nguyen commented 5 years ago

See http://in.arxiv.org/abs/1906.08163

Make sure our ETL is robust against these cases.

Short repro, run from extract_transform_utilities.py:

def main():
    # @todo remove all of the below once stability is reached.
    link_to_paper_page = "http://in.arxiv.org/abs/1906.08163"
    abstract = _abstract_text_from_arxiv_paper_url(link_to_paper_page)
    return None

[Image: does_not_exist_page]

paul-tqh-nguyen commented 5 years ago

Progress Patch: https://github.com/paul-tqh-nguyen/arxiv_as_a_newspaper/commit/a41fbc2abfa57ac9baf3c3dde90ccf9d4a7bc6c6

This patch handles missing articles more gracefully, so the ETL no longer breaks on them.
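For illustration only, here is a minimal sketch of one way to tolerate "doesn't exist" pages. It is not the code from the linked commit. The names `fetch_html`, `abstract_text_or_none`, and `_AbstractParser` are hypothetical, and it assumes the abstract sits in a `<blockquote>` whose class list includes `abstract`; a page without one is treated as a missing article and yields `None` instead of raising.

```python
from html.parser import HTMLParser
from typing import Callable, Optional
import urllib.request


class _AbstractParser(HTMLParser):
    """Collects text inside the first <blockquote> whose class includes 'abstract'.

    Assumption: the abstract lives in such a blockquote on a valid paper page.
    """

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0      # > 0 while inside the abstract blockquote
        self.found = False   # True once an abstract blockquote has been seen
        self.chunks = []     # raw text pieces collected from the abstract

    def handle_starttag(self, tag, attrs):
        if self._depth > 0:
            if tag == "blockquote":
                self._depth += 1  # nested blockquote inside the abstract
        elif tag == "blockquote" and not self.found:
            classes = (dict(attrs).get("class") or "").split()
            if "abstract" in classes:
                self.found = True
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth > 0 and tag == "blockquote":
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page body, or None on HTTP errors, network errors, or timeouts."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # urllib's URLError and HTTPError are OSError subclasses
        return None


def abstract_text_or_none(
    url: str, fetch: Callable[[str], Optional[str]] = fetch_html
) -> Optional[str]:
    """Return the abstract text, or None if the page is unreachable or has no abstract."""
    html = fetch(url)
    if html is None:
        return None
    parser = _AbstractParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace; an empty result counts as "no abstract".
    text = " ".join("".join(parser.chunks).split())
    return text or None
```

The caller can then skip any paper for which the function returns `None`, rather than letting one bad URL abort the whole ETL run. Passing `fetch` as a parameter keeps the parsing logic testable without network access.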