Added script which retrieves PDF files from the provided URLs

Added a script which retrieves PDF files from the provided URLs. It successfully retrieves a single PDF file for all but 10 of the titles in the dataset. Most of the titles it fails to retrieve either require a paid subscription or institutional access (e.g. Elsevier URLs), or have a URL that is no longer active.

The script can be run via python get_pdfs.py from within the data subdirectory.
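For reviewers who want a feel for the retrieval step, here is a minimal sketch of how a single title could be fetched and checked. It is not the code in get_pdfs.py: the function names (looks_like_pdf, safe_filename, download_pdf), the User-Agent header, and the file naming scheme are all assumptions made for illustration. It uses only the standard library and treats any response that does not start with the PDF magic bytes as a failure, which is one way paywall and landing pages could be detected.

```python
import http.client
import re
import urllib.request
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF magic bytes '%PDF-'."""
    return data.startswith(b"%PDF-")


def safe_filename(title: str) -> str:
    """Hypothetical naming scheme: collapse non-alphanumeric runs to '_'."""
    stem = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return stem + ".pdf"


def download_pdf(title: str, url: str, out_dir: Path, timeout: float = 30.0) -> bool:
    """Try to save the PDF behind `url`; return True on success, False otherwise."""
    # Assumption: some servers reject requests without a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = response.read()
    except (OSError, ValueError, http.client.HTTPException):
        # Covers dead links, HTTP errors, timeouts, and malformed URLs.
        return False
    if not looks_like_pdf(data):
        # Typically an HTML paywall or landing page rather than the paper itself.
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / safe_filename(title)).write_bytes(data)
    return True
```

A caller would loop over the (title, URL) pairs in the dataset and collect every pair for which download_pdf returns False, which yields a failure list in the same shape as the one below.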

Titles/URLs which it fails to retrieve (most of these can be downloaded manually):

('Adversarially Regularized Autoencoders', 'https://doi.org/10.1016/j.eswa.2019.04.014')
('Bilingually-constrained Synthetic Data for Implicit Discourse Relation Recognition', 'https://doi.org/10.1016/j.neucom.2017.02.084')
('Clustering Semi-Random Mixtures of Gaussians', 'http://proceedings.mlr.press/v80/awasthi18a.html')
('Convolutional Neural Network Language Models', 'https://doi.org/10.1016/j.patcog.2016.12.026')
('Differentially Private Ordinary Least Squares', 'https://doi.org/10.29012/jpc.654')
('Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks', 'https://doi.org/10.1162/neco_a_01164')
('Improving Semantic Parsing with Enriched Synchronous Context-Free Grammar', 'https://doi.org/10.1145/2963099')
('Learning and Memorization', 'https://doi.org/10.1016/j.procs.2010.12.025')
('Leveraging Well-Conditioned Bases: Streaming and Distributed Summaries in Minkowski p-Norms', 'http://proceedings.mlr.press/v80/cormode18a.html')
('Measuring abstract reasoning in neural networks', 'http://proceedings.mlr.press/v80/santoro18a.html')
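To see where the failures cluster, the URLs above can be grouped by source. The snippet below is a small illustration and not part of the script; source_key is a hypothetical helper that keys doi.org links by their DOI prefix (the first path segment) and everything else by host name.

```python
from collections import Counter
from urllib.parse import urlparse

# The ten URLs from the failure list above.
FAILED_URLS = [
    "https://doi.org/10.1016/j.eswa.2019.04.014",
    "https://doi.org/10.1016/j.neucom.2017.02.084",
    "http://proceedings.mlr.press/v80/awasthi18a.html",
    "https://doi.org/10.1016/j.patcog.2016.12.026",
    "https://doi.org/10.29012/jpc.654",
    "https://doi.org/10.1162/neco_a_01164",
    "https://doi.org/10.1145/2963099",
    "https://doi.org/10.1016/j.procs.2010.12.025",
    "http://proceedings.mlr.press/v80/cormode18a.html",
    "http://proceedings.mlr.press/v80/santoro18a.html",
]


def source_key(url: str) -> str:
    """Hypothetical helper: DOI prefix for doi.org links, host name otherwise."""
    parsed = urlparse(url)
    if parsed.netloc == "doi.org":
        # A DOI path looks like "/10.1016/j.eswa..."; the prefix is the first segment.
        return "doi:" + parsed.path.lstrip("/").split("/", 1)[0]
    return parsed.netloc


counts = Counter(source_key(url) for url in FAILED_URLS)
for key, count in counts.most_common():
    print(f"{count:2d}  {key}")
```

Four of the ten failures carry the 10.1016 DOI prefix, which is the Elsevier prefix mentioned above, and three are proceedings.mlr.press pages that end in .html rather than pointing directly at a PDF.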

levguy / talksumm

Added script which retrieves PDF files from the provided URLs #4