Added a script that retrieves PDF files from the provided URLs. The script successfully retrieves a single PDF file for all but 10 of the titles in the dataset. Most of the titles it fails to retrieve either require a paid subscription or institutional access (e.g. Elsevier URLs) or have a URL that is no longer active.
The script can be run via python get_pdfs.py from within the data subdirectory.
Titles/URLs which it fails to retrieve (most of these can be downloaded manually):
('Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks', 'https://doi.org/10.1162/neco_a_01164')
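For reviewers who want a feel for how a retrieval pass can separate successes from failures, here is a minimal sketch. It is not the contents of get_pdfs.py; the names `looks_like_pdf`, `http_get`, and `retrieve_all` are hypothetical, and the check on the `%PDF-` header is an assumed heuristic for spotting paywall or login pages that return HTML instead of a PDF.

```python
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header


def looks_like_pdf(payload: bytes) -> bool:
    """Return True if the payload starts with the PDF header."""
    return payload.lstrip().startswith(PDF_MAGIC)


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Fetch raw bytes from a URL using only the standard library."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def retrieve_all(
    entries: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes] = http_get,
) -> Tuple[Dict[str, bytes], List[Tuple[str, str]]]:
    """Try each (title, url) pair; return saved payloads and failed pairs."""
    saved: Dict[str, bytes] = {}
    failed: List[Tuple[str, str]] = []
    for title, url in entries:
        try:
            payload = fetch(url)
        except (OSError, ValueError):
            # Dead links, timeouts, HTTP errors, and malformed URLs land here.
            failed.append((title, url))
            continue
        if looks_like_pdf(payload):
            saved[title] = payload
        else:
            # Typically an HTML paywall or login page rather than a PDF.
            failed.append((title, url))
    return saved, failed
```

The `fetch` parameter is injectable so the logic can be exercised without network access. The `failed` list has the same (title, URL) shape as the entries listed above.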