levguy / talksumm

TalkSumm - Scientific Paper Summarization Based on Conference Talks
GNU General Public License v3.0

Added script which retrieves PDF files from the provided URLs #4

Closed: TGoldsack1 closed this 3 years ago

TGoldsack1 commented 3 years ago

Added a script that retrieves PDF files from the provided URLs. It successfully retrieves a single PDF file for all but 10 of the titles in the dataset. Most of the titles it fails to retrieve either require a paid subscription or institutional access (e.g. Elsevier URLs), or have a URL that is no longer active.

The script can be run via python get_pdfs.py from within the data subdirectory.
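
For readers who want a rough idea of the approach, here is a minimal sketch of this kind of retrieval loop. It is not the actual contents of get_pdfs.py: the function names (`safe_filename`, `fetch_url`, `download_pdfs`), the `(title, url)` input format, and the `%PDF-` header check are all assumptions made for illustration.

```python
import re
import urllib.request
from pathlib import Path


def safe_filename(title: str) -> str:
    """Turn a paper title into a filesystem-friendly PDF name (hypothetical helper)."""
    stem = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return f"{stem or 'untitled'}.pdf"


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes behind a URL; raises on network or HTTP errors."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def download_pdfs(entries, out_dir, fetch=fetch_url):
    """Try to save one PDF per (title, url) pair; return the pairs that failed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    failed = []
    for title, url in entries:
        try:
            data = fetch(url)
        except (OSError, ValueError):
            # Dead link, timeout, HTTP error, or malformed URL.
            failed.append((title, url))
            continue
        # Assumption: paywalled URLs tend to return an HTML landing page,
        # so anything that does not start with the PDF header counts as a failure.
        if not data.startswith(b"%PDF-"):
            failed.append((title, url))
            continue
        (out_path / safe_filename(title)).write_bytes(data)
    return failed
```

Returning the failed pairs rather than raising keeps the run going past paywalled or dead URLs, and gives a list that can be downloaded manually afterwards.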

Titles/URLs which it fails to retrieve (most of these can be downloaded manually):