uga-libraries / web-download

Download individual files from saved Archive-It crawls.
Creative Commons Attribution Share Alike 4.0 International

Use python library instead of wget #3

Open amhanson9 opened 6 months ago

amhanson9 commented 6 months ago

Installing wget on MAGIL workstations was complicated because users there do not have administrator access to their machines. Using the Python requests or urllib libraries might be simpler.

amhanson9 commented 6 months ago

The original decision to use wget from the command line was made in part because we could not get size confirmation or a return code to check for errors when importing wget or urllib. We've done more with requests and the Archive-It APIs since then, so we can probably get this to work.
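
A minimal sketch of how the error and size checks might look with the standard library's urllib, so nothing needs to be installed. This is not code from the repository: the function name `download_file`, the `opener` parameter (added only so the download step can be swapped out in tests), and the returned `(ok, message)` pair are all hypothetical. It also assumes the server sends a Content-Length header; when it does not, the size check is skipped.

```python
import os
import shutil
import urllib.error
import urllib.request


def download_file(url, save_path, opener=urllib.request.urlopen):
    """Download url to save_path and return (ok, message).

    Hypothetical helper: 'opener' defaults to urllib.request.urlopen
    and exists only so a stand-in can be supplied for testing.
    """
    try:
        with opener(url) as response:
            status = response.status
            # Content-Length may be absent (e.g. chunked responses).
            expected_size = response.headers.get("Content-Length")
            with open(save_path, "wb") as output:
                shutil.copyfileobj(response, output)
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return False, f"HTTP error {error.code}"
    except urllib.error.URLError as error:
        # Covers failures such as DNS errors or refused connections.
        return False, f"Connection error: {error.reason}"

    if status != 200:
        return False, f"Unexpected status {status}"

    # Compare the bytes on disk with what the server said it would send.
    actual_size = os.path.getsize(save_path)
    if expected_size is not None and int(expected_size) != actual_size:
        return False, f"Size mismatch: expected {expected_size}, got {actual_size}"
    return True, f"Downloaded {actual_size} bytes"
```

The `ok` flag stands in for the return code that the command-line wget call provides, and the message could be written to a log. The same checks could be done with requests via `response.status_code` and `response.headers`, though requests decompresses gzip-encoded responses by default, so a Content-Length comparison there may need extra care.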

amhanson9 commented 1 month ago

Possible option: https://trafilatura.readthedocs.io/en/latest/