Open amhanson9 opened 6 months ago
The original decision to call wget from the command line was made in part because we could not get a size confirmation or return code to check for errors when importing the Python wget and urllib libraries. We've done more with requests and the Archive-It APIs since then, so we can probably get this to work now.
Possible option: https://trafilatura.readthedocs.io/en/latest/
Installing wget on MAGIL workstations was complicated because staff there do not have administrator access to their machines. Using the Python requests or urllib libraries might be simpler, since neither requires a separate command-line install.
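A rough sketch of what the download check might look like using only urllib from the standard library, so nothing extra has to be installed. The function name `download_file` and its parameters are hypothetical, not from the current script. It assumes the server sends a Content-Length header; when it does not (for example with chunked responses), the size check is skipped rather than failed.

```python
import http.client
import os
import urllib.error
import urllib.request


def download_file(url, dest_path, chunk_size=64 * 1024, timeout=60):
    """Download url to dest_path and return (ok, message).

    Hypothetical helper: ok is False on an HTTP error, a connection
    error, or a mismatch between Content-Length and the bytes saved.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            expected = response.headers.get("Content-Length")
            # Stream to disk in chunks so large files are not held in memory.
            with open(dest_path, "wb") as out:
                while True:
                    chunk = response.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
    except urllib.error.HTTPError as error:
        # Raised for 4xx and 5xx responses; plays the role of a return code.
        return False, f"HTTP error {error.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as error:
        return False, f"Download failed: {error}"

    # Size confirmation: compare bytes on disk to what the server reported.
    actual = os.path.getsize(dest_path)
    if expected is not None and int(expected) != actual:
        return False, f"Size mismatch: expected {expected} bytes, got {actual}"
    return True, f"Status {status}, saved {actual} bytes"
```

The caller could log the message for any download where `ok` is False, which would cover the error checking we wanted from the wget return code. The same checks should be possible with requests (status code plus Content-Length), if we prefer to stay consistent with the Archive-It API code.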