rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/

program download method #31

Closed marshonhuckleberry closed 4 years ago

marshonhuckleberry commented 4 years ago

Will the program check whether a file already exists, or will it download it anyway and replace the existing copy? This is very important because it affects scraping time, bandwidth usage, and spider detection: some websites detect scraping when the same files are downloaded again and again.

marshonhuckleberry commented 4 years ago

It is good to keep a low profile when scraping. I also wanted to ask about adding a delay after each downloaded link to avoid detection.

rajatomar788 commented 4 years ago

Yes, it checks for existence and downloads a file only if it is missing. You can force it to download a file every time it sees a link by adding over_write=True to the global config.
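
For readers who want to see the behaviour described above in isolation, here is a minimal sketch of "download only if missing, unless over_write is set". It is not pywebcopy's actual code: `save_if_missing` and `fake_fetch` are hypothetical names used only for illustration, and how `over_write=True` is passed to pywebcopy's config may differ between versions.

```python
import os
import tempfile


def save_if_missing(path, fetch, over_write=False):
    """Write fetch() to `path` unless the file is already on disk.

    `fetch` is a hypothetical zero-argument callable returning bytes.
    It is only called when a download is actually needed.
    Returns True if the file was written, False if it was skipped.
    """
    if os.path.exists(path) and not over_write:
        return False  # already saved: no request is made
    with open(path, "wb") as fh:
        fh.write(fetch())
    return True


# Demo with a stand-in for the network request.
calls = []


def fake_fetch():
    calls.append(1)  # count how often a "download" happens
    return b"body { margin: 0 }"


demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "style.css")

first = save_if_missing(target, fake_fetch)                    # file missing: downloads
second = save_if_missing(target, fake_fetch)                   # file exists: skipped
forced = save_if_missing(target, fake_fetch, over_write=True)  # forced re-download
```

With the default setting, the second call never touches the network, which is the bandwidth and detection saving asked about in the question.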

rajatomar788 commented 4 years ago

Delays could be implemented easily by overriding the get method of pywebcopy.SESSION.
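
One possible way to do this is sketched below. It assumes the session object exposes a `get` method that accepts attribute assignment, as a requests-style session does. `add_delay` and `FakeSession` are hypothetical names introduced for this example, and it has not been tested against pywebcopy itself.

```python
import time


def add_delay(session, seconds, sleep=time.sleep):
    """Wrap session.get so every call first waits `seconds`.

    `sleep` can be swapped out, which keeps the demo below instant.
    """
    original_get = session.get  # keep a reference to the real method

    def delayed_get(*args, **kwargs):
        sleep(seconds)  # pause before each request
        return original_get(*args, **kwargs)

    session.get = delayed_get  # shadow the method on this instance only
    return session


class FakeSession:
    """Stand-in for a real session so the sketch runs offline."""

    def get(self, url, **kwargs):
        return "response for " + url


# Demo: record the requested pauses instead of actually sleeping.
pauses = []
session = add_delay(FakeSession(), 2.0, sleep=pauses.append)
result_a = session.get("http://example.com/a.css")
result_b = session.get("http://example.com/b.js")
```

Under the assumption above, the same wrapper would be applied with something like `add_delay(pywebcopy.SESSION, 2.0)` before starting the crawl. Adding a small random offset to the delay makes the request pattern less regular.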