internetarchive / Zeno

State-of-the-art web crawler 🔱
GNU Affero General Public License v3.0

Add Amazon S3 extractor #153

Closed CorentinB closed 1 month ago

CorentinB commented 1 month ago

This PR adds extraction of files and subdirectories from S3 buckets.

The extractor triggers when the Server header of a response is AmazonS3. We should add more values so that it also triggers on other providers.