When trying to download images from flexbooks.ck12.org, the scraper is denied access by a CloudFront WAF.
E.g. https://flexbooks.ck12.org/flx/show/THUMB_POSTCARD/image/user%3AY2sxMnNjaWVuY2VAY2sxMi5vcmc./98045-1359163835-22-2-IntPhysC-05-03-Weather-satellite.jpg redirects to https://dr282zn36sxxg.cloudfront.net/datastreams/f-d%3A0e28b5bb5ad0f030c1a8be7f2a189afc410f6a7e4f7ddd541706304e%2BIMAGE_THUMB_POSTCARD_TINY%2BIMAGE_THUMB_POSTCARD_TINY.1
As a consequence, some images are missing from the ZIM (688 out of ~15k, roughly 4%, which is not negligible).
In local tests with curl, passing the header `User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:132.0) Gecko/20100101 Firefox/132.0` seems sufficient to avoid triggering the CloudFront protections, at least not immediately.
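A minimal sketch of what sending this header could look like, assuming the image download goes through Python's standard `urllib` (the scraper's actual HTTP client is not shown in this issue). `FIREFOX_USER_AGENT`, `build_image_request` and `download_image` are hypothetical names used only for illustration. urllib's default redirect handler copies request headers such as `User-Agent` onto the redirected request, so the header should also reach the `cloudfront.net` URL.

```python
import urllib.request

# Browser-like User-Agent that, in local curl tests, did not immediately
# trigger the CloudFront WAF (hypothetical constant name).
FIREFOX_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:132.0) "
    "Gecko/20100101 Firefox/132.0"
)


def build_image_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying the browser-like User-Agent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": FIREFOX_USER_AGENT})


def download_image(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the image bytes (hypothetical helper).

    The default redirect handler builds the follow-up request with the
    original headers, so the User-Agent is also sent to cloudfront.net.
    """
    with urllib.request.urlopen(build_image_request(url), timeout=timeout) as resp:
        return resp.read()
```

Since the curl tests only suggest the WAF is not triggered *immediately*, a real fix may still need retries or rate limiting on top of the header; this sketch does not address that.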