weserv / images

Source code of wsrv.nl (formerly images.weserv.nl), to be used on your own server(s).
https://wsrv.nl/
BSD 3-Clause "New" or "Revised" License

Application returning 404 instead of 200 #326

Closed: abargiela closed this 2 years ago

abargiela commented 2 years ago

Hello, we self-host the project, and we noticed that the problem we have on our self-hosted instance also occurs with your URL (https://images.weserv.nl). I would like to understand whether it's possible to fix this in a self-hosted deployment. Can you help me?

The URL we are trying to reach: https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916

Using curl, we see two behaviours:

No cookies/no headers

$  curl -I "https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916"
HTTP/2 302 
server: AkamaiGHost
content-length: 0
location: https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916
expires: Wed, 15 Dec 2021 10:37:30 GMT
date: Wed, 15 Dec 2021 10:37:30 GMT
cache-control: no-transform, max-age=31536000
set-cookie: wp_ak_hpsw=0|20211124; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure
set-cookie: wp_ak_v_ot=1; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure
set-cookie: wp_ak_ot=1|20211012; max-age=5184000; path=/; domain=.washingtonpost.com; SameSite=None; secure
set-cookie: wp_geo=DE|BE|||EEA; max-age=3600; path=/; domain=.washingtonpost.com; SameSite=None; secure
set-cookie: wp_country=DE; max-age=3600; path=/; domain=.washingtonpost.com; SameSite=None; secure
content-security-policy: upgrade-insecure-requests

We noticed that it returns 302, and I suspect that could be why the application is returning 404.

With cookies/no headers

$ curl -I "https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916" -H 'Connection: keep-alive' -H 'Cookie: wp_ak_hpsw=0|20211124; wp_ak_v_ot=1; wp_ak_ot=1|20211012; wp_devicetype=0; rplampr=0a|20181213; wp_pwapi_ar="H4sIAAAAAAAA/03LsRWAMAgFwF2oLUQI8N2GJDKBXV52t/X6WwSEm7FoXoI56gk+W80cztyV7kUv3cQmHhKiEQzaBwU7FHCtnNmRY0qVS3mD9evX0OxsDojT3h8slUEebwAAAA=="; wp_ak_bt=1|20200518; wp_ak_bfd=1|20201222; wp_ak_tos=1|20211110; wp_ak_pp=1|20210310; wp_geo=DE|BE|||EEA; wp_country=DE; ak_bmsc=0D3FC54804B6F53011FA57FBCA7256FF~000000000000000000000000000000~YAAQ1/AWAliD6Z19AQAAmf5svQ5+k2YObckK+9rcfduSYCbV5oDaReDyFwqK5JgDwiTf6j0z6Pbaz14XHmhEPU9KZgF+lz42WLJ732ahe+iMXyEq+C0KRU9ipaY/CadtEjruDAOLTHUD/JTLaQb3NV2zeBtxmNQE4XrNVHIKAzgARNZuw4ZjYqqSi+Z2SvKkD4zNOOlS3h1DfPEUqSgO6Yh714W4ylSIJzHqmsFXsyzgn/yeDxlBgQ6prNL4+GJbgxLsI01+shyFcG2a9yM574I9n/cs4aoJGTl3Mu5Q4PkXhTnFcAepknLDH6LyhngSWup0kf3IQnOFb/DJm30ApVRVlLFEvJ/+PWNhJ3YMy/B9Ba7royuEgk9Jrs4aIR18ohJRJkyvcMjroNenW4bXwxY9; bm_sv=0503D5EEA6806384815BC9C69BD494D0~cHVmaffrnKj1rXqh4LpA3GiKL/c+Zt7GniQa5/NLX60UnVyu549LCvI8QVtJqgXtQY3MdMjHFjlagCHut/YpqFTscUcvwUPltAxd/fMZPiMv6VxvNHLeJexjOQUNUGqVM7R6DGfym0QyeiZV4i/ZLViy9fLaLlVu+7qWQb1u6JE=; wp_usp=1---; bm_mi=C8371338D645382DB2D43B56181ED2BC~C8NCEJFni9hEjWcFJ4nZCCpS6XfxBdXCWkV+RhkX8LsxXEELY3ETJpOQbaJWzGeNhrmlS9uVasaH/lEvgy5QiP2u44QRplMOmajff0AAwGZ26XVyXe7S03Q1wW4msplW2+ObkUlz/I2ZXOD/78GIzOExquvCDUZCIqw4w/jh2KfyayXCCaQrzvbBRmsZjnyh2NxgonXTQ8EDGFC6nZmcLJ3uzp7Bh3IdDb05ecISpgJxuvuR+h7uPD85Lk2UsLp5ij0xEIMrvwMdAcKZjHK6iA==; akaas_magnet-test=1642155554~rv=92~id=e7a10ef5eefb4b7c4f06cd0e58656cfc~rn=; rpisb=rBEAA2G5wSMI0AAhcv3qAg=='
HTTP/2 200 
content-type: image/jpeg
content-length: 2234379
last-modified: Tue, 14 Dec 2021 23:40:23 GMT
etag: "4b009f36c8408d8da4423a678abd5040"
x-amz-version-id: BHACO3NQ2c82u.z8m2Jim3gozKrpKYFV
accept-ranges: bytes
server: AmazonS3
x-amz-cf-pop: HAM50-C3
x-amz-cf-id: fOlCKccevhgpO2HxRbckEv8W2dBYqJWIiFpziQI0U2vVelLo2k38mQ==
expires: Thu, 15 Dec 2022 10:33:15 GMT
date: Wed, 15 Dec 2021 10:33:15 GMT
cache-control: no-transform, max-age=31536000
content-security-policy: upgrade-insecure-requests
set-cookie: ak_bmsc=0D3FC54804B6F53011FA57FBCA7256FF~000000000000000000000000000000~YAAQ1/AWAire6Z19AQAA7kKnvQ4dv9BLy3a9pBra328T7WIucNNKt+iFfoJoC5xX0BIfaKL7MT2trKR8DTEz87eendRsLwBG1zBKCki1K2f3Vg0ov7W7mhisFlqxNJwqFMmetauFeTdTY+B0ZL2baYtwBfU8tMWEyEOmm3vAX7YqDSOdQ0HcsiQ0cRUIEPGIBq2LAxlX2DD3MrvVi2/wGwkKvgd91mKlY3qYPwcN+j8aicgauWVNz2xmFk/usT+ywtOdBPEJPmNRfNOo6j8Iab0v5FpU51s0kxrHFFynbHptLyx7nNGuUpzUGGWdBfxLzSy3PXiL5FbJeWaoY/cA0iPapV4+ihoi8AgAm3sJbpcUswEvAHpOLXk6sJ8kiajkHXVwAgwJuK5VqiSPGAYfhxBWuMcPedFVrttVDZg=; Domain=.washingtonpost.com; Path=/; Expires=Wed, 15 Dec 2021 11:29:36 GMT; Max-Age=3381; HttpOnly

Here we get 200 OK.

Request using https://images.weserv.nl

$ curl -I "https://images.weserv.nl/?url=https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916"
HTTP/2 404 
date: Wed, 15 Dec 2021 10:40:03 GMT
content-type: application/json
vary: Accept-Encoding
cross-origin-resource-policy: cross-origin
access-control-allow-origin: *
timing-allow-origin: *
x-images-api: 5
cf-cache-status: MISS
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=fPciVZnzrxt0VqdKnz7LQknrKCnB2UVARpV725byQbikDWfbvAtVTnjQCNdqv%2FMb%2B7vINuP3RKxxNkRpyebp0kuf2%2FmBxP7PIY8ZvREhZPvucU5NqJ5DCnEyBDzMIQYHYt5mcDUlcOWmSwVEqgwI"}],"group":"cf-nel","max_age":604800}
nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
server: cloudflare
cf-ray: 6bdf0d35bd704303-FRA
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400, h3-28=":443"; ma=86400, h3-27=":443"; ma=86400

It returned 404, but the expected behavior would be to return 200 OK.
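Separately from the cause of the 404, the upstream URL passed as `url` in the weserv request contains its own query string (`?src=...&w=916`) and is not percent-encoded. A standard query-string parser splits on `&`, so `w=916` would be read as a separate parameter of the outer request rather than as part of the upstream URL. The sketch below uses only Python's standard library to show the difference; how weserv itself parses `url` is an assumption here, not something stated in this thread.

```python
from urllib.parse import quote, urlsplit, parse_qs

# Upstream URL from the report; note its own "?src=...&w=916" query string.
upstream = ("https://www.washingtonpost.com/wp-apps/imrs.php"
            "?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/"
            "public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916")

# Unencoded: the parser splits on "&", so w=916 becomes its own parameter
# of the outer request and is no longer part of `url`.
naive = "https://images.weserv.nl/?url=" + upstream
print(parse_qs(urlsplit(naive).query)["w"])  # ['916']

# Encoded: safe="" makes quote() escape ":", "/", "?", "&" and "=",
# so the whole upstream URL survives as the single `url` value.
encoded = "https://images.weserv.nl/?url=" + quote(upstream, safe="")
print(parse_qs(urlsplit(encoded).query)["url"][0] == upstream)  # True
```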

Thank you in advance!

kleisauke commented 2 years ago

This fails because the upstream URL redirects to itself.

$ curl -I "https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916"
location: https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916
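A proxy can detect this case by resolving the `Location` header against the request URL and comparing the two. The following is a minimal sketch with a hypothetical helper, `is_self_redirect`, which is not part of weserv. It uses an exact string comparison after resolution and drops fragments, which are never sent to the server; a production check might normalize further (host case, default ports).

```python
from urllib.parse import urljoin, urldefrag

def is_self_redirect(request_url: str, location: str) -> bool:
    """Hypothetical helper: True if `location` resolves back to `request_url`."""
    # Relative Location values are resolved against the request URL,
    # as HTTP clients do; absolute ones are returned unchanged.
    target, _ = urldefrag(urljoin(request_url, location))
    current, _ = urldefrag(request_url)
    return target == current

URL = ("https://www.washingtonpost.com/wp-apps/imrs.php"
       "?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/"
       "public/FWTCLBCR7QI6ZA6S3HNLBYR3PY.jpg&w=916")

print(is_self_redirect(URL, URL))                                # True
print(is_self_redirect(URL, "/wp-apps/imrs.php?src=other.jpg"))  # False
```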

On Google Chrome, this will result in an ERR_TOO_MANY_REDIRECTS error message (try opening that URL in an incognito tab, for example). Our service will return a 404 with this error message:

{"status":"error","code":404,"message":"Will not follow a redirection to itself"}

(All upstream errors are remapped to 404; the JSON response body always contains more details.)
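Because the status code is 404 for every upstream error, the body is where the actual reason lives. Note that `curl -I` sends a HEAD request, whose response has no body, so the JSON message does not appear in the output in the original report; running the same command without `-I` should display it. A client could surface the message roughly as follows. This is a sketch that assumes only the `status`, `code` and `message` keys seen in the quoted response; `describe_error` is a hypothetical name.

```python
import json

def describe_error(body: str) -> str:
    """Hypothetical helper: summarize an error body of the shape quoted above."""
    data = json.loads(body)
    if data.get("status") != "error":
        return "not an error response"
    return f'{data["code"]}: {data["message"]}'

body = '{"status":"error","code":404,"message":"Will not follow a redirection to itself"}'
print(describe_error(body))  # 404: Will not follow a redirection to itself
```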

Browsers usually have a reload mechanism that tries to reload the page after showing that error message (since invalid cookies could cause this behavior). Our service doesn't have such a mechanism and also intentionally ignores any cookies during redirects.
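The toy simulation below illustrates why a request carrying the right cookies (like the second curl command in the report) gets a 200, while a cookie-less client never escapes the loop. Everything in it is hypothetical: `fake_upstream` stands in for the real server, and the gating cookie name `example_gate` is invented, since which cookie actually matters is not known from this thread. A generic client simply hits its redirect cap, whereas a client that refuses self-redirects stops immediately with the error text quoted above.

```python
from typing import Dict, Optional, Tuple

MAX_REDIRECTS = 10  # hypothetical cap; real clients choose their own limit

def fake_upstream(url: str, cookies: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Hypothetical upstream: redirects to itself unless a gating cookie is sent."""
    if cookies.get("example_gate") == "1":  # invented cookie name
        return 200, None
    return 302, url  # Location header equals the request URL

def fetch(url: str, cookies: Dict[str, str], refuse_self_redirect: bool) -> str:
    """Follow redirects, sending the same cookies on every hop."""
    for _ in range(MAX_REDIRECTS + 1):
        status, location = fake_upstream(url, cookies)
        if status != 302:
            return str(status)
        if refuse_self_redirect and location == url:
            # Mirrors the error message quoted in this thread.
            return "404 Will not follow a redirection to itself"
        url = location
    return "too many redirects"

URL = "https://example.com/imrs.php?src=a.jpg"  # hypothetical URL
print(fetch(URL, {"example_gate": "1"}, refuse_self_redirect=False))  # 200
print(fetch(URL, {}, refuse_self_redirect=False))  # too many redirects
print(fetch(URL, {}, refuse_self_redirect=True))   # 404 Will not follow a redirection to itself
```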

The reason that the upstream server is configured this way is presumably for hotlink protection. Note that our service is intended for caching and manipulating images, not for bypassing hotlink protection (that's also why we deliberately ignore all cookies).

abargiela commented 2 years ago

Thank you for your quick response @kleisauke