nftstorage / nftstorage.link

🪐 NFT.Storage Gateway, the IPFS gateway for NFT.Storage, is not "another gateway", but a caching layer for NFTs that sits on top of existing IPFS public gateways. Notice: Uploads have been decommissioned. Learn more and find a new hot storage provider for uploading new assets: https://nft.storage/nft-storage-classic

Range request for large file returns a 200. #178

Closed ikreymer closed 1 year ago

ikreymer commented 1 year ago

I've noticed a behavior where the range request for a large file instead returns a 200, and starts serving the entire file. This is less than ideal, as it may cause a very large download to start.

Ideally, the request can wait until the range request can be satisfied, or alternatively, some sort of error should be returned. Ex:

curl -v -r 29788705059-29788770616 https://bafybeic3w7mp6pteuvzjxnmdt65r6pttth6mzgcjfzna6fugb66xs2z3tq.ipfs.nftstorage.link/ > /dev/null

Observe that the response is a 200 response and curl attempts to download the entire file:

< HTTP/2 200 
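
A client can protect itself against this behavior by checking the status and the Content-Range header before reading any of the body. The following is a minimal sketch, not part of the gateway or of curl: the names `RangeNotHonored`, `range_was_honored`, and `fetch_range` are hypothetical, and it assumes the caller wants exactly the requested bytes.

```python
import urllib.request
from typing import Optional


class RangeNotHonored(Exception):
    """Raised when a Range request is answered with anything but a matching 206."""


def range_was_honored(status: int, content_range: Optional[str],
                      start: int, end: int) -> bool:
    """True only for a 206 whose Content-Range covers exactly start..end."""
    if status != 206 or content_range is None:
        return False
    return content_range.strip().startswith(f"bytes {start}-{end}/")


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch bytes start..end (inclusive) without downloading a full-file 200."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        # Status and headers are available before the body is read, so a 200
        # carrying the whole file can be dropped here instead of downloaded.
        if not range_was_honored(resp.status, resp.headers.get("Content-Range"),
                                 start, end):
            raise RangeNotHonored(f"got HTTP {resp.status} for bytes={start}-{end}")
        return resp.read()
```

With this guard, a 200 like the one above would raise `RangeNotHonored` and close the connection rather than start a multi-gigabyte transfer.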

vasco-santos commented 1 year ago

Hi @ikreymer, could you try this again? It was probably fixed by the latest changes we made in w3s.link.

I am getting 206:

< HTTP/2 206
< date: Mon, 05 Sep 2022 15:04:48 GMT
< content-type: application/zip
< content-length: 65558
< cf-ray: 745fdb9b5c6e4528-TXL
< access-control-allow-origin: *
< cache-control: public, max-age=29030400, immutable
< content-range: bytes 29788705059-29788770616/29788770617
< etag: "bafybeic3w7mp6pteuvzjxnmdt65r6pttth6mzgcjfzna6fugb66xs2z3tq"
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< cf-cache-status: DYNAMIC
< access-control-allow-headers: X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output
< access-control-allow-methods: GET
< access-control-allow-methods: GET, POST, OPTIONS
< access-control-expose-headers: Link
< content-security-policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' blob: data: https://*.githubusercontent.com; form-action 'self' ; navigate-to 'self'; connect-src 'self' https://polygon-rpc.com https://rpc.testnet.fantom.network
< server-timing: request;dur=17612
< timing-allow-origin: *
< x-dotstorage-resolution-id: https://ipfs.io
< x-dotstorage-resolution-layer: public-race
< x-ipfs-datasize: 29788770617
< x-ipfs-gateway-host: ipfs-bank13-fr2
< x-ipfs-lb-pop: gateway-bank2-fr2
< x-ipfs-path: /ipfs/bafybeic3w7mp6pteuvzjxnmdt65r6pttth6mzgcjfzna6fugb66xs2z3tq/
< x-ipfs-pop: ipfs-bank13-fr2
< x-ipfs-roots: bafybeic3w7mp6pteuvzjxnmdt65r6pttth6mzgcjfzna6fugb66xs2z3tq
< x-proxy-cache: MISS
< server: cloudflare

ikreymer commented 1 year ago

Just tried now, unfortunately, still getting a 200 on the URL above. Here's another example with a large CID:

curl -v -r bytes=29692787482-29692787511 https://bafybeihxwitolmu2r5kacgsojabadasnvpld27rxu2pvn4javll6vsqvjm.ipfs.w3s.link/

(We will likely have a workaround for this soon where we won't need to do range requests, but I still wanted to flag this.)

vasco-santos commented 1 year ago

@ikreymer Could you get me the headers you get in the response?

vasco-santos commented 1 year ago

@ikreymer any update here? This is likely an issue with one of the gateways we use behind the scenes, and it would be great to know which one so we can sort it out.
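
The response dumps in this thread include `x-dotstorage-resolution-id` and `x-dotstorage-resolution-layer` headers, which appear to name the upstream gateway that served the request. As a rough sketch, assuming the input is pasted `curl -v` output and using the hypothetical helper names `parse_curl_verbose` and `summarize`, the fields relevant to a report like this one could be pulled out as follows:

```python
from typing import Dict, List, Optional, Tuple


def parse_curl_verbose(text: str) -> Tuple[Optional[int], Dict[str, List[str]]]:
    """Extract the status code and response headers from `curl -v` output."""
    status: Optional[int] = None
    headers: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("<"):
            continue  # only "<" lines are response data in curl -v output
        line = line[1:].strip()
        if line.startswith("HTTP/"):
            parts = line.split()
            if len(parts) >= 2 and parts[1].isdigit():
                status = int(parts[1])
            continue
        name, sep, value = line.partition(":")
        if sep:
            # Keep repeated headers (e.g. access-control-allow-headers) as a list.
            headers.setdefault(name.strip().lower(), []).append(value.strip())
    return status, headers


def summarize(text: str) -> dict:
    """Reduce a verbose dump to the fields that matter for a range-request report."""
    status, headers = parse_curl_verbose(text)

    def first(name: str) -> Optional[str]:
        return headers.get(name, [None])[0]

    return {
        "status": status,
        "resolved_by": first("x-dotstorage-resolution-id"),
        "layer": first("x-dotstorage-resolution-layer"),
        "has_content_range": "content-range" in headers,
    }
```

A 200 with `has_content_range` set to `False` and a known `resolved_by` value points at the specific upstream gateway that ignored the Range header.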

ikreymer commented 1 year ago

@vasco-santos just tried the above curl command now:

< HTTP/2 200 
< date: Mon, 10 Oct 2022 15:11:45 GMT
< content-type: application/zip
< content-length: 29694233990
< cf-ray: 75804a3eda9296ad-SJC
< access-control-allow-origin: *
< cache-control: public, max-age=29030400, immutable
< etag: "bafybeihxwitolmu2r5kacgsojabadasnvpld27rxu2pvn4javll6vsqvjm"
< set-cookie: __cflb=02DiuEkP8hw3gxppKv9wxcHysW9bCMHdHJv2ko24NCrNx; SameSite=Lax; path=/; expires=Tue, 11-Oct-22 14:11:45 GMT; HttpOnly
< vary: Accept-Encoding
< cf-cache-status: DYNAMIC
< access-control-allow-headers: Content-Type
< access-control-allow-headers: Range
< access-control-allow-headers: User-Agent
< access-control-allow-headers: X-Requested-With
< access-control-allow-methods: GET
< access-control-expose-headers: Link
< server-timing: request;dur=3404
< x-cf-ipfs-cache-status: miss
< x-dotstorage-resolution-id: https://cf.dag.haus
< x-dotstorage-resolution-layer: public-race
< x-ipfs-path: /ipfs/bafybeihxwitolmu2r5kacgsojabadasnvpld27rxu2pvn4javll6vsqvjm/
< x-ipfs-root: bafybeihxwitolmu2r5kacgsojabadasnvpld27rxu2pvn4javll6vsqvjm
< x-ipfs-roots: bafybeihxwitolmu2r5kacgsojabadasnvpld27rxu2pvn4javll6vsqvjm
< server: cloudflare

Also, on subsequent attempts I'm now getting 429s.

ikreymer commented 1 year ago

Another recent example:

curl -v -r 19602680-19668237 "https://bafybeic4ic4gsd45mejopelizyvsq2ybicumtkmqkr6flm6x6kiwbovrc4.ipfs.w3s.link/webarchive.wacz" | wc -c

Should be getting the last 65558 bytes, but it ends up loading the entire file. Tried from two different IPs.
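
HTTP byte ranges are inclusive at both ends, so the number of bytes in a range is `end - start + 1`. The sketch below works through that arithmetic for the ranges in this thread and parses a `Content-Range` value into its parts. The function names are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional, Tuple


def range_length(start: int, end: int) -> int:
    """Number of bytes in an inclusive byte range start..end."""
    return end - start + 1


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Split 'bytes START-END/TOTAL' into integers; TOTAL is None when it is '*'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def is_tail_of_file(start: int, end: int, total: int) -> bool:
    """True when the range ends at the last byte of a file of `total` bytes."""
    return start <= end and end == total - 1


# The request above asks for bytes 19602680-19668237 of a 19668238-byte file.
print(range_length(19602680, 19668237))               # 65558
print(is_tail_of_file(19602680, 19668237, 19668238))  # True
```

The same arithmetic matches the earlier 206 response, whose `content-length: 65558` agrees with `content-range: bytes 29788705059-29788770616/29788770617`.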

Response headers:

< HTTP/2 200 
< date: Tue, 25 Oct 2022 09:17:50 GMT
< content-type: application/zip
< content-length: 19668238
< cf-ray: 75f9dc7ebfbab894-AMS
< access-control-allow-origin: *
< cache-control: public, max-age=29030400, immutable
< etag: "bafybeiavsskuomwaaxeawi64t3pjs5yozsxt3a3itxeaefwazn4zxvvd5a"
< access-control-allow-methods: GET
< access-control-expose-headers: Link
< content-security-policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' blob: data: https://*.githubusercontent.com; form-action 'self' ; navigate-to 'self'; connect-src 'self' https://polygon-rpc.com https://rpc.testnet.fantom.network
< server-timing: request;dur=840
< x-dotstorage-anchor: 281d24ee4f605146a07d3a59aca5dd7a4d61de49eb5deda2d00236b5d12678c9
< x-dotstorage-resolution-id: https://freeway.dag.haus
< x-dotstorage-resolution-layer: dotstorage-race
< x-freeway-version: 1.5.2
< server: cloudflare

ikreymer commented 1 year ago

Update: above now working thanks to fix in freeway gateway

vasco-santos commented 1 year ago

It looks like all underlying gateways are now properly handling range requests and returning 206. The Pinata and Cloudflare dedicated gateways are returning 206 in my local tests.

If you find any more issues in the wild with any response, please let us know and include the response headers.

vasco-santos commented 1 year ago

Closing the issue in the meantime