ipfs / infra

Tools and systems for the IPFS community
MIT License

CORS issues on preload nodes #447

Closed daviddias closed 5 years ago

daviddias commented 6 years ago

Hi Infra team, some of our users are reporting having multiple problems with CORS, can you confirm if this is a known problem?

Issue: https://github.com/ipfs/js-ipfs/issues/1476#issuecomment-434741386

alanshaw commented 6 years ago

This is still happening and can be seen here: https://klueq.github.io/ (inspect the browser console).

alanshaw commented 6 years ago

https://github.com/ipfs/js-ipfs/issues/1694

haadcode commented 6 years ago

Adding my two cents to confirm we're seeing this issue as well, as described in https://github.com/ipfs/js-ipfs/issues/1694.

daviddias commented 6 years ago

Another report with example - https://discuss.ipfs.io/t/trouble-with-content-availability-on-public-gateways-from-js-ipfs/4137/1

I tested with Chrome and Firefox. Trimmed the test case to: https://codepen.io/anon/pen/jQqMPb?editors=0011

lidel commented 6 years ago

Preflight Issues

Preload nodes are sometimes slow to respond, which turns into a persistent problem: the preflight request (HTTP OPTIONS) fails, that failure is cached by the browser (afaik in Firefox for up to 24 hours or until the browser is closed), and as a result every following preload request (HTTP GET to /api/v0/refs) is blocked by the browser:

[screenshot: 2018-11-08--15-59-45]

Preflight Fix?

@kyledrake what if we introduce an easy optimization: detect HTTP OPTIONS requests at Nginx and return a static response?

That way preflight requests will not fail due to go-ipfs being under load, because they will never hit the daemon.

Static response to copy (Access-Control* and Vary):

$ curl -X OPTIONS 'https://node0.preload.ipfs.io/api/v0/refs' -v
# (...)
< HTTP/1.1 200 OK
< Date: Thu, 08 Nov 2018 15:09:18 GMT
< Content-Length: 0
< Connection: keep-alive
< Vary: Origin
< Vary: Access-Control-Request-Method
< Vary: Access-Control-Request-Headers
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, POST, OPTIONS
< Access-Control-Allow-Headers: X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output
< Access-Control-Expose-Headers: Content-Range, X-Chunked-Output, X-Stream-Output
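
A minimal sketch of what the Nginx side of this could look like (the location layout and upstream address are assumptions; the real preload vhost may differ), with header values copied from the static response above:

# Sketch only: answer CORS preflight directly at Nginx so OPTIONS never reaches the go-ipfs daemon.
location /api/v0/ {
    if ($request_method = OPTIONS) {
        # Mirror the captured response; HSTS is assumed to be set globally elsewhere.
        add_header Vary "Origin" always;
        add_header Vary "Access-Control-Request-Method" always;
        add_header Vary "Access-Control-Request-Headers" always;
        add_header Access-Control-Allow-Origin "*" always;
        add_header Access-Control-Allow-Methods "GET, POST, OPTIONS" always;
        add_header Access-Control-Allow-Headers "X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output" always;
        add_header Access-Control-Expose-Headers "Content-Range, X-Chunked-Output, X-Stream-Output" always;
        return 200;
    }
    # Assumed go-ipfs API upstream; everything else still proxies to the daemon.
    proxy_pass http://127.0.0.1:5001;
}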

Gateway ←→ Preload

Note that even if preflight succeeds, the public gateway does not load the content – it looks like it is missing a direct connection. Would adding the preload nodes as bootstrap servers for our gateway nodes fix the issue?

daviddias commented 6 years ago

@ipfs/infrastructure any update here?

kyledrake commented 6 years ago

@daviddias

I deployed new preload servers, putting them on their own dedicated machines and bringing them up to the latest version of IPFS, which improved things a lot. We also had someone flooding the preload server with several hundred requests per second, which was causing performance issues. I throttled it so that people can't just flood the server with API calls from a script.

I'm going to explore @lidel's idea here when I get a chance, but I'd like to keep this open until I deploy that.

mitra42 commented 6 years ago

If you've throttled it - totally reasonable - I hope the JS-IPFS code has been written so that if preload is slow to respond (throttled), it won't wait for it before trying the local JS-IPFS (not JS-IPFS-API) call.

Note that it might not be a script that's doing a fast load: if we load a webpage, it's going to try to read up to approximately 50 images. In many cases those will be cached, but sometimes js-ipfs is going to be inserting these preload calls.

Note - it's on our list to turn preload off in the dweb.archive.org code, but we haven't got to it yet.

lidel commented 5 years ago

@mitra42 afaik preload in js-ipfs is async and does not block regular operations:

Preload API requests are now done asynchronously so they don't effect the time it takes to add content. The preload module keeps track of in flight requests and cancels any that are still flying when stop is called on the node. – https://github.com/ipfs/js-ipfs/pull/1464#issuecomment-408111913

@kyledrake additional idea: setting an explicit timeout for caching preflight requests could limit the number of preflight requests for popular CIDs (the spec says the preflight cache is per URL) and ensure all vendors behave the same, e.g. to set the maximum possible cache of 10 minutes:

Access-Control-Max-Age: 600
Access-Control-Max-Age – Maximum number of seconds the results can be cached. Firefox caps this at 24 hours (86400 seconds) and Chromium at 10 minutes (600 seconds). Chromium also specifies a default value of 5 seconds. A value of -1 will disable caching, requiring a preflight OPTIONS check for all calls.

Refs:
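
If we went the Nginx route sketched in the earlier comment, this would just be one more header on the static preflight response; the exact value and placement are assumptions (600 seconds matches Chromium's cap):

# Sketch: let browsers cache a successful preflight for 10 minutes.
add_header Access-Control-Max-Age "600" always;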

mitra42 commented 5 years ago

Just confirming, I'm still seeing preload CORS errors

eefahy commented 5 years ago

paging @kyledrake

alanshaw commented 5 years ago

Reported here also https://github.com/ipfs/js-ipfs/issues/1732

kyledrake commented 5 years ago

There was a known issue with the AMS server, which I just resolved, but the other servers should be operating as expected. The monitoring caught the AMS outage, but I didn't see issues with any of the other servers.

Having all of the bootstrap servers go down at once is of course a major issue, but if it's one out of several, that shouldn't cause any major problems (sans the annoying console output). We're going to have random servers go down from time to time for various reasons.

I'm going to close this ticket, as the preload servers' CORS is functioning properly. I am working on improving the infrastructure behind the current bootstrap system to make it faster and more reliable in the future, but that work is independent of this ticket (which is specifically about CORS issues).

mitra42 commented 5 years ago

I'm not clear whether you fixed a CORS issue or something else.

If CORS was functioning: one place I saw this on another system was that CORS worked correctly in the "normal" case but incorrectly in the failure case, so errors were being returned with the wrong CORS headers and, instead of displaying the actual error (e.g. a 500 server error), they showed up as a CORS error.
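
One plausible way that pattern can arise with Nginx-managed CORS (whether the preload vhosts actually set the headers via add_header is an assumption): without the "always" parameter, Nginx only attaches add_header headers to success and redirect responses, so a 500 from the daemon comes back without Access-Control-Allow-Origin and the browser reports it as a CORS error instead of the real failure. A sketch of the difference:

# Without "always": the header is added only to 200, 201, 204, 206 and 3xx responses,
# so error responses (e.g. HTTP 500) carry no CORS headers at all.
add_header Access-Control-Allow-Origin "*";

# With "always": the header is attached to every response code, including errors.
add_header Access-Control-Allow-Origin "*" always;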

lidel commented 5 years ago

@mitra42 are you able to tell which API endpoint and path was missing CORS headers in the HTTP 500 responses or preflight requests? Is it still reproducible, or was it happening randomly?

FWIW I checked a preload call for an invalid CID (which returns HTTP 500) and the CORS headers look fine right now:

$  curl -s -I -X OPTIONS 'https://node0.preload.ipfs.io/api/v0/refs?r=true&arg=INVALID_CID' | grep -i Access-Control 
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output
Access-Control-Expose-Headers: Content-Range, X-Chunked-Output, X-Stream-Output

$ curl -s -I -X GET 'https://node0.preload.ipfs.io/api/v0/refs?r=true&arg=INVALID_CID' | grep -i Access-Control 
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output
Access-Control-Expose-Headers: Content-Range, X-Chunked-Output, X-Stream-Output

mitra42 commented 5 years ago

Not sure what you mean by API endpoint; I see this in a browser log, and as mentioned by other reporters it's intermittent - and I'm usually only looking at the browser log if I'm looking for some other problem. If there is other info I can grab for you next time I see it in a log, please let me know (here).

lidel commented 5 years ago

@mitra42 (endpoint == server with /api/v0/) I am mainly curious whether CORS errors in the console look like this or like something else. Next time you see something, just grab a screenshot, post it here or in a new issue, and @lidel me.

mitra42 commented 5 years ago

Ok - will do - do you mean something different from what was posted in the bug report on js-ipfs#1476? There is a bunch of history to this problem there which of course didn't get copied over when this new one was created.