meteorhacks / cluster

Clustering solution for Meteor with load balancing and service discovery
https://meteorhacks.com/cluster-a-different-kind-of-load-balancer-for-meteor.html
MIT License
631 stars 80 forks source link

404s from /cluster-ddp #23

Open rantav opened 9 years ago

rantav commented 9 years ago

Hi, I'm using meteor-cluster in the following scenario:

Here's just one screenshot of the 404s on my ELB. http://cl.ly/image/17083P0j2x07 And I recorded the access logs, so here's an example request that 404s

2015-02-28T17:41:57.787697Z web 66.249.78.7:60042 172.31.38.90:80 0.00012 0.003284 0.000023 404 404 417 26 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/352/etyjoyb4/xhr_send HTTP/1.1"

Most of the /cluster-ddp calls end with 200 (or 204), but occasionally they 404. Here are just a few examples or 2xx lines:

2015-02-28T17:41:18.573685Z web 109.64.50.27:53781 172.31.12.222:80 0.000073 30.001697 0.000045 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/607/lqhhw6rh/xhr HTTP/1.1"
2015-02-28T17:41:49.567741Z web 109.64.50.27:53761 172.31.12.222:80 0.000116 0.004414 0.000032 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/607/lqhhw6rh/xhr_send HTTP/1.1"

As far as I can tell, this does not interfere in the sense that users are bothered by it, but it does make monitoring really hard (b/c it's hard to say what's a real error and that's just 404 from /cluster-ddp).

I wonder if you had seen this before or could guide me towards a solution?

Thanks!

arunoda commented 9 years ago

if you are getting this for every one hour like. That's possible. We've added one hour sticky cookie expiration time. With that we can get the equal load distribution, even if we have long connections. I think that's the issue. Do you want to change that.

How are you using ELB with cluster. I think ELB does load balancing. You can run without ELB.

If you need to SSL support, check Meteor Up. It has SSL support now.

rantav commented 9 years ago

Thanks @arunoda here's why I choose to use ELB:

The 404s are indeed an issue. I was probably getting them each hour or so - makes sense. I set up a cloudwatch alert for 4xx and they were so noisy I had to turn off this alert....

I might do some more testing later and decide to remove ELB (and add websockets and SSL termination on the app servers) but for now I think I'll stick to ELB.

arunoda commented 9 years ago

Yes. I can agree on your approach on selecting ELB. May be we could find a better way to handle that 404 issues. But, I fear it's possible or not.

On Mon, Mar 2, 2015 at 1:11 PM Ran Tavory notifications@github.com wrote:

Thanks @arunoda https://github.com/arunoda here's why I choose to use ELB:

  • Great SSL support. Easy to configure, many options
  • Offload from the servers regarding SSL termination. My app still isn't wildly enough used to make a diff but that's best practice for higher scale. I use VPC so there are no privacy concerns in the internal network.
  • Great out of the box monitoring and logging. ELB helps us monitor success/errors and alert easily using CloudWatch. Access logs are easily supported (a simple checkbox and you find them on S3)
  • Easy to get the status of which servers are operational and which aren't (healthcheck)
  • Easy and robust DNS setup
  • Pretty cheap...

The 404s are indeed an issue. I was probably getting them each hour or so

  • makes sense. I set up a cloudwatch alert for 4xx and they were so noisy I had to turn off this alert....

I might do some more testing later and decide to remove ELB (and add websockets and SSL termination on the app servers) but for now I think I'll stick to ELB.

— Reply to this email directly or view it on GitHub https://github.com/meteorhacks/cluster/issues/23#issuecomment-76669662.

rantav commented 9 years ago

not sure I entirely understand the implementation of cookie expiration, but perhaps a 302 might be better suited in this case?

arunoda commented 9 years ago

Actually, it's not like that. We do expire the cookie after one hour. Then the cluster sets a new cookie by pointing to a new backend.

So, the existing sockjs connection(via long polling) tries to connect to that. Then that server do a 404 since it's doesn't know about that.

That's where the issue comes in. May be I can do something about this and gracefully disconnect the sockjs connection. Let's see.

rantav commented 9 years ago

Lately I've also started seeing 500s from cluster-ddp. They don't happen as frequent as the 404s, so far I've only had 3, but I wonder if they could be related to the same topic.

Example:

2015-03-04T11:32:08.701675Z web 192.114.87.2:15735 172.31.38.90:80 0.000107 15.005573 0.000044 500 500 0 30 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/863/6u9k57nj/xhr_send HTTP/1.1"

Here it is again (the 500) with a little more context before/after: (it's in line 10)

2015-03-04T11:31:39.012386Z web 192.114.87.2:41721 172.31.38.90:80 0.000116 29.91368 0.000048 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/977/mfwo48cy/xhr HTTP/1.1"
2015-03-04T11:32:09.595634Z web 79.180.60.250:61607 172.31.12.222:80 0.000162 0.003569 0.000022 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/549/sat7g6th/xhr_send HTTP/1.1"
2015-03-04T11:31:46.487161Z web 212.29.231.193:41189 172.31.38.90:80 0.000086 30.001177 0.000044 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/030/w2ye9a4t/xhr HTTP/1.1"
2015-03-04T11:32:17.536381Z web 212.29.231.193:41189 172.31.38.90:80 0.000162 0.001666 0.000022 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/030/w2ye9a4t/xhr_send HTTP/1.1"
2015-03-04T11:31:47.722460Z web 81.218.63.74:58930 172.31.38.90:80 0.000081 29.99772 0.000042 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/823/icjyf_ms/xhr HTTP/1.1"
2015-03-04T11:32:18.727233Z web 81.218.63.74:58930 172.31.38.90:80 0.00008 0.001166 0.000024 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/823/icjyf_ms/xhr_send HTTP/1.1"
2015-03-04T11:32:18.903316Z web 79.176.19.26:53774 172.31.12.222:80 0.000104 0.005701 0.000024 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/853/w_bar1wj/xhr_send HTTP/1.1"
2015-03-04T11:31:51.578326Z web 82.166.77.214:61692 172.31.12.222:80 0.000071 30.000007 0.000052 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/061/i1n64r4p/xhr HTTP/1.1"
2015-03-04T11:32:22.574103Z web 82.166.77.214:61692 172.31.12.222:80 0.00011 0.003304 0.000034 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/061/i1n64r4p/xhr_send HTTP/1.1"
2015-03-04T11:32:22.908957Z web 82.166.86.27:54105 172.31.12.222:80 0.000119 0.003102 0.000023 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/977/h7wqgtm0/xhr_send HTTP/1.1"
2015-03-04T11:32:08.701675Z web 192.114.87.2:15735 172.31.38.90:80 0.000107 15.005573 0.000044 500 500 0 30 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/863/6u9k57nj/xhr_send HTTP/1.1"
2015-03-04T11:31:54.360001Z web 195.62.30.20:58776 172.31.38.90:80 0.000137 30.005048 0.000042 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/364/i31s765k/xhr HTTP/1.1"
2015-03-04T11:32:24.827538Z web 75.101.220.64:45492 172.31.12.222:80 0.000047 0.009047 0.00007 302 302 0 0 "GET http://gormim.com:80/ HTTP/1.1"
2015-03-04T11:32:25.363464Z web 195.62.30.20:58776 172.31.38.90:80 0.000121 0.002839 0.000024 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/364/i31s765k/xhr_send HTTP/1.1"
2015-03-04T11:32:26.039644Z web 192.114.87.2:9599 172.31.38.90:80 0.000129 0.001308 0.000027 200 200 0 151 "GET https://gormim.com:443/sockjs/info?cb=kk5f5n_em7 HTTP/1.1"
2015-03-04T11:32:26.709316Z web 192.114.87.2:9599 172.31.38.90:80 0.000119 0.002627 0.000027 200 200 0 2 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/503/7qgeo5vw/xhr HTTP/1.1"
2015-03-04T11:31:56.763520Z web 79.180.60.250:61741 172.31.12.222:80 0.000093 29.999052 0.00004 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/538/wsqktmrt/xhr HTTP/1.1"
2015-03-04T11:31:57.022781Z web 62.90.205.178:57578 172.31.38.90:80 0.000126 30.074405 0.000033 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/617/t_b9tsn2/xhr HTTP/1.1"
2015-03-04T11:31:57.130393Z web 212.199.69.1:19111 172.31.12.222:80 0.000099 30.0826 0.000043 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/130/nl1lfiff/xhr HTTP/1.1"
arunoda commented 9 years ago

I hope this is the same context. I will make a fix for this with the next release. Coming on next week. On 2015 මාර්තු 4, බදාදා at ප.ව. 5.49 Ran Tavory notifications@github.com wrote:

Lately I've also started seeing 500s from cluster-ddp. They don't happen as frequent as the 404s, so far I've only had 3, but I wonder if they could be related to the same topic.

Example:

2015-03-04T11:32:08.701675Z web 192.114.87.2:15735 172.31.38.90:80 0.000107 15.005573 0.000044 500 500 0 30 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/863/6u9k57nj/xhr_send HTTP/1.1"

Here it is again (the 500) with a little more context before/after: (it's in line 10)

2015-03-04T11:31:39.012386Z web 192.114.87.2:41721 172.31.38.90:80 0.000116 29.91368 0.000048 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/977/mfwo48cy/xhr HTTP/1.1" 2015-03-04T11:32:09.595634Z web 79.180.60.250:61607 172.31.12.222:80 0.000162 0.003569 0.000022 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/549/sat7g6th/xhr_send HTTP/1.1" 2015-03-04T11:31:46.487161Z web 212.29.231.193:41189 172.31.38.90:80 0.000086 30.001177 0.000044 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/030/w2ye9a4t/xhr HTTP/1.1" 2015-03-04T11:32:17.536381Z web 212.29.231.193:41189 172.31.38.90:80 0.000162 0.001666 0.000022 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/030/w2ye9a4t/xhr_send HTTP/1.1" 2015-03-04T11:31:47.722460Z web 81.218.63.74:58930 172.31.38.90:80 0.000081 29.99772 0.000042 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/823/icjyf_ms/xhr HTTP/1.1" 2015-03-04T11:32:18.727233Z web 81.218.63.74:58930 172.31.38.90:80 0.00008 0.001166 0.000024 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/823/icjyf_ms/xhr_send HTTP/1.1" 2015-03-04T11:32:18.903316Z web 79.176.19.26:53774 172.31.12.222:80 0.000104 0.005701 0.000024 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/853/w_bar1wj/xhr_send HTTP/1.1" 2015-03-04T11:31:51.578326Z web 82.166.77.214:61692 172.31.12.222:80 0.000071 30.000007 0.000052 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/061/i1n64r4p/xhr HTTP/1.1" 2015-03-04T11:32:22.574103Z web 82.166.77.214:61692 172.31.12.222:80 0.00011 0.003304 0.000034 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/061/i1n64r4p/xhr_send HTTP/1.1" 2015-03-04T11:32:22.908957Z web 82.166.86.27:54105 172.31.12.222:80 0.000119 0.003102 0.000023 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/977/h7wqgtm0/xhr_send HTTP/1.1" 2015-03-04T11:32:08.701675Z web 192.114.87.2:15735 172.31.38.90:80 0.000107 15.005573 0.000044 500 500 0 30 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/863/6u9k57nj/xhr_send HTTP/1.1" 2015-03-04T11:31:54.360001Z web 195.62.30.20:58776 172.31.38.90:80 0.000137 30.005048 0.000042 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/364/i31s765k/xhr HTTP/1.1" 2015-03-04T11:32:24.827538Z web 75.101.220.64:45492 172.31.12.222:80 0.000047 0.009047 0.00007 302 302 0 0 "GET http://gormim.com:80/ HTTP/1.1" 2015-03-04T11:32:25.363464Z web 195.62.30.20:58776 172.31.38.90:80 0.000121 0.002839 0.000024 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/364/i31s765k/xhr_send HTTP/1.1" 2015-03-04T11:32:26.039644Z web 192.114.87.2:9599 172.31.38.90:80 0.000129 0.001308 0.000027 200 200 0 151 "GET https://gormim.com:443/sockjs/info?cb=kk5f5n_em7 HTTP/1.1" 2015-03-04T11:32:26.709316Z web 192.114.87.2:9599 172.31.38.90:80 0.000119 0.002627 0.000027 200 200 0 2 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/503/7qgeo5vw/xhr HTTP/1.1" 2015-03-04T11:31:56.763520Z web 79.180.60.250:61741 172.31.12.222:80 0.000093 29.999052 0.00004 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/538/wsqktmrt/xhr HTTP/1.1" 2015-03-04T11:31:57.022781Z web 62.90.205.178:57578 172.31.38.90:80 0.000126 30.074405 0.000033 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/617/t_b9tsn2/xhr HTTP/1.1" 2015-03-04T11:31:57.130393Z web 212.199.69.1:19111 172.31.12.222:80 0.000099 30.0826 0.000043 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/130/nl1lfiff/xhr HTTP/1.1"

— Reply to this email directly or view it on GitHub https://github.com/meteorhacks/cluster/issues/23#issuecomment-77148218.

rantav commented 9 years ago

This is what I see on the app server logs when the 500 happens:

Cluster: web proxy error:  socket hang up
arunoda commented 9 years ago

That means somehow socket connection goes down. It can be happened due to a lot of reasons. Either network issue or something else.

But if you see this a lot of time in the log, we may need to look at it.

On Wed, Mar 4, 2015 at 6:19 PM Ran Tavory notifications@github.com wrote:

This is what I see on the app server logs when the 500 happens:

Cluster: web proxy error: socket hang up

— Reply to this email directly or view it on GitHub https://github.com/meteorhacks/cluster/issues/23#issuecomment-77151816.

rantav commented 9 years ago

So far I've seen around 3-4 500s today. The 404s happen much more

arunoda commented 9 years ago

Okay. May be that's happening when you are deploying. Not sure though.

On Wed, Mar 4, 2015 at 6:29 PM Ran Tavory notifications@github.com wrote:

So far I've seen around 3-4 500s today. The 404s happen much more

— Reply to this email directly or view it on GitHub https://github.com/meteorhacks/cluster/issues/23#issuecomment-77153199.

rantav commented 9 years ago

it doesn't happen when deploying, sorry (I guess it might also happen when deploying but not at the times I spotted it).

arunoda commented 9 years ago

Check now. I've fixed some related issues.

rantav commented 9 years ago

I am sorry, I don't think it fixed it, I still see a large number of 404s after upgrading to cluster 1.6.1.

Here's just a snippet in case it helps (search for the 404s there)

2015-03-16T08:36:38.211468Z web 66.249.78.42:36520 172.31.12.222:80 0.000128 0.002606 0.000023 404 404 81 26 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/651/csjmnxi5/xhr_send HTTP/1.1"
2015-03-16T08:36:38.849667Z web 66.249.78.35:57236 172.31.12.222:80 0.000122 0.002842 0.000022 404 404 81 26 "POST https://gormim.com:443/cluster-ddp/8dd67da794e9e51dda90f220ceba217f5977caf0/web/772/cck54mx0/xhr_send HTTP/1.1"
2015-03-16T08:36:39.498091Z web 66.249.78.35:57236 172.31.38.90:80 0.000124 0.003312 0.000024 200 200 0 2 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/803/izxvv74e/xhr HTTP/1.1"
2015-03-16T08:36:40.160310Z web 66.249.78.42:36520 172.31.38.90:80 0.000109 0.004714 0.000033 204 204 81 0 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/803/izxvv74e/xhr_send HTTP/1.1"
2015-03-16T08:36:40.870099Z web 66.249.78.42:36520 172.31.38.90:80 0.000103 0.006063 0.00003 204 204 528 0 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/803/izxvv74e/xhr_send HTTP/1.1"
2015-03-16T08:36:41.522892Z web 66.249.78.35:57236 172.31.38.90:80 0.000117 0.003646 0.000023 200 200 0 2 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/372/qtagwx45/xhr HTTP/1.1"
2015-03-16T08:36:42.221565Z web 66.249.78.42:36520 172.31.38.90:80 0.000111 0.004817 0.000024 204 204 81 0 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/372/qtagwx45/xhr_send HTTP/1.1"
2015-03-16T08:36:42.929253Z web 66.249.78.42:36520 172.31.12.222:80 0.000115 0.001515 0.000021 404 404 528 26 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/372/qtagwx45/xhr_send HTTP/1.1"
2015-03-16T08:36:13.489093Z web 82.81.163.130:46969 172.31.38.90:80 0.00012 30.007951 0.000043 200 200 0 24 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/208/u4rlywdl/xhr HTTP/1.1"
2015-03-16T08:36:43.736490Z web 66.249.78.49:50136 172.31.12.222:80 0.000107 0.001347 0.000027 200 200 0 2 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/513/j2yq0w4h/xhr HTTP/1.1"
2015-03-16T08:36:44.397051Z web 66.249.78.49:50136 172.31.12.222:80 0.000115 0.002862 0.000025 204 204 81 0 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/513/j2yq0w4h/xhr_send HTTP/1.1"
2015-03-16T08:36:44.453217Z web 82.81.163.130:46969 172.31.38.90:80 0.000089 0.002771 0.000027 204 204 22 0 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/208/u4rlywdl/xhr_send HTTP/1.1"
2015-03-16T08:36:45.137102Z web 66.249.78.49:50136 172.31.38.90:80 0.000115 0.002915 0.000022 404 404 528 26 "POST https://gormim.com:443/cluster-ddp/f4a987ea4fac27bdc2f00df9dd01a691e23f29b8/web/513/j2yq0w4h/xhr_send HTTP/1.1"
zapaz commented 8 years ago

Hi all

We got same problem

It seems that :

We did rebuid all apps with exactly the same git commit , and no more 404 !

So this problem must not appear with automatic deployments...