pgte / fugue

Unicorn for node.js
MIT License
396 stars 14 forks source link

USR2 signal doesn't seem to work #8

Closed seanhess closed 13 years ago

seanhess commented 13 years ago

Great work so far. It's almost perfect :) I've tested just about everything and seen two issues. First, sending the master process a USR2 doesn't seem to restart the app at all, but instead seems to make it spawn an extra worker without killing the old one.

Here's my fugue file. apiweb.js is an express server (inherits from standard http server). https://gist.github.com/797489

if I then run "ps aux | grep node" I'll see that the master restarts itself or something (it's pid changes), as well as a new worker. After about 30 seconds (I have worker_kill_timeout set to 10 seconds), I lose the original worker, but requests to my server just time out.

On a similar note, if I kill the master process, the children don't ever get killed. I have to kill them by hand.

One other thing I didn't have time to finish investigating: when hitting my webserver HARD (as fast as I can from my local computer), the master process gobbles up 100% of one of my cores, while the workers sit around 20% at most. Is that expected?

Feel free to split these up however you like. I'm looking forward to using fugue!

seanhess commented 13 years ago

"One other thing I didn't have time to finish investigating: when hitting my webserver HARD (as fast as I can from my local computer), the master process gobbles up 100% of one of my cores, while the workers sit around 20% at most. Is that expected?"

This part was a false alarm. I wasn't looking at the data correctly.

seanhess commented 13 years ago

Oh, and here's my info:

node v0.2.6 fugue 0.0.38 os x

pgte commented 13 years ago

Hi seanhess,

Thanks for your feedback.

Unfortunately I can't replicate the behavior you describe.

I did an experiment with node v0.2.6 and fugue 0.0.38, on osx: https://gist.github.com/798291

The startup app.js is pretty similar to your file. test.js exports an express server.

  1. $ node app.js
  2. Opened a browser and pointed it to 6001, made some requests
  3. $ kill -USR2 cat /tmp/apiweb.pid
  4. 30 seconds later the worker dies
  5. Made some more requests, all fine

Notice that the worker stays around for 30 seconds more because it has pending TCP connections (HTTP keep-alive). If you close the broswer (cmd-q) the worker dies instantly. Anyway, I can still make successful browser requests after that.

If you can make a gist without any of your app-specific code that replicates the problem I would appreciate it.

Thanks!

-Pedro

maritz commented 13 years ago

Oh, so switching off keep-alive in development environments would be a good idea here? I was wondering why my workers took so long to die and used the "hold-f5-until-worker-dies" method, which works but is obviously a little annoying. :D

pgte commented 13 years ago

Hi Moritz,

Yes, that makes some sense for HTTP apps. If you want, you can create the feature request on fugue issues :)

Fugue is not really designed for development environments. A quick fix would be to only call fugue on your staging and production environments, and listen directly on the port when in development mode. Like this you can even use tools like nodemon to restart node whenever you change a source file.

Thoughts?

maritz commented 13 years ago

That's actually exactly what I'm doing right now. (except that i don't have a staging or production environment that i really use right now :D )

I think the way it is now is perfect. The only thing that could be changes is a note in the README about the keep-alive. (it makes sense if you know it, but I sure didn't think about it that way.)

pgte commented 13 years ago

Done! :)

seanhess commented 13 years ago

Here you go!

https://gist.github.com/804464

It's a shell script that writes the files you need. You'll see that the last curl call never comes back with anything. Once we fix that, I can finish the test to show it gets the 2nd version after 30 seconds.

Thanks!

pgte commented 13 years ago

So strange, I tested with node v0.2.6 and fugue@0.0.38 and the output is:


Starting with 'Hello World!' Hello World! Starting with 'Hello World - NEW VERSION!' This never returns...

Sleeping for 5 seconds and exiting. Also, the worker process never dies after the SIGINT

Hello World - NEW VERSION!

So it looks like I cannot replicate your problem, very strange...

seanhess commented 13 years ago

Weird. It seems to work on a centos5.5 VM. What a waste of time, I should have been testing on linux all along. It's not like I need to deploy to my local computer or anything.

pgte commented 13 years ago

Still weird, since I'm using macos too with the same version of node you are. Anyway, I'm closing this one since I can't replicate it. If you run into any clue regarding this, please let me know. Thanks for your feedback!