nodejs / node-v0.x-archive

Moved to https://github.com/nodejs/node
34.42k stars 7.31k forks source link

setting `process.title` hangs on OSX after max number of child proceses was reached #8288

Closed bajtos closed 10 years ago

bajtos commented 10 years ago

Once Error: spawn EAGAIN is encountered, process.title setter hangs the node process.

Platform: Mac OS X 10.9.4 Node version: v0.10.24

Code to reproduce - run it on your Mac and be ready to hard-shutdown the machine afterwards, as the regular restart/shutdown won't work anymore:

var spawn = require('child_process').spawn;

var name = process.argv[2];
if (name) {
  process.title = name;
  setTimeout(process.exit, 5000);
  return;
}

console.log('Starting the fork bomb.');
iterate();

function iterate() {
  try {
    var child = spawn(
      process.execPath,
      [__filename, 'bomber' ],
      { stdio: 'inherit', detached: true });
    child.on('error', onError);
  } catch(err) {
    onError(err);
    return;
  }

  process.nextTick(iterate);
}

function onError(err) {
  console.error('--SPAWN-FAILED--');
  console.error(err);
  console.log('Going to set the process title');
  process.title = 'foo';
  console.log('Oops, that was supposed to never return');
}

killall bomber is your friend as long as your OS has resources to spawn another process and let it open file descriptors ;-)

@indutny @jbergstroem this is a follow up for #8279.

jbergstroem commented 10 years ago

Haven't really run this but a quick glance tells me you're spawning processes until you hit ulimit. Sounds like expected behaviour if you ask me.

indutny commented 10 years ago

Hm... @bajtos AFAIK OSX doesn't handle this well at all, are you sure that this is related to process.title? It seems to be killing my system even without it.

bajtos commented 10 years ago

@indutny AFAIK OSX doesn't handle this well at all, are you sure that this is related to process.title? It seems to be killing my system even without it.

It does not kill my system :)

Here is what happens when the number of processes is close to ulimit and the system survived:

I am intentionally vague about what does "close to ulimit" means. It may be ulimit-1, but also ulimit-10. I also have a feeling that once that state is reached, it is sort of permanent. You can kill the master node process, all bomber processes, and process.title may still hang. However, as I was experimenting with the script to reproduce, the behaviour differed, so I am not exactly sure about this.

Also when I did not call process.title from the master Node process that encountered spawn EAGAIN, or let the bomber processes quit before calling process.title, then the problem did not occur.

@jbergstroem Haven't really run this but a quick glance tells me you're spawning processes until you hit ulimit. Sounds like expected behaviour if you ask me.

Could you please elaborate more how is process.title setter connected with process ulimit? And why is it expected that the Node process hangs in this scenario, instead of throwing a decent error?


Anyways. I do realize this is an edge-of-an-edge case and it is very unlikely that other people will encounter it, or at least not often. My primary goal is to record what I have found, so that when somebody else happens to run into the same problem, he will find this issue and learn that it is a known problem. If you think it's not worth investigating and fixing it, then I am fine with that. I am sure there are more important issues to solve.

indutny commented 10 years ago

Ok, assuming that the process hangs on process.title - could you please connect to it with a lldb -p <pid> and paste here a backtrace output?

bajtos commented 10 years ago

Thanks @indutny for concrete steps how to help you further. I'll try to get you the requested information early next week.

bajtos commented 10 years ago

I have to run the script twice to reproduce the behaviour, the first run did not hang.

Here is the backtrace output:

(lldb) thread backtrace
* thread #1: tid = 0x27ff4, 0x00007fff914c5a56 libsystem_kernel.dylib`semaphore_wait_trap + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff914c5a56 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x00007fff8e4129f9 libdispatch.dylib`_dispatch_semaphore_wait_slow + 206
    frame #2: 0x00007fff92c287ef libxpc.dylib`xpc_connection_send_message_with_reply_sync + 195
    frame #3: 0x00007fff96b940f4 LaunchServices`_LSApplicationCheckIn + 1294
    frame #4: 0x0000000100136a23 node`uv__set_process_title + 283
    frame #5: 0x0000000100135906 node`uv_set_process_title + 44
    frame #6: 0x000000010000a32e node`node::ProcessTitleSetter(v8::Local<v8::String>, v8::Local<v8::Value>, v8::AccessorInfo const&) + 52
    frame #7: 0x0000000100240703 node`v8::internal::JSObject::SetPropertyWithCallback(v8::internal::Object*, v8::internal::String*, v8::internal::Object*, v8::internal::JSObject*, v8::internal::StrictModeFlag) + 671
    frame #8: 0x0000000100242e56 node`v8::internal::JSObject::SetPropertyForResult(v8::internal::LookupResult*, v8::internal::String*, v8::internal::Object*, PropertyAttributes, v8::internal::StrictModeFlag, v8::internal::JSReceiver::StoreFromKeyed) + 1024
    frame #9: 0x00000001002402b6 node`v8::internal::JSReceiver::SetProperty(v8::internal::String*, v8::internal::Object*, PropertyAttributes, v8::internal::StrictModeFlag, v8::internal::JSReceiver::StoreFromKeyed) + 248
    frame #10: 0x00000001001fa69f node`v8::internal::StoreIC::Store(v8::internal::InlineCacheState, v8::internal::StrictModeFlag, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::Object>) + 1009
    frame #11: 0x00000001001fbe09 node`v8::internal::StoreIC_Miss(v8::internal::Arguments, v8::internal::Isolate*) + 179
indutny commented 10 years ago

Just an information: the original problem was "Not responding" mark in a process manager, and after reading https://code.google.com/p/chromium/issues/detail?id=304860 I have decided to issue SetApplicationIsDaemon call. But it appears that LaunchServices itself is trying to spawn process...

@bajtos could you please help me a little bit more with this thing?

  1. ps ax | grep launchservicesd should give you a pid
  2. sudo dtruss -p <pid>
  3. Run your test case until it hangs
  4. Post dtruss output as a gist.

Thank you!

indutny commented 10 years ago

Hm... it appears to me that webkit is still using the SetApplicationIsDaemon trick: https://github.com/WebKit/webkit/blob/0190dd5b8c0272beaa705dbc10863a84a3e6af5e/Source/WebKit2/NetworkProcess/mac/NetworkProcessMac.mm#L58

indutny commented 10 years ago

@bajtos sudo dtruss -p <node-pid> should be very useful too.

bajtos commented 10 years ago

dtruss logs: https://gist.github.com/bajtos/ac14bbc3c5342181eed5

Running the fork bomb with dtruss enabled was way to slow. I run the bomb first, then started dtruss, and then run the test case again. I hope it does not affect the collected data.

indutny commented 10 years ago

Thanks man!

indutny commented 10 years ago

Here is an internal conversation between launchservices and node.js:

https://gist.github.com/indutny/9eec8b2c801118d1b2d1

indutny commented 10 years ago

I'm not sure, how much sense does this patch actually make, but could you please give it a try?

indutny commented 10 years ago

Oh, btw, I just realized that Activity Monitor is no longer showing process.title value on both v0.10 and v0.11 node.js ...

indutny commented 10 years ago

@bajtos wait... my patch doesn't seem to be properly sending messages...

indutny commented 10 years ago

@bajtos updated gist: https://gist.github.com/indutny/1ca6a3c616281df87f69

indutny commented 10 years ago

@bajtos and another update: https://gist.github.com/indutny/1ca6a3c616281df87f69, this time should be working just fine.

indutny commented 10 years ago

@bajtos on a second thought, this is a OS bug, not a libuv one. The LaunchServices daemon is just not responding to the node.js process, and hanging is the most appropriate thing that node.js could do.

indutny commented 10 years ago

However, if it still bites you pretty hard, we could probably make it perform this operation in a separate thread. Which is not really a solution, but a workaround. Does it sounds reasonable to you? (cc @saghul)

bajtos commented 10 years ago

@bajtos on a second thought, this is a OS bug, not a libuv one. The LaunchServices daemon is just not responding to the node.js process, and hanging is the most appropriate thing that node.js could do.

That's a fair conclusion. Is there a way how to report this bug to Apple? Do they care about such things at all?

However, if it still bites you pretty hard, we could probably make it perform this operation in a separate thread. Which is not really a solution, but a workaround. Does it sounds reasonable to you? (cc @saghul)

I don't understand how running the operation in a separate thread will improve the situation, since calling the process.title setter is still a blocking operation, right?

Honestly, I think this issue is not worth spending more effort on. I have fixed my gulpfile immediately as I realised it is doing a fork bomb, and I did not run into this problem ever since.

Having said that, I do appreciate the effort you invested into investigating this issue!

saghul commented 10 years ago

FWIW, I agree with both :-)

indutny commented 10 years ago

I'll fill a bugreport there: https://idmsa.apple.com/IDMSWebAuth/login.html?appIdKey=77e2a60d4bdfa6b7311c854a56505800be3c24e3a27a670098ff61b69fc5214b&sslEnabled=true&rv=3

indutny commented 10 years ago

Here is the reference ticket: 18198219