segment-boneyard / nightmare

A high-level browser automation library.
https://open.segment.com

Crashing with no useful information #405

Closed yocontra closed 8 years ago

yocontra commented 8 years ago

After opening one instance and then loading 10,000 pages in series (sometimes it happens even sooner, around 1,000), this happens:

Nightmare runner error:

    Error: Uncaught, unspecified "error" event. ([object Object])
        at EventEmitter.emit (events.js:144:17)
        at EventEmitter.<anonymous> (/Users/contra/Projects/college-scrapers/node_modules/electron-prebuilt/dist/Electron.app/Contents/Resources/atom.asar/browser/api/lib/web-contents.js:85:27)
        at emitTwo (events.js:87:13)
        at EventEmitter.emit (events.js:172:7)

and the process crashes
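
For reference, the failing loop was roughly the shape below (a minimal sketch, not the original scraper code; the URL list, options, and selectors are placeholders):

    var Nightmare = require('nightmare');
    var nightmare = Nightmare({ show: false });

    var urls = [/* ~10,000 page URLs */];

    function visit(i) {
      // Base case: all pages loaded, shut the instance down.
      if (i >= urls.length) return nightmare.end();
      return nightmare
        .goto(urls[i])
        .wait('body')
        .then(function () { return visit(i + 1); });
    }

    visit(0).catch(function (err) {
      // Somewhere between ~1,000 and ~10,000 loads the process dies with
      // the "Uncaught, unspecified error event" above instead of reaching here.
      console.error(err);
    });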

matthewmueller commented 8 years ago

@contra can you try again, this time with DEBUG=nightmare*? That may help with debugging this issue. It may also be a memory leak.
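
For anyone following along, that just means setting the debug environment variable when launching the script (the script name here is a placeholder):

    DEBUG=nightmare* node scrape.js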

yocontra commented 8 years ago

Worked around the problem by using a new nightmare instance for each request, but the overhead on this is making everything quite a bit slower. I think this demonstrates that the leak is in the nightmare instance that was loading 10K pages, not in any of the other application code.
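
The workaround looks roughly like this (a sketch under the same assumptions as above; loadPage, the selector, and the evaluate body are illustrative, not the actual application code):

    var Nightmare = require('nightmare');

    // One short-lived instance per page: slower, because every load now
    // pays the Electron startup/teardown cost, but it no longer crashes.
    function loadPage(url) {
      var nightmare = Nightmare({ show: false });
      return nightmare
        .goto(url)
        .wait('body')
        .evaluate(function () { return document.title; })
        .end();
    }

    // usage: loadPage('http://example.com').then(function (title) { /* ... */ });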

rosshinkley commented 8 years ago

FWIW, I was able to recreate this behavior, but even after sifting through the DEBUG data and peeking through the Electron output, nothing really stood out. This does smell like a memory issue, but the htop data seems to disagree with that hypothesis. I'll take another swing at it as time permits.

I also noticed (mostly by accident) if there is a network interruption, Nightmare seems to fail silently and will also produce this behavior. Did you happen to have a networking issue?

rosshinkley commented 8 years ago

Unfortunately, creating a new Nightmare instance for every request seems to also have issues. See #561.

rosshinkley commented 8 years ago

@contra Are you still experiencing this problem?

yocontra commented 8 years ago

@rosshinkley I don't think it was ever fixed, I'm not using this for anything right now though.

rosshinkley commented 8 years ago

Closing for lack of info/age. If you come across this problem again, feel free to reopen/submit a new issue.

yocontra commented 8 years ago

@rosshinkley Did you push a fix for the problem? I don't think it disappeared on its own. You said earlier you were able to reproduce it, is it still happening for you?

rosshinkley commented 8 years ago

@contra I haven't been able to reliably reproduce without a networking issue. I'll see if I can dig out the test script and try it again against the current version.

willclarktech commented 8 years ago

Also getting this error. Don't seem to have any networking issues and I'd be surprised if it was a memory issue as I've got only a very basic script here. I can't see anything in the debug logs that's helpful, but just in case here's the end of it:

nightmare:log did-get-response-details +24ms [{"sender":{"id":1,"session":{"cookies":{},"protocol":{},"webRequest":{}},"hostWebContents":null,"devToolsWebContents":null,"debugger":{},"_maxListeners":0,"_events":{"will-navigate":[null,null],"destroyed":[null,null],"did-finish-load":[null],"did-fail-load":[null],"did-start-loading":[null,null],"did-stop-loading":[null,null],"did-get-response-details":[null],"dom-ready":[null]},"_eventsCount":27,"browserWindowOptions":{"waitTimeout":1000,"show":false,"alwaysOnTop":true,"webPreferences":{"preload":"/Users/Will/projects/learning/elm/elm-boilerplate/node_modules/nightmare/lib/preload.js","nodeIntegration":false}}}},false,"http://localhost:3000/browser-sync/socket.io/?EIO=3&transport=polling&t=LOtQm75&sid=L0syoNMMoE8eWY-FAAAl","http://localhost:3000/browser-sync/socket.io/?EIO=3&transport=polling&t=LOtQm75&sid=L0syoNMMoE8eWY-FAAAl",200,"GET","http://localhost:3000/",{"access-control-allow-origin":["*"],"connection":["keep-alive"],"content-length":["4"],"content-type":["application/octet-stream"],"date":["Fri, 29 Jul 2016 16:09:34 GMT"],"set-cookie":["io=L0syoNMMoE8eWY-FAAAl"]},"xhr"]
 into #new-todo +118msype()
 into #new-todo +1ms.type()
 into #new-todo +0ms.type()
 into #new-todo +0ms.type()
 into #new-todo +0ms.type()
 into #new-todo +1ms.type()
 into #new-todo +0ms.type()
 into #new-todo +0ms.type()
 into #new-todo +1ms.type()
 into #new-todo +0ms.type()
 into #new-todo +0ms.type()
 into #new-todo +0ms.type()
  nightmare:actions .click() on #todo-4 .toggle +59ms
  nightmare:actions .click() on #todo-8 .toggle +1ms
  nightmare:actions .click() on #todo-7 .toggle +0ms
  nightmare:actions .click() on #todo-6 .toggle +0ms
  nightmare:actions .click() on #todo-2 .toggle +1ms
    And I have completed 5 Todos
  nightmare queueing action "wait" +50ms
  nightmare queueing action "wait" +0ms
  nightmare queueing action "click" +0ms
  nightmare running +0ms
  nightmare:actions .wait() for 200ms +0ms
Nightmare runner error:

    Error: Uncaught, unspecified "error" event. ([object Object])
        at EventEmitter.emit (events.js:163:17)
        at EventEmitter.<anonymous> (/Users/Will/projects/learning/elm/elm-boilerplate/node_modules/electron-prebuilt/dist/Electron.app/Contents/Resources/electron.asar/browser/api/web-contents.js:156:13)
        at emitTwo (events.js:106:13)
        at EventEmitter.emit (events.js:191:7)

No idea why the into #new-todo +0ms.type() log lines are screwed up like that - maybe it's related?

rosshinkley commented 8 years ago

@contra To cut out networking issues, I propped up a server locally to issue requests to (responding with plaintext), then issued 100k sequential requests to that service with a single Nightmare instance. Nightmare did not hang. I'm thinking about propping up a minimal HTML payload, then adding javascript and/or HTML and running requests against that to see if it's a caching issue or a memory issue or... something. If you've got ideas, I'd love to hear them.
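
A sketch of that kind of soak test (not the exact harness; the port and iteration count are arbitrary):

    var http = require('http');
    var Nightmare = require('nightmare');

    // Trivial local plaintext server so the network is out of the picture.
    http.createServer(function (req, res) {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('ok');
    }).listen(8765, function () {
      var nightmare = Nightmare({ show: false });
      var remaining = 100000;

      (function next() {
        if (remaining-- === 0) {
          return nightmare.end().then(function () { process.exit(0); });
        }
        nightmare
          .goto('http://localhost:8765/')
          .then(next)
          .catch(function (err) {
            console.error('died with', remaining, 'loads left:', err);
          });
      })();
    });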

@LikeJasper I don't think I've ever seen debug behave like that... I wonder if it's interpreting a backspace or non-character escape? If you could put together a minimum case that reproduces that behavior, I'd love to see that too.

yocontra commented 8 years ago

@rosshinkley Well, it's probably more than 100k requests total from that instance: I opened 10k pages, and each page probably had ~100 resources on it (CSS, JS, images, etc.). You should try sending your 100k requests at a more realistic test page that doesn't load instantly, to rule out race conditions (https://mail.google.com/ ?)

rosshinkley commented 8 years ago

@contra I'm trying to keep the tests local because I have seen some aberrant behavior with flaky networks. (The kind of thing that is bordering on impossible to reproduce without infrastructure I just haven't had time to play with yet.) That's why I was tinkering with the idea of generating payloads that would simulate multiple assets of varying sizes. I'd also prefer to keep whatever I'm testing against local, since then I know exactly which assets are being loaded. Being able to make assertions about what is loaded will help narrow down what's actually wrong.

It may also be worth introducing an artificial networking bottleneck to simulate poor bandwidth.
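
One cheap way to fake that locally (a sketch, assuming a plain Node server; the chunk size, interval, and port are arbitrary) is to trickle the response body out on a timer:

    var http = require('http');

    var body = '<html><body>' + new Array(10000).join('x') + '</body></html>';

    http.createServer(function (req, res) {
      res.writeHead(200, { 'Content-Type': 'text/html' });
      var offset = 0;
      var timer = setInterval(function () {
        // Send ~512 bytes every 100ms so each page load is artificially slow.
        res.write(body.slice(offset, offset + 512));
        offset += 512;
        if (offset >= body.length) {
          clearInterval(timer);
          res.end();
        }
      }, 100);
    }).listen(8766);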

At any rate, the biggest problem with these sorts of tests is time: it's going to take some tinkering to figure out a semi-reliable crash condition (if there is one).

yocontra commented 8 years ago

@rosshinkley Understandable - try delaying the response from your server by a few seconds: setTimeout(() => res.status(200).end(), 5000)
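
Expanded into something runnable, that suggestion is roughly (assuming an Express server; the route and port are placeholders):

    var express = require('express');
    var app = express();

    app.get('/', function (req, res) {
      // Hold the response open for 5 seconds so Nightmare has to sit on an
      // in-flight request instead of getting an instant response.
      setTimeout(function () { res.status(200).end(); }, 5000);
    });

    app.listen(3333);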

I was 99% sure it had something to do with a race condition while waiting for requests to finish.

If you have a repo w/ your test case I can try to modify it to reproduce.

rosshinkley commented 8 years ago

@contra I haven't had time to address this as yet, but I'll post here when I build a working test case. :)

willclarktech commented 8 years ago

@rosshinkley Don't think I've introduced any backspaces or non-character escapes myself, but I'll see if I can put together a minimal case to reproduce the weird logging.