ulixee / secret-agent

The web scraper that's nearly impossible to block - now called @ulixee/hero
https://secretagent.dev
MIT License

Some errors while using detached/frozen tabs #396

Open A-Posthuman opened 2 years ago

A-Posthuman commented 2 years ago

Here are some things I ran into while testing the detach/frozen tabs.

First, using the example detach code, on Ubuntu 20.04, about 50% of runs end in this error:

node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

DisconnectedFromCore: This Agent has been disconnected from Core (coreHost: ws://localhost:35085)
    at RemoteConnectionToCore.onResponse (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:269:33)
    at RemoteConnectionToCore.onMessage (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:127:18)
    at WebSocket.<anonymous> (/home/ubuntu/node_modules/@secret-agent/client/connections/RemoteConnectionToCore.js:61:26)
    at WebSocket.emit (node:events:390:28)
    at Receiver.receiverOnMessage (/home/ubuntu/node_modules/ws/lib/websocket.js:1008:20)
    at Receiver.emit (node:events:390:28)
    at Receiver.dataMessage (/home/ubuntu/node_modules/ws/lib/receiver.js:517:14)
    at /home/ubuntu/node_modules/ws/lib/receiver.js:468:23
    at /home/ubuntu/node_modules/ws/lib/permessage-deflate.js:308:9
    at /home/ubuntu/node_modules/ws/lib/permessage-deflate.js:391:7
------CONNECTION----------------------------------
  at new Resolvable (/home/ubuntu/node_modules/@secret-agent/commons/Resolvable.js:9:22)
    at Object.createPromise (/home/ubuntu/node_modules/@secret-agent/commons/utils.js:62:12)
    at RemoteConnectionToCore.createPendingResult (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:291:43)
    at RemoteConnectionToCore.internalSendRequestAndWait (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:199:47)
    at RemoteConnectionToCore.sendRequest (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:119:21)
    at async CoreCommandQueue.flush (/home/ubuntu/node_modules/@secret-agent/client/lib/CoreCommandQueue.js:56:9) {
  coreHost: 'ws://localhost:35085',
  code: 'DisconnectedFromCore'
}

Node.js v17.1.0
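
For anyone hitting the same crash: the trace shows the error object carries `code: 'DisconnectedFromCore'` and `coreHost`, so a script can at least check for that property instead of letting the rejection go uncaught. Below is a minimal sketch, not a fix for the disconnect itself. `isDisconnectedFromCore` and `runGuarded` are hypothetical helper names, not part of SecretAgent, and this only helps when the rejected promise is one the script actually awaits.

```javascript
// Hypothetical helpers, not part of SecretAgent. They rely only on the
// `code` and `coreHost` properties visible on the error in the trace above.
function isDisconnectedFromCore(err) {
  return Boolean(err) && err.code === 'DisconnectedFromCore';
}

// Runs an async task and turns a Core disconnect into a handled result.
// Any other error is rethrown unchanged.
async function runGuarded(task) {
  try {
    return { ok: true, value: await task() };
  } catch (err) {
    if (isDisconnectedFromCore(err)) {
      return { ok: false, coreHost: err.coreHost, error: err };
    }
    throw err;
  }
}
```

Usage would be along the lines of `const result = await runGuarded(() => scrape())`, where `scrape` stands in for your own function. If the rejection comes from a promise the script never awaits, a process-level `unhandledRejection` listener is the usual Node.js fallback.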

Next, while using my own more complex script, I occasionally get a crash like this, sorry for the JSON log format:

{"level":"error","message":"Uncaught (in promise) TypeError: Cannot read property 'id' of undefined","metadata":{"name":"Error","stack":"TypeError: Cannot read property 'id' of undefined\n    at Function.execJsPaths (<anonymous>:368:72)\n------REMOTE CORE---------------------------------\n  at Function.reviver (/home/ubuntu/node_modules/@secret-agent/commons/TypeSerializer.js:200:32)\n    at JSON.parse (<anonymous>)\n    at Function.parse (/home/ubuntu/node_modules/@secret-agent/commons/TypeSerializer.js:23:21)\n    at WebSocket.<anonymous> (/home/ubuntu/node_modules/@secret-agent/client/connections/RemoteConnectionToCore.js:60:62)\n    at WebSocket.emit (node:events:390:28)\n    at Receiver.receiverOnMessage (/home/ubuntu/node_modules/ws/lib/websocket.js:1008:20)\n    at Receiver.emit (node:events:390:28)\n    at Receiver.dataMessage (/home/ubuntu/node_modules/ws/lib/receiver.js:517:14)\n    at Receiver.getData (/home/ubuntu/node_modules/ws/lib/receiver.js:435:17)\n    at Receiver.startLoop (/home/ubuntu/node_modules/ws/lib/receiver.js:143:22)\n------CONNECTION----------------------------------\n  at new Resolvable (/home/ubuntu/node_modules/@secret-agent/commons/Resolvable.js:9:22)\n    at Object.createPromise (/home/ubuntu/node_modules/@secret-agent/commons/utils.js:62:12)\n    at RemoteConnectionToCore.createPendingResult (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:291:43)\n    at RemoteConnectionToCore.internalSendRequestAndWait (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:199:47)\n    at RemoteConnectionToCore.sendRequest (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:119:21)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async Object.cb (/home/ubuntu/node_modules/@secret-agent/client/lib/CoreCommandQueue.js:81:30)\n    at async Queue.next (/home/ubuntu/node_modules/@secret-agent/commons/Queue.js:68:25)\n------CORE COMMANDS-------------------------------\n  at Queue.run (/home/ubuntu/node_modules/@secret-agent/commons/Queue.js:28:25)\n    at CoreCommandQueue.run (/home/ubuntu/node_modules/@secret-agent/client/lib/CoreCommandQueue.js:78:14)\n    at CoreSession.detachTab (/home/ubuntu/node_modules/@secret-agent/client/lib/CoreSession.js:80:69)\n    at /home/ubuntu/node_modules/@secret-agent/client/lib/Agent.js:195:73\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n\n--------------------------------------------------\n------default-session-2---------------------------\n------493bb300-519c-11ec-bc1a-6bed479ab9ea--------\n--------------------------------------------------","timestamp":"2021-11-30T05:13:59.918Z"}}
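
Since the JSON form is hard to read, here is a small sketch for turning such a log entry back into a multi-line trace. It assumes the entry sits on a single line and uses only the field names visible in the entry above (`level`, `message`, `metadata.stack`, `metadata.timestamp`). `formatLogLine` is a hypothetical helper name.

```javascript
// Hypothetical helper: converts one JSON log entry into readable text.
function formatLogLine(line) {
  // Strip any Markdown backticks wrapped around the pasted line before parsing.
  const entry = JSON.parse(line.trim().replace(/^`+|`+$/g, ''));
  const meta = entry.metadata || {};
  const header = `${meta.timestamp || ''} ${String(entry.level).toUpperCase()} ${entry.message}`.trim();
  // JSON.parse has already turned the "\n" escapes in the stack into real newlines.
  return meta.stack ? `${header}\n${meta.stack}` : header;
}
```

Calling `console.log(formatLogLine(line))` on an entry like the one above would print the timestamp, level, and message first, followed by the stack with one frame per line.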

Lastly, I'm also seeing this error pop up on around 50% of runs of my script. There also appears to be a typo, 'anonymuos', in there:

2021-11-30T05:23:08.355Z ERROR [/home/ubuntu/node_modules/@secret-agent/core/lib/SessionState] Window.console {
  message: "ERROR: applying action Error: Failed to execute 'setAttributeNS' on 'Element': '' is an invalid namespace for attributes.\n" +
    '    at setNodeAttributes (<anonymuos>:256:22)\n' +
    '    at deserializeNode (<anonymuos>:334:13)\n' +
    '    at replayDomEvent (<anonymuos>:86:16)\n' +
    '    at applyDomChanges (<anonymuos>:54:13)\n' +
    '    at replayDomChanges (<anonymuos>:37:9)\n' +
    '    at replayEvents (<anonymuos>:356:12)\n' +
    '    at <anonymuos>:357:3 "div.a-section.a-spacing-base.aok-hidden.tp-side-sheet-link-container({ nodeId: 4040, Symbol(saNodeId): 4040, align: , title: , lang:  })"  "{ action: 2, nodeId: 4042, nodeType: 1, textContent: undefined, tagName: SPAN }"',
  context: { sessionId: '93574c50-519d-11ec-963d-cbf8cc1a8370' },
  sessionId: '93574c50-519d-11ec-963d-cbf8cc1a8370',
  sessionName: 'default-session-2'
}
2021-11-30T05:23:08.366Z ERROR [/home/ubuntu/node_modules/@secret-agent/core/lib/SessionState] Window.console {
  message: "ERROR: applying action Error: Failed to execute 'setAttributeNS' on 'Element': '' is an invalid namespace for attributes.\n" +
    '    at setNodeAttributes (<anonymuos>:256:22)\n' +
    '    at deserializeNode (<anonymuos>:334:13)\n' +
    '    at replayDomEvent (<anonymuos>:86:16)\n' +
    '    at applyDomChanges (<anonymuos>:54:13)\n' +
    '    at replayDomChanges (<anonymuos>:37:9)\n' +
    '    at replayEvents (<anonymuos>:356:12)\n' +
    '    at <anonymuos>:357:3 "div.a-section.a-spacing-base.aok-hidden.tp-side-sheet-link-container({ nodeId: 4144, Symbol(saNodeId): 4144, align: , title: , lang:  })"  "{ action: 2, nodeId: 4146, nodeType: 1, textContent: undefined, tagName: SPAN }"',
  context: { sessionId: '93574c50-519d-11ec-963d-cbf8cc1a8370' },
  sessionId: '93574c50-519d-11ec-963d-cbf8cc1a8370',
  sessionName: 'default-session-2'
}
2021-11-30T05:23:08.381Z ERROR [/home/ubuntu/node_modules/@secret-agent/core/lib/SessionState] Window.console {
  message: "ERROR: applying action Error: Failed to execute 'setAttributeNS' on 'Element': '' is an invalid namespace for attributes.\n" +
    '    at setNodeAttributes (<anonymuos>:256:22)\n' +
    '    at deserializeNode (<anonymuos>:334:13)\n' +
    '    at replayDomEvent (<anonymuos>:86:16)\n' +
    '    at applyDomChanges (<anonymuos>:54:13)\n' +
    '    at replayDomChanges (<anonymuos>:37:9)\n' +
    '    at replayEvents (<anonymuos>:356:12)\n' +
    '    at <anonymuos>:357:3 "div#twister_feature_div.celwidget({ nodeId: 3947, Symbol(saNodeId): 3947, align: , title: , lang:  })"  "{ action: 2, nodeId: 4201, nodeType: 1, textContent: undefined, tagName: SPAN }"',
  context: { sessionId: '93574c50-519d-11ec-963d-cbf8cc1a8370' },
  sessionId: '93574c50-519d-11ec-963d-cbf8cc1a8370',
  sessionName: 'default-session-2'
}

blakebyrnes commented 2 years ago

@A-Posthuman thanks for logging these. Could you send me code snippets or the session databases for some of these errors? The sessionId is in the errors (493bb300-519c-11ec-bc1a-6bed479ab9ea, 93574c50-519d-11ec-963d-cbf8cc1a8370)

A-Posthuman commented 2 years ago

I didn't keep those dbs around, but here is a similar db from today for that first error. It's generated on the 2nd run of the example detach() code from your docs.

306f0740-5265-11ec-ab37-093ef311e9de.zip

I'll try to get a db of the other error, but will have to send that on Discord.

A-Posthuman commented 2 years ago

Testing out SA 1.6.1 against the 3 types of errors, I am no longer seeing the 2nd type "Uncaught (in promise) TypeError: Cannot read property 'id' of undefined".

However, I'm continuing to see issues of the other 2 types. First, here is a sample error I'm still seeing from the SA docs example frozen/detached tab script on Ubuntu, on the 2nd run of the script:

node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

DisconnectedFromCore: This Agent has been disconnected from Core (coreHost: ws://localhost:38273)
    at RemoteConnectionToCore.onResponse (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:277:33)
    at RemoteConnectionToCore.onMessage (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:133:18)
    at WebSocket.<anonymous> (/home/ubuntu/node_modules/@secret-agent/client/connections/RemoteConnectionToCore.js:61:26)
    at WebSocket.emit (node:events:390:28)
    at Receiver.receiverOnMessage (/home/ubuntu/node_modules/ws/lib/websocket.js:1008:20)
    at Receiver.emit (node:events:390:28)
    at Receiver.dataMessage (/home/ubuntu/node_modules/ws/lib/receiver.js:517:14)
    at /home/ubuntu/node_modules/ws/lib/receiver.js:468:23
    at /home/ubuntu/node_modules/ws/lib/permessage-deflate.js:308:9
    at /home/ubuntu/node_modules/ws/lib/permessage-deflate.js:391:7
------CONNECTION----------------------------------
  at new Resolvable (/home/ubuntu/node_modules/@secret-agent/commons/Resolvable.js:9:22)
    at Object.createPromise (/home/ubuntu/node_modules/@secret-agent/commons/utils.js:62:12)
    at RemoteConnectionToCore.createPendingResult (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:299:43)
    at RemoteConnectionToCore.internalSendRequestAndWait (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:207:47)
    at RemoteConnectionToCore.sendRequest (/home/ubuntu/node_modules/@secret-agent/client/connections/ConnectionToCore.js:125:21)
    at async CoreCommandQueue.flush (/home/ubuntu/node_modules/@secret-agent/client/lib/CoreCommandQueue.js:56:9) {
  coreHost: 'ws://localhost:38273',
  code: 'DisconnectedFromCore'
}

Node.js v17.1.0

And then in my own script, I'm running into the following on most runs after the 1st, but not every single time; occasionally a run will not produce these errors. When the errors do happen, document.body.outerHTML is either basically empty or missing most of its content.

2021-12-15T05:28:31.182Z ERROR [/home/ubuntu/node_modules/@secret-agent/core/lib/SessionState] Window.console {
  message: 'ERROR: applying action Error: Unable to translate node! nodeType = undefined\n' +
    '    at deserializeNode (<anonymuos>:342:15)\n' +
    '    at replayDomEvent (<anonymuos>:86:16)\n' +
    '    at applyDomChanges (<anonymuos>:54:13)\n' +
    '    at replayDomChanges (<anonymuos>:37:9)\n' +
    '    at replayEvents (<anonymuos>:356:12)\n' +
    '    at <anonymuos>:357:3   "{ action: 5, nodeId: 4263, nodeType: undefined, textContent: undefined, tagName: undefined }"',
  context: { sessionId: 'cb416d00-5d67-11ec-930a-b3d544c0a35e' },
  sessionId: 'cb416d00-5d67-11ec-930a-b3d544c0a35e',
  sessionName: 'default-session-2'
}
2021-12-15T05:28:31.200Z ERROR [/home/ubuntu/node_modules/@secret-agent/core/lib/SessionState] Window.console {
  message: 'ERROR: applying action Error: Unable to translate node! nodeType = undefined\n' +
    '    at deserializeNode (<anonymuos>:342:15)\n' +
    '    at replayDomEvent (<anonymuos>:86:16)\n' +
    '    at applyDomChanges (<anonymuos>:54:13)\n' +
    '    at replayDomChanges (<anonymuos>:37:9)\n' +
    '    at replayEvents (<anonymuos>:356:12)\n' +
    '    at <anonymuos>:357:3   "{ action: 3, nodeId: 4358, nodeType: undefined, textContent: undefined, tagName: undefined }"',
  context: { sessionId: 'cb416d00-5d67-11ec-930a-b3d544c0a35e' },
  sessionId: 'cb416d00-5d67-11ec-930a-b3d544c0a35e',
  sessionName: 'default-session-2'
}
2021-12-15T05:28:31.204Z ERROR [/home/ubuntu/node_modules/@secret-agent/core/lib/SessionState] Window.console {
  message: 'ERROR: applying action Error: Unable to translate node! nodeType = undefined\n' +
    '    at deserializeNode (<anonymuos>:342:15)\n' +
    '    at replayDomEvent (<anonymuos>:86:16)\n' +
    '    at applyDomChanges (<anonymuos>:54:13)\n' +
    '    at replayDomChanges (<anonymuos>:37:9)\n' +
    '    at replayEvents (<anonymuos>:356:12)\n' +
    '    at <anonymuos>:357:3   "{ action: 5, nodeId: 4357, nodeType: undefined, textContent: undefined, tagName: undefined }"',
  context: { sessionId: 'cb416d00-5d67-11ec-930a-b3d544c0a35e' },
  sessionId: 'cb416d00-5d67-11ec-930a-b3d544c0a35e',
  sessionName: 'default-session-2'
}

The errors continue to repeat similarly hundreds of times per URL.

blakebyrnes commented 2 years ago

The remaining issue on this ticket is that FrozenTabs have trouble loading the proper DOM changes when a page is reloaded or has a redirect right in it. NOTE: the Hero code that loads the DOM changes has diverged from SecretAgent, so this fix should probably happen in Hero.
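
To picture the kind of selection involved, here is a rough sketch that is not taken from SecretAgent or Hero. It assumes each recorded DOM change is tagged with the document load it belongs to and that the list is in recorded order; the field name `documentLoadId` and the function name `changesForFinalDocument` are hypothetical. Replaying changes from an earlier load into the final document could plausibly leave node ids unresolved, which would be consistent with the "nodeType = undefined" errors above, though that link is an assumption.

```javascript
// Hypothetical record shape: { documentLoadId, action, nodeId }.
// Illustrative only; this is not SecretAgent's or Hero's actual schema.
function changesForFinalDocument(domChanges) {
  if (domChanges.length === 0) return [];
  // Assumes the list is in recorded order, so the last entry belongs to
  // the document that was live after any reload or redirect.
  const finalLoadId = domChanges[domChanges.length - 1].documentLoadId;
  return domChanges.filter((change) => change.documentLoadId === finalLoadId);
}
```

With changes recorded across a redirect, only those from the post-redirect document would be kept for replay in the frozen tab.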