ulixee / hero

The web browser built for scraping

hero.waitForNewTab() Not Working in Headless instance of Chrome #269

Open Cmeesh11 opened 3 weeks ago

Cmeesh11 commented 3 weeks ago

I'm trying to open a new tab by clicking a link (one that should open in a new tab) and retrieve a resource response from that tab. Whenever I call hero.waitForNewTab() in a headless instance, it times out, no matter how long the timeout is. It works perfectly fine when running locally, so it seems tabs are handled differently in a headless instance. I'm on EC2 using Chrome. I'm able to log the tabs using hero.tabs, but directly referencing any of those tabs returns undefined.

Example:

  const commandId = await hero.lastCommandId;

  console.log( "clicking link" );
  await linkElement.$click();

  console.log( "waiting for new tab" );
  const tab = await hero.waitForNewTab();
  console.log( `New tab detected.` );

  console.log( "Waiting for resource" );
  const response = await tab.waitForResource(
    {
      "url" : url,
      "type" : "Document"
    },
    {
      "timeoutMs" : 60000,
      "sinceCommandId" : commandId
    }
  );

Output:

clicking link
waiting for new tab
New tab detected.

2024-06-04T21:08:07.617Z ERROR [unblocked-agent/lib/Page] Page.initializationError {
  context: {
    sessionId: 'Ze8pFHNHf7dpmhgtrk23p',
    browserContextId: 'EF3AB0A5403CA5C1879F30009271860C',
    targetId: '5F4F7B31590D9E386C4378CBA1820BE5'
  },
  sessionId: 'Ze8pFHNHf7dpmhgtrk23p',
  sessionName: undefined
} TimeoutError: DevtoolsApiMessage did not respond after 60 seconds. (Network.enable, id=1528)
  at new Resolvable (/home/ec2-user/.../source/node_modules/commons/lib/Resolvable.ts:19:18)
    at createPromise (/home/ec2-user/.../source/node_modules/commons/lib/utils.ts:140:10)
    at DevtoolsSession.send (/home/ec2-user/.../source/agent/main/lib/DevtoolsSession.ts:82:37)
    at NetworkManager.initialize (/home/ec2-user/.../source/agent/main/lib/NetworkManager.ts:114:10)
    at Page.initialize (/home/ec2-user/.../source/agent/main/lib/Page.ts:632:27)
    at new Page (/home/ec2-user/.../source/agent/main/lib/Page.ts:205:25)
    at BrowserContext.onPageAttached (/home/ec2-user/.../source/agent/main/lib/BrowserContext.ts:222:18)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at process.processImmediate (node:internal/timers:447:9)
    at process.callbackTrampoline (node:internal/async_hooks:130:17)
TimeoutError: DevtoolsApiMessage did not respond after 60 seconds. (Network.enable, id=1528)
  at new Resolvable (/home/ec2-user/.../source/node_modules/commons/lib/Resolvable.ts:19:18)
    at createPromise (/home/ec2-user/.../source/node_modules/commons/lib/utils.ts:140:10)
    at DevtoolsSession.send (/home/ec2-user/.../source/agent/main/lib/DevtoolsSession.ts:82:37)
    at NetworkManager.initialize (/home/ec2-user/.../source/agent/main/lib/NetworkManager.ts:114:10)
    at Page.initialize (/home/ec2-user/.../source/agent/main/lib/Page.ts:632:27)
    at new Page (/home/ec2-user/.../source/agent/main/lib/Page.ts:205:25)
    at BrowserContext.onPageAttached (/home/ec2-user/.../source/agent/main/lib/BrowserContext.ts:222:18)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at process.processImmediate (node:internal/timers:447:9)
    at process.callbackTrampoline (node:internal/async_hooks:130:17)
2024-06-04T21:08:23.077Z ERROR [hero-core/connections/ConnectionToHeroClient] ConnectionToClient.HandleRequestError {
  context: {},
  sessionId: 'Ze8pFHNHf7dpmhgtrk23p',
  sessionName: undefined
} TimeoutError: Timeout waiting for child-tab-created
  at new Resolvable (/home/ec2-user/.../source/node_modules/commons/lib/Resolvable.ts:19:18)
    at createPromise (/home/ec2-user/.../source/node_modules/commons/lib/utils.ts:140:10)
    at Tab.waitOn (/home/ec2-user/.../source/node_modules/commons/lib/TypedEventEmitter.ts:61:34)
    at Tab.waitForNewTab (/home/ec2-user/.../source/node_modules/core/lib/Tab.ts:650:38)
    at CommandRecorder.runCommandFn (/home/ec2-user/.../source/node_modules/core/lib/CommandRecorder.ts:90:32)
    at async CommandRunner.runFn (/home/ec2-user/.../source/node_modules/core/lib/CommandRunner.ts:36:14)
    at async ConnectionToHeroClient.executeCommand (/home/ec2-user/.../source/node_modules/core/connections/ConnectionToHeroClient.ts:258:12)
    at async ConnectionToHeroClient.handleRequest (/home/ec2-user/.../source/node_modules/core/connections/ConnectionToHeroClient.ts:66:14)
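
On the note above that directly referencing an entry of hero.tabs returns undefined: one possible explanation, assuming hero.tabs is promise-valued (the Hero docs appear to list it as Promise<Tab[]>, but treat that as an assumption here), is that the promise itself is being indexed rather than the array it resolves to. The sketch below uses a hypothetical stand-in object, `fakeHero`, not a real Hero instance, purely to show the difference.

```javascript
// Hypothetical stand-in for a Hero instance. Only the promise-valued `tabs`
// getter matters here; the tab objects are made up for illustration.
const fakeHero = {
  get tabs() {
    return Promise.resolve([{ tabId: 1 }, { tabId: 2 }]);
  },
};

async function compareTabAccess(hero) {
  // Indexing the promise itself: a Promise has no property "1".
  const direct = hero.tabs[1];

  // Awaiting first yields the array, which can then be indexed.
  const tabs = await hero.tabs;
  const awaited = tabs[1];

  return { direct, awaited };
}

compareTabAccess(fakeHero).then(({ direct, awaited }) => {
  console.log("direct:", direct);
  console.log("awaited tabId:", awaited.tabId);
});
```

If this is the cause, `const tabs = await hero.tabs;` followed by `tabs[1]` should return a tab object. It would not explain the waitForNewTab() timeout itself.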

blakebyrnes commented 3 weeks ago

Hi @Cmeesh11, sorry this isn't working for you. We have some very similar e2e tests in hero (https://github.com/ulixee/hero/blob/fa241bd77fc182e576a49a416482dd003db2541e/end-to-end/test/tab.test.ts). I can't tell whether yours is more like the skipped test, which has a todo to figure out, or more like the resources one.

Do you know why the "New tab detected" log appears even though it then fails waiting for child-tab-created? I would have expected the timeout to happen before that log line.

Cmeesh11 commented 3 weeks ago

@blakebyrnes It seemed that, despite "New tab detected" being logged, the script was still waiting for the new tab. When I attempted another command after the "New tab detected" log, it wouldn't execute and would just hang until I got the child tab error.
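
To narrow down which awaited step actually hangs, a small client-side wrapper can label each step and reject if that step is still pending after a short deadline. This is only a diagnostic sketch: `withStepTimeout` is a hypothetical helper, not part of Hero, and it does not cancel the underlying command. The demo uses plain promises as stand-ins for Hero calls.

```javascript
// Hypothetical helper: reject with a labeled error if `promise` has not
// settled within `ms` milliseconds. The wrapped operation is not cancelled.
function withStepTimeout(label, promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Step "${label}" still pending after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Demo with stand-in promises: one settles immediately, one never settles.
async function demo() {
  const results = [];
  results.push(await withStepTimeout("fast step", Promise.resolve("ok"), 50));
  try {
    await withStepTimeout("hung step", new Promise(() => {}), 50);
  } catch (err) {
    results.push(err.message);
  }
  return results;
}

demo().then((results) => console.log(results));
```

In the script from the original report, this could wrap each call separately, for example `await withStepTimeout("waitForNewTab", hero.waitForNewTab(), 10000)` and likewise for `tab.waitForResource(...)`, so the log shows which await stalls rather than only the later child-tab-created error.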