ulixee / secret-agent

The web scraper that's nearly impossible to block - now called @ulixee/hero
https://secretagent.dev
MIT License

Media stream support problem #357

Open · merloisac opened this issue 2 years ago

merloisac commented 2 years ago

Re-opening this because the same problem has started happening again. The website I'm trying to scrape now is twitch.tv, and the m3u8 request keeps loading until it reaches the timeout.

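import { Agent } from 'secret-agent';

(async () => {
    // Ask SecretAgent to show the browser window (as in the original repro)
    process.env.SA_SHOW_BROWSER = "true";
    const agent = new Agent();
    try {
        await agent.goto(`https://player.twitch.tv/?channel=gaules&enableExtensions=true&muted=false&parent=twitch.tv&player=popout&volume=0.5&mature=true&quality=160p30`);
        // Wait up to 120 s for a resource whose URL contains "m3u8"
        const m3u8 = await agent.waitForResource({ url: /m3u8/ }, { timeoutMs: 120000 });
        console.log(m3u8[0].url);
        // Keep the session open for 100 s to watch the stream
        await agent.waitForMillis(100000);
    } finally {
        // Close the agent even if one of the waits above throws
        await agent.close();
    }
})();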

This doesn't happen in a normal Chrome browser, where it loads "instantly".

Originally posted by @merloisac in https://github.com/ulixee/secret-agent/issues/337#issuecomment-928165743
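One thing worth ruling out when a wait like this behaves inconsistently is which request the filter actually matches. The repro's `{ url: /m3u8/ }` matches the substring `m3u8` anywhere in a URL, including query strings or unrelated paths. The sketch below compares it with a stricter pattern. The stricter regex and the example URLs are hypothetical illustrations, not taken from Twitch's real traffic.

```typescript
// The repro's filter: matches "m3u8" anywhere in the URL.
const loose = /m3u8/;

// Hypothetical stricter filter: the path must end in ".m3u8",
// optionally followed by a query string or fragment.
const strict = /\.m3u8(?:[?#]|$)/;

// Hypothetical URLs a player page might request.
const urls = [
  "https://video.example.invalid/live/index.m3u8?token=abc",
  "https://video.example.invalid/api/m3u8-settings.json",
  "https://video.example.invalid/live/chunk_001.ts",
];

// Neither regex uses the "g" flag, so .test() has no lastIndex state to leak.
const looseMatches = urls.filter((u) => loose.test(u));
const strictMatches = urls.filter((u) => strict.test(u));

console.log(looseMatches);  // playlist URL and the unrelated settings URL
console.log(strictMatches); // only the playlist URL
```

The same stricter pattern could be passed as the `url` value in the `waitForResource` filter, assuming the real playlist URLs end in `.m3u8`.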

merloisac commented 2 years ago

#337

blakebyrnes commented 2 years ago

@merloisac Which command is timing out? If you have a session database, it would be much appreciated (https://secretagent.dev/docs/advanced/session).

merloisac commented 2 years ago

The response to the m3u8 request is timing out. I attached the SA session here: sa_sess.zip
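To tell a wait that truly timed out apart from one that succeeded slowly, a repro can record which finished first and how long it took. Below is a minimal sketch using `Promise.race` with a timer. `raceWithTimeout` and the `Timed` type are hypothetical helpers, not part of SecretAgent.

```typescript
// Hypothetical result type: either the work resolved or the timer won.
type Timed<T> =
  | { status: "resolved"; value: T; elapsedMs: number }
  | { status: "timedOut"; elapsedMs: number };

// Hypothetical helper: races `work` against a timer and reports the outcome
// with the elapsed time. If `work` rejects first, the rejection propagates.
async function raceWithTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<Timed<T>> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    // Wrap the value so a resolved value can never be confused with "timeout".
    const winner = await Promise.race([work.then((value) => ({ value })), timeout]);
    const elapsedMs = Date.now() - start;
    return winner === "timeout"
      ? { status: "timedOut", elapsedMs }
      : { status: "resolved", value: winner.value, elapsedMs };
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

In the repro, `work` would be the `agent.waitForResource(...)` promise. If that call's own `timeoutMs` fires first, it presumably rejects rather than resolving, so the rejection would surface through this helper unchanged.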

blakebyrnes commented 2 years ago

@merloisac Just a quick follow-up: this completes some of the time for me. The database you shared actually shows your m3u8 request returning.

Are you seeing actual timeout errors?

Or do you want it to return as soon as the "m3u8" is available so you can stream it? Right now, it waits for the resource to be fully downloaded before returning, which seems like a bad model for a large asset like this.
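The two models described here can be sketched outside SecretAgent: wait for the entire body before returning, or hand back the resource once it starts arriving and let the caller read the body incrementally. The `StreamedResource` type and both functions are hypothetical and do not represent SecretAgent's API. The body is modeled as an async iterable of text chunks, and the incremental version yields complete playlist lines as they arrive.

```typescript
// Hypothetical model of a captured response: status is known up front,
// while the body arrives as a sequence of text chunks.
interface StreamedResource {
  url: string;
  statusCode: number;
  chunks: AsyncIterable<string>;
}

// Model 1: return only after the whole body has downloaded
// (the current behavior described in this thread).
async function waitForFullBody(res: StreamedResource): Promise<string> {
  let body = "";
  for await (const chunk of res.chunks) body += chunk;
  return body;
}

// Model 2: consume the body as it streams, yielding each complete line
// as soon as its terminating newline has arrived.
async function* linesAsTheyArrive(res: StreamedResource): AsyncGenerator<string> {
  let buffered = "";
  for await (const chunk of res.chunks) {
    buffered += chunk;
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? ""; // keep the trailing partial line
    for (const line of lines) yield line;
  }
  if (buffered) yield buffered; // final line without a newline
}

// Example body split across chunks at an arbitrary point.
async function* fakeChunks(): AsyncGenerator<string> {
  yield "#EXTM3U\n#EXT-X-VER";
  yield "SION:3\nseg0.ts\n";
}
```

With model 2, `#EXTM3U` is available to the caller after the first chunk, before the rest of the body has arrived. With model 1, nothing is returned until the stream ends, which for a long-lived or large response may be never, or only after a timeout.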