ulixee / hero

The web browser built for scraping
MIT License

Way to disable or configure the HumanEmulator plugin #138

Open · AlanMcKen opened this issue 2 years ago

AlanMcKen commented 2 years ago

Hi, is there any way to disable or configure the HumanEmulator plugin? When I fill a large text field (hundreds or thousands of words) with .$type(), typing is very slow and Hero crashes with a timeout.

Thanks.

blakebyrnes commented 2 years ago

You can configure it in a couple of ways:

  1. Remove the human emulator from the default plugins that each Hero instance is initialized with:

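    import Core from '@ulixee/hero-core';
    import DefaultBrowserEmulator from '@unblocked-web/default-browser-emulator';
    
    // Register only the browser emulator, leaving out the default human emulator.
    Core.defaultUnblockedPlugins = [DefaultBrowserEmulator];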

    NOTE: if Core runs in a separate process, make this change wherever your Core server lives.

  2. You can fork the default human emulator, change its behavior, and register it as shown above. The last registered human emulator "implementor" controls interactions:

    import Core from '@ulixee/hero-core';
    import CustomHumanEmulator from './emulator';
    
    // The last registered human emulator controls interactions.
    Core.defaultUnblockedPlugins.push(CustomHumanEmulator);

  3. The human emulator class also has some undocumented static variables you can set to control its behavior (https://github.com/unblocked-web/unblocked/blob/4256ceceb3b863ecddf0d551a31d8e0cb581df7c/plugins/default-human-emulator/index.ts#L42).
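The "last implementor controls interactions" rule from option 2 can be pictured with a small sketch. It assumes a simplified, hypothetical plugin shape (`PluginSketch` with an optional `playInteractions` method) and hypothetical ids; it is not the actual @unblocked-web plugin interface, only an illustration of how registration order would decide which emulator is used.

```typescript
// Hypothetical, simplified plugin shape; not the real @unblocked-web interface.
interface PluginSketch {
  id: string;
  playInteractions?: (text: string) => string;
}

// Returns the id of the LAST plugin that implements playInteractions,
// mirroring the "last implementor controls interactions" rule.
function activeInteractionPlugin(plugins: PluginSketch[]): string | undefined {
  for (const plugin of [...plugins].reverse()) {
    if (typeof plugin.playInteractions === 'function') return plugin.id;
  }
  return undefined;
}

// Hypothetical ids standing in for the default plugins.
const defaults: PluginSketch[] = [
  { id: 'default-browser-emulator' },
  { id: 'default-human-emulator', playInteractions: text => `typed slowly: ${text}` },
];

// Option 2: pushing a custom emulator onto the end makes it the active implementor.
const withCustom: PluginSketch[] = [
  ...defaults,
  { id: 'custom-human-emulator', playInteractions: text => text },
];

// Option 1: with the human emulator left out, no plugin in this sketch implements interactions.
const browserOnly = defaults.filter(plugin => plugin.id !== 'default-human-emulator');

console.log(activeInteractionPlugin(defaults)); // default-human-emulator
console.log(activeInteractionPlugin(withCustom)); // custom-human-emulator
console.log(activeInteractionPlugin(browserOnly)); // undefined
```

Scanning from the end of the list is what makes the most recently pushed plugin win, which is why `push` (rather than replacing the array) is enough for option 2.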

AlanMcKen commented 2 years ago

Thanks! That worked, but even after removing DefaultHumanEmulator from the plugins, filling is unexpectedly slow: it takes ~45 seconds per 100 words. It's strange.
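To put the reported rate in perspective, here is the arithmetic as a small TypeScript sketch. The ~45 seconds per 100 words figure comes from the comment above; the average of 6 characters per word (including the trailing space) is an assumption, not a measured value.

```typescript
// Assumption: an average word plus its trailing space is about 6 characters.
const CHARS_PER_WORD = 6;

// Observed rate from the comment above: ~45 seconds per 100 words.
const SECONDS_PER_100_WORDS = 45;

// Milliseconds spent per typed character at the observed rate.
function msPerCharacter(): number {
  return (SECONDS_PER_100_WORDS * 1000) / (100 * CHARS_PER_WORD);
}

// Estimated seconds to fill a field of `words` words at the observed rate.
function estimatedSeconds(words: number): number {
  return (words * SECONDS_PER_100_WORDS) / 100;
}

console.log(msPerCharacter()); // 75
console.log(estimatedSeconds(1000)); // 450
```

At that rate a 1,000-word field would take roughly 7.5 minutes, long enough that a timeout would not be surprising.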

blakebyrnes commented 2 years ago

@AlanMcKen Sorry I didn't follow up on this. It sounds like the slowness is probably caused by something else. Did you find out anything about it with the emulator disabled?

blakebyrnes commented 1 year ago

Do you have a snippet you could share showing your setup and slow speed?

(Replying by email to AlanMcKen's comment of Jul 18, 2022, 6:22 PM.)


blakebyrnes commented 1 year ago

Feature

Our implementation should enable plugin configuration so we can send settings to the default human emulator. We probably need a custom configuration parameter that gets passed into each plugin's constructor (routed by the plugin's "id"). Alternatively, we might choose to send all configs to every plugin.
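A minimal sketch of routing configs by plugin id might look like the following. Everything here is hypothetical: `PluginConfig`, `PluginClass`, the two `*Sketch` classes, `createPlugins`, and the `keyDelayMs` setting are illustrations of the proposal, not existing Hero or Unblocked APIs.

```typescript
// Hypothetical per-plugin settings bag; no such type exists in Hero today.
type PluginConfig = Record<string, unknown>;

// Hypothetical plugin contract: a static id plus a constructor that accepts
// an optional, plugin-specific config.
interface PluginClass {
  id: string;
  new (config?: PluginConfig): { id: string; config?: PluginConfig };
}

class BrowserEmulatorSketch {
  static id = 'default-browser-emulator';
  readonly id = BrowserEmulatorSketch.id;
  readonly config?: PluginConfig;
  constructor(config?: PluginConfig) {
    this.config = config;
  }
}

class HumanEmulatorSketch {
  static id = 'default-human-emulator';
  readonly id = HumanEmulatorSketch.id;
  readonly config?: PluginConfig;
  constructor(config?: PluginConfig) {
    this.config = config;
  }
}

// Instantiates every plugin, passing each one only the config keyed by its id.
function createPlugins(classes: PluginClass[], configsById: Record<string, PluginConfig> = {}) {
  return classes.map(Plugin => new Plugin(configsById[Plugin.id]));
}

// Only the human emulator receives a config; `keyDelayMs` is a made-up setting.
const plugins = createPlugins([BrowserEmulatorSketch, HumanEmulatorSketch], {
  'default-human-emulator': { keyDelayMs: 0 },
});

for (const plugin of plugins) {
  console.log(plugin.id, JSON.stringify(plugin.config ?? {}));
}
```

Routing by id keeps each plugin from seeing the others' settings; the alternative of sending all configs to every plugin would amount to passing `configsById` itself to each constructor.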

The human emulator should be converted to use per-instance settings instead of static configuration.
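The difference between static and per-instance settings is easiest to see side by side. Both classes below are hypothetical stand-ins (the `keyDelayMs` setting and its 50 ms default are made up); the point is that a static value is shared by every instance in the process, while a per-instance value lets two emulator instances behave differently.

```typescript
// Hypothetical emulator whose typing delay is a static (class-wide) setting.
class StaticSettingsEmulator {
  static keyDelayMs = 50;

  delayFor(text: string): number {
    return text.length * StaticSettingsEmulator.keyDelayMs;
  }
}

// Hypothetical emulator that receives its settings per instance.
class PerInstanceEmulator {
  private readonly keyDelayMs: number;

  constructor(settings: { keyDelayMs?: number } = {}) {
    this.keyDelayMs = settings.keyDelayMs ?? 50;
  }

  delayFor(text: string): number {
    return text.length * this.keyDelayMs;
  }
}

// Changing the static value affects every instance in the process at once.
const fastStatic = new StaticSettingsEmulator();
const normalStatic = new StaticSettingsEmulator();
StaticSettingsEmulator.keyDelayMs = 0;

// Per-instance settings let two instances in one process type at different speeds.
const fast = new PerInstanceEmulator({ keyDelayMs: 0 });
const normal = new PerInstanceEmulator();

console.log(fastStatic.delayFor('hello'), normalStatic.delayFor('hello')); // 0 0
console.log(fast.delayFor('hello'), normal.delayFor('hello')); // 0 250
```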

Alternative Approach

We could also consider allowing a disableHumanEmulator config on Hero that would just turn it off for an instance.
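A rough sketch of how such a flag could filter plugins for one instance, assuming a hypothetical `HeroOptionsSketch` type and a hypothetical `isHumanEmulator` marker on each plugin reference; `disableHumanEmulator` is only the proposal above, not an existing Hero option.

```typescript
// Hypothetical per-instance option: `disableHumanEmulator` is the proposal above,
// not an existing Hero option.
interface HeroOptionsSketch {
  disableHumanEmulator?: boolean;
}

// Hypothetical plugin reference with a marker for human-emulation plugins.
interface PluginRef {
  id: string;
  isHumanEmulator?: boolean;
}

// Stand-in for the Core-level default plugin list.
const coreDefaults: PluginRef[] = [
  { id: 'default-browser-emulator' },
  { id: 'default-human-emulator', isHumanEmulator: true },
];

// Chooses the plugins for a single instance without mutating the Core-level defaults.
function pluginsForInstance(defaults: PluginRef[], options: HeroOptionsSketch = {}): PluginRef[] {
  if (!options.disableHumanEmulator) return [...defaults];
  return defaults.filter(plugin => !plugin.isHumanEmulator);
}

const ids = (plugins: PluginRef[]) => plugins.map(plugin => plugin.id).join(', ');

console.log(ids(pluginsForInstance(coreDefaults))); // default-browser-emulator, default-human-emulator
console.log(ids(pluginsForInstance(coreDefaults, { disableHumanEmulator: true }))); // default-browser-emulator
console.log(coreDefaults.length); // 2
```

Returning a filtered copy, rather than editing `Core.defaultUnblockedPlugins`, is what would keep the flag scoped to one instance.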

GlenDC commented 1 year ago

In Rust, the builder pattern is often used for things like this, and I find it quite elegant: the default builder could set up all of your default plugins, but you could also choose to use none of them. Is that an approach that would carry over to the Node.js world, @blakebyrnes?
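A builder in TypeScript could look roughly like the sketch below. `PluginSetBuilder`, its methods, and the plugin ids are all hypothetical; Hero does not ship such a class, and real plugins are classes rather than plain `{ id }` objects.

```typescript
// Hypothetical plugin reference; real Hero plugins are classes, not plain objects.
interface PluginRef {
  id: string;
}

const DEFAULT_PLUGINS: PluginRef[] = [
  { id: 'default-browser-emulator' },
  { id: 'default-human-emulator' },
];

// A minimal builder: starts empty, can opt into defaults, remove by id, or add plugins.
class PluginSetBuilder {
  private plugins: PluginRef[] = [];

  withDefaults(): this {
    this.plugins.push(...DEFAULT_PLUGINS);
    return this;
  }

  without(id: string): this {
    this.plugins = this.plugins.filter(plugin => plugin.id !== id);
    return this;
  }

  add(plugin: PluginRef): this {
    this.plugins.push(plugin);
    return this;
  }

  // Returns a copy so later builder calls cannot change an already-built set.
  build(): PluginRef[] {
    return [...this.plugins];
  }
}

const noHumanEmulator = new PluginSetBuilder()
  .withDefaults()
  .without('default-human-emulator')
  .build();

const customOnly = new PluginSetBuilder().add({ id: 'custom-human-emulator' }).build();

console.log(noHumanEmulator.map(p => p.id).join(', ')); // default-browser-emulator
console.log(customOnly.map(p => p.id).join(', ')); // custom-human-emulator
```

Each method returns `this`, which is what allows the chained calls; starting from an empty set covers the "none of the defaults" case.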

blakebyrnes commented 1 year ago

@GlenDC You can already define your default plugins at the Core level in the way you're describing. This feature is more about letting a user define them at the client level (or vary them per run). There are a number of permission issues with this one, because we always build Hero/Ulixee with the ultimate goal of a Ulixee cloud version that people can upload to. Open-ended plugins in a cloud are risky business.