ulixee / hero

The web browser built for scraping
MIT License

How to expose plugin when using connectionToCore? #239

Closed. Baker68 closed this issue 9 months ago.

Baker68 commented 9 months ago

I have set up a CoreServer, based on the examples on the main site, like this:

// Server.ts
import '@ulixee/commons/lib/SourceMapSupport';
import ShutdownHandler from '@ulixee/commons/lib/ShutdownHandler';
import WebSocket = require('ws');
import Core from '@ulixee/hero-core';
import {WsTransportToClient} from '@ulixee/net';
import ExecuteJsPlugin from '@ulixee/execute-js-plugin';
import {version} from '@ulixee/hero-core/package.json';
import UlixeeHostsConfig from '@ulixee/commons/config/hosts';
import {AddressInfo} from 'net';
import {IncomingMessage} from 'http';
import config from "./config";

class CoreServer {
    public addressPromise!: Promise<string>;
    private wsServer!: WebSocket.Server;
    private core!: Core;
    private port: number;

    constructor(port: number) {
        this.port = port;
        ShutdownHandler.register(() => this.close());
    }

    public async open(): Promise<void> {
        this.addressPromise = new Promise<string>(resolve => {
            this.wsServer = new WebSocket.Server({port: this.port}, () => {
                const address = this.wsServer.address() as AddressInfo;
                resolve(`localhost:${address.port}`);
            });
        });
        this.wsServer.on('connection', this.handleWsConnection.bind(this));
        this.core = await Core.start({maxConcurrentClientCount: config.CHROME_COUNT});
        this.core.use(require.resolve('./plugins/echoPlugin/echoPlugin'));
        const address = await this.addressPromise;
        await UlixeeHostsConfig.global.setVersionHost(version, address);
        // @ts-ignore
        ShutdownHandler.register(() => UlixeeHostsConfig.global.setVersionHost(version, null));

        console.log('Started server at %s', address);
    }

    public async close(): Promise<void> {
        try {
            this.wsServer.close();
        } catch (error) {
            console.log('Error closing socket connections', error);
        }
        await Core.shutdown();
    }

    private handleWsConnection(ws: WebSocket, req: IncomingMessage): void {
        // use the public request socket instead of the private ws._socket field
        const address = req.socket.remoteAddress;
        console.log('New connection:', address);
        ws.on('close', () => console.log(`Closed: ${address}`));
        const transport = new WsTransportToClient(ws, req);
        const connection = this.core.addConnection(transport);
        ShutdownHandler.register(() => connection.disconnect());
    }
}

(async () => {
    const coreServer = new CoreServer(config.CORE_PORT);
    await coreServer.open();
})().catch(console.error);

And the client (consumer):

// Client.ts
const hero = new Hero({
  showChrome: config.DEVELOPER_MODE,
  showChromeInteractions: true,
  sessionPersistence: false,
  showChromeAlive: false,
  viewport: {
    width: 2560,
    height: 1080,
    screenWidth: 2560,
    screenHeight: 1013,
  },
  connectionToCore: { host: core_url },
});
hero.use(require.resolve('./plugins/echoPlugin/echoPlugin'));
await hero.goto('https://www.example.com');
await hero.waitForPaintingStable();
console.log(await hero.echo('Echo', 1, 2, 3, true));

For this example I used the echo plugin from the examples directory.

When I run Client.ts, it connects to Core and navigates to example.com, but then fails with TypeError: hero.echo is not a function.

How can plugins be exposed to the client?

Baker68 commented 9 months ago

OK, so I found out that activeTab (or any other tab) does have the echo function defined, and it works. But why is it not present on the hero constant?

blakebyrnes commented 9 months ago

It's just not part of the plugin definition. You could create a version of it that does so pretty easily.

Baker68 commented 9 months ago

@blakebyrnes, isn't this the part that should expose the function on the main Hero instance?

  public onHero(hero: Hero, sendToCore: ISendToCoreFn): void {
    hero.echo = (echo1: string, echo2: number, ...echoAny: any[]) => {
      return this.echo(sendToCore, echo1, echo2, ...echoAny);
    };
  }
blakebyrnes commented 9 months ago

Oh, sorry. For some reason I thought this was execute-js. This seems like a bug. Does the onHero initializer ever get called if you put a log in there?

Baker68 commented 9 months ago

Yes, it gets called.

blakebyrnes commented 9 months ago

What language are you using? I just tested your example in TypeScript and it works correctly, with no error.

Baker68 commented 9 months ago

I'm also using TypeScript.

The core half of the plugin is loaded by CoreServer and the client half is loaded by the client. Here is another example, a plugin that disables JavaScript:

// this is loaded by the CoreServer and does not have the ClientPlugin definition
import {CorePlugin} from '@ulixee/hero-plugin-utils';
import {IOnClientCommandMeta} from '@ulixee/hero-interfaces/ICorePlugin';

export class CoreDisableJavascriptPlugin extends CorePlugin {
    static override readonly id = 'disable-javascript-plugin';
    static override readonly type = "CorePlugin";

    public async onClientCommand(
        {page}: IOnClientCommandMeta,
        value: boolean
    ): Promise<boolean> {
        // value === true disables JavaScript on the page
        await page.setJavaScriptEnabled(!value);
        return true;
    }
}

type DisableJavascriptPluginAdditions = {
    disableJavascript(value: boolean): Promise<boolean>;
};

// @ts-ignore
declare module '@ulixee/hero/lib/extendables' {
    interface Hero extends DisableJavascriptPluginAdditions {} // eslint-disable-line @typescript-eslint/no-shadow
    interface Tab extends DisableJavascriptPluginAdditions {} // eslint-disable-line @typescript-eslint/no-shadow
}

export default {
    CorePlugin: CoreDisableJavascriptPlugin,
};

// this is loaded by the Client and does not have the CorePlugin definition
import type Hero from '@ulixee/hero';
import type Tab from '@ulixee/hero/lib/Tab';
import { ClientPlugin } from '@ulixee/hero-plugin-utils';
import { ISendToCoreFn } from '@ulixee/hero-interfaces/IClientPlugin';

export class ClientDisableJavascriptPlugin extends ClientPlugin {
  static override readonly id = 'disable-javascript-plugin';
  static coreDependencyIds = [ClientDisableJavascriptPlugin.id];

  public onHero(hero: Hero, sendToCore: ISendToCoreFn): void {
    console.log('onHero');
    hero.disableJavascript = (value: boolean) => {
      return this.disableJavascript(sendToCore, value);
    };
  }

  public onTab(hero: Hero, tab: Tab, sendToCore: ISendToCoreFn): void {
    console.log('onTab');
    tab.disableJavascript = (value: boolean) => {
      return this.disableJavascript(sendToCore, value);
    };
  }

  private async disableJavascript(
    sendToCore: ISendToCoreFn,
    value: boolean
  ): Promise<any> {
    return await sendToCore(ClientDisableJavascriptPlugin.id, value);
  }
}

type DisableJavascriptPluginAdditions = {
  disableJavascript(value: boolean): Promise<boolean>;
};

declare module '@ulixee/hero/lib/extendables' {
  interface Hero extends DisableJavascriptPluginAdditions {}
  interface Tab extends DisableJavascriptPluginAdditions {}
}

export default ClientDisableJavascriptPlugin;
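To make the split between the two files concrete, here is a hypothetical, self-contained model of how the halves talk to each other: the client half attaches a method that forwards its arguments with sendToCore under the plugin id, and the core plugin registered under the same id handles them in onClientCommand. FakeCore, CoreHalf, ClientTarget and attachClientHalf are illustrative names, not Hero APIs; in the real setup Hero and Core do this routing over the connection.

```typescript
// Hypothetical model (not Hero's API) of the round trip between the two halves
// of the plugin: the client half forwards calls with sendToCore(pluginId, ...args),
// and the core half registered under the same id handles them in onClientCommand.

type SendToCoreFn = (pluginId: string, ...args: unknown[]) => Promise<unknown>;

interface FakeCorePlugin {
  readonly id: string;
  onClientCommand(...args: unknown[]): Promise<unknown>;
}

// Stand-in for Core: routes each client command to the core plugin with the same id.
class FakeCore {
  private plugins = new Map<string, FakeCorePlugin>();

  use(plugin: FakeCorePlugin): void {
    this.plugins.set(plugin.id, plugin);
  }

  sendToCore: SendToCoreFn = async (pluginId, ...args) => {
    const plugin = this.plugins.get(pluginId);
    if (!plugin) {
      // This model fails loudly when the core half was never registered on the server.
      throw new Error(`No core plugin registered with id "${pluginId}"`);
    }
    return plugin.onClientCommand(...args);
  };
}

const PLUGIN_ID = 'disable-javascript-plugin';

// Core half: records the state the real plugin would apply with page.setJavaScriptEnabled().
class CoreHalf implements FakeCorePlugin {
  readonly id = PLUGIN_ID;
  javaScriptEnabled = true;

  async onClientCommand(value: unknown): Promise<boolean> {
    this.javaScriptEnabled = !value;
    return true;
  }
}

// Client half: attaches the method to a target object (Hero or Tab in the real plugin).
interface ClientTarget {
  disableJavascript?: (value: boolean) => Promise<unknown>;
}

function attachClientHalf(target: ClientTarget, sendToCore: SendToCoreFn): void {
  target.disableJavascript = (value: boolean) => sendToCore(PLUGIN_ID, value);
}

// Usage: wire both halves together and call the method from the "client" side.
const core = new FakeCore();
const coreHalf = new CoreHalf();
core.use(coreHalf);

const heroLike: ClientTarget = {};
attachClientHalf(heroLike, core.sendToCore);
heroLike.disableJavascript?.(true).then(() => {
  console.log('JavaScript enabled:', coreHalf.javaScriptEnabled); // logs "JavaScript enabled: false"
});
```

The key point the model captures is that both halves must agree on the id: the client half only knows how to send, and nothing happens unless a core plugin with that id is loaded on the server.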

This is how I create a Hero instance:

import Hero from '@ulixee/hero';
import ExecuteJsPlugin from '@ulixee/execute-js-plugin';
import config from '../../config';
import plugins from './plugins';
import { BlockedResourceType } from '@ulixee/hero-interfaces/ITabOptions';
import type IHeroCreateOptions from '@ulixee/hero/interfaces/IHeroCreateOptions';
// I need to import them or typescript won't recognize the methods declared in the plugins
import type DisableJavascriptPlugin from '../plugins/disableJavascript/disableJavascriptPlugin';
import type PreventNavigationPlugin from '../plugins/preventNavigation/preventNavigationPlugin';
import type waitForNavigationPlugin from '../plugins/waitForNavigation/waitForNavigationPlugin';

const getHero = (core_url: string, extraOptions?: IHeroCreateOptions): Hero => {
  const ctor = {
    showChrome: config.DEVELOPER_MODE,
    showChromeInteractions: true,
    sessionPersistence: false,
    showChromeAlive: false,
    viewport: {
      width: 2560,
      height: 1080,
      screenWidth: 2560,
      screenHeight: 1013,
    },
    connectionToCore: { host: core_url },
    ...(!config.DEVELOPER_MODE
      ? {
          blockedResourceTypes: [
            BlockedResourceType.BlockCssResources,
            BlockedResourceType.BlockImages,
            BlockedResourceType.BlockIcons,
            BlockedResourceType.BlockMedia,
            BlockedResourceType.BlockFonts,
          ],
        }
      : {}),
    ...(extraOptions ? extraOptions : {}),
  };

  const hero = new Hero(ctor);
  hero.use(ExecuteJsPlugin);
  // plugins is an array (of strings) of absolute paths
  plugins.forEach((plugin) => {
    hero.use(plugin);
  });

  console.log(hero.disableJavascript);
  console.log(hero.activeTab.disableJavascript);

  return hero;
};

export default getHero;

Execution output:

undefined -> console.log(hero.disableJavascript);
onTab
[Function (anonymous)]
onHero

It seems that onTab fires first, followed by onHero (I tested it 20 times to make sure before posting).

If you look at this screenshot, you will see that interface Hero extends DisableJavascriptPluginAdditions {} does not seem to be taken into account.
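Whatever the screenshot shows, note that a declare module augmentation only affects type checking: it tells the compiler that Hero has disableJavascript, but at runtime the method exists only after the plugin's onHero hook assigns it. That is consistent with console.log(hero.disableJavascript) compiling fine and still printing undefined. The snippet below reproduces the effect with plain TypeScript declaration merging; DemoHero is a hypothetical stand-in, not Hero itself.

```typescript
// Minimal illustration (not Hero code): an interface merged into a class only
// changes what the compiler believes; it adds nothing to objects at runtime.
class DemoHero {
  id = 1;
}

// Declaration merging: the compiler now thinks every DemoHero has disableJavascript().
interface DemoHero {
  disableJavascript(value: boolean): Promise<boolean>;
}

const demo = new DemoHero();

// Type-checks, but no code has assigned the property yet.
console.log(typeof demo.disableJavascript); // prints: undefined

// Only runtime code (in Hero's case, the plugin's onHero hook) creates the method.
demo.disableJavascript = async (value: boolean) => value;
console.log(typeof demo.disableJavascript); // prints: function
```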

blakebyrnes commented 9 months ago

I was able to modify your server to load this disable-JavaScript plugin. In your example as written it does not initialize onHero, but once the Hero instance actually connects to your server, it properly logs onHero after the connection is made:

await hero.meta;

console.log("Hero disablejs", hero.disableJavascript);
console.log("tab disablejs", hero.activeTab.disableJavascript);

Output:

onHero
Hero disablejs [Function (anonymous)]
tab disablejs [Function (anonymous)]

One thing I did find: if I didn't modify your test server setup above to discover your plugin paths, it failed silently. I think we need to find a way to improve the logging in that case.

In my case:

    this.core.use(require.resolve('./plugins-EchoClasses'));
    this.core.use(ExecuteJsPlugin);
    this.core.use(require.resolve('./test-plugin1'));
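Assuming the silent failure comes from a client plugin whose core half was never loaded on the server, one way to surface it is a pre-flight check that compares each client plugin's coreDependencyIds (the static field used in the client plugin above) with the core plugin ids the server actually registered. This is a hypothetical helper, not a Hero feature; findMissingCoreDependencies and the 'echo-plugin' id are illustrative, and how you collect the registered core ids depends on your own server setup.

```typescript
// Hypothetical pre-flight check, not a Hero API: reports client plugins whose
// declared coreDependencyIds are not among the core plugin ids loaded on the server.
interface ClientPluginInfo {
  id: string;
  coreDependencyIds?: string[];
}

function findMissingCoreDependencies(
  clientPlugins: ClientPluginInfo[],
  registeredCoreIds: Iterable<string>,
): Map<string, string[]> {
  const registered = new Set(registeredCoreIds);
  const missing = new Map<string, string[]>();
  for (const plugin of clientPlugins) {
    // A plugin with no declared dependencies can never be reported as missing.
    const absent = (plugin.coreDependencyIds ?? []).filter(id => !registered.has(id));
    if (absent.length > 0) {
      missing.set(plugin.id, absent);
    }
  }
  return missing;
}

// Example ids: the echo plugin's core half was never loaded on the server.
const missing = findMissingCoreDependencies(
  [
    { id: 'disable-javascript-plugin', coreDependencyIds: ['disable-javascript-plugin'] },
    { id: 'echo-plugin', coreDependencyIds: ['echo-plugin'] },
  ],
  ['disable-javascript-plugin'],
);

for (const [pluginId, ids] of missing) {
  console.warn(`Client plugin "${pluginId}" needs core plugin(s) not loaded on the server: ${ids.join(', ')}`);
}
```

Warning rather than throwing keeps the session usable when only some plugins are misconfigured; throwing would be the stricter choice during development.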
blakebyrnes commented 9 months ago

Oh, I should add: onHero only runs once the session is created.

Baker68 commented 9 months ago

I tested it: awaiting hero.meta fixes the problem.
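One way to apply this in the getHero helper above is to wait for the connection before handing the instance out. The sketch assumes only what this thread states: awaiting hero.meta connects to Core, after which onHero has attached the plugin methods. ConnectableHero and connected are hypothetical names, and the structural type is there so the snippet compiles without @ulixee/hero.

```typescript
// Sketch based on this thread: awaiting `meta` makes the instance connect to Core,
// after which the plugins' onHero hooks have attached their methods.
// ConnectableHero is a hypothetical structural type; a real Hero instance has `meta`.
interface ConnectableHero {
  meta: Promise<unknown>;
}

async function connected<T extends ConnectableHero>(hero: T): Promise<T> {
  await hero.meta; // forces the session to be created before plugin methods are used
  return hero;
}

// Intended usage with the real library (not executed here):
//   const hero = await connected(getHero(coreUrl));
//   await hero.disableJavascript(true);
```

The trade-off is that getHero (or its caller) becomes asynchronous; the alternative is to keep getHero synchronous and only call plugin methods after the first goto or meta has resolved.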

blakebyrnes commented 9 months ago

Is there anything left to resolve in this issue?

Baker68 commented 9 months ago

Not really; awaiting hero.meta is what's needed for everything to work properly.

Thanks @blakebyrnes !

blakebyrnes commented 9 months ago

Really, any call that connects to Core (goto, meta, etc.) will activate the plugin methods. They just aren't activated until the connection is made.
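This lazy behavior can be pictured with a small hypothetical model (LazyHero is illustrative, not Hero's implementation): use() only queues the plugin's onHero-style hook, and the first command that needs Core, whether goto or meta, opens one shared connection and runs the queued hooks once. Until then the plugin method is simply not on the object.

```typescript
// Hypothetical model, not Hero's implementation: plugin hooks are queued by use()
// and run only when the first command creates the (shared) connection to Core.
type OnHeroHook = (hero: LazyHero) => void;

class LazyHero {
  echo?: (...args: unknown[]) => unknown[];
  private hooks: OnHeroHook[] = [];
  private connection?: Promise<void>;

  use(onHero: OnHeroHook): void {
    this.hooks.push(onHero); // registering a plugin does not connect
  }

  // Every command reuses one lazily created connection, so hooks run exactly once.
  private connect(): Promise<void> {
    this.connection ??= Promise.resolve().then(() => {
      for (const hook of this.hooks) hook(this); // like onHero: runs once the session exists
    });
    return this.connection;
  }

  get meta(): Promise<{ connected: true }> {
    return this.connect().then(() => ({ connected: true as const }));
  }

  async goto(url: string): Promise<string> {
    await this.connect();
    return url;
  }
}

// Usage: the method only appears after a connecting call.
const hero = new LazyHero();
hero.use(h => {
  h.echo = (...args) => args;
});
console.log(hero.echo); // prints: undefined
hero.goto('https://www.example.com').then(() => {
  console.log(hero.echo?.('Echo', 1)); // logs the echoed arguments
});
```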