rialto-php / rialto

Manage Node resources with PHP
MIT License

Add lazy/eager instructions #13

Open nesk opened 6 years ago

nesk commented 6 years ago

The problem

Currently, it's up to the implementation to determine whether an instruction should be awaited. For example, PuPHPeteer always awaits its instructions.

However, awaiting every instruction is a problem. The waitForNavigation() method returns a promise that is resolved only once the goto() method is called. Since PuPHPeteer is still awaiting the first method, the second method can never be called, and Node throws a Navigation Timeout Exceeded error. See nesk/puphpeteer#4 for more information about this specific bug.
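
The deadlock described above can be reproduced without Puppeteer. The following is a minimal sketch, assuming a hypothetical FakePage class that only mimics the relevant behaviour (waitForNavigation() resolves when goto() runs, or rejects after a timeout). It is not the real Puppeteer API, and the 50 ms timeout is an arbitrary value chosen for illustration.

```javascript
// Hypothetical stand-in for a Puppeteer page; not the real API.
class FakePage {
    constructor() {
        this._navigationListeners = [];
    }

    // Resolves when a navigation happens, rejects after `timeout` ms.
    waitForNavigation({ timeout = 50 } = {}) {
        return new Promise((resolve, reject) => {
            const timer = setTimeout(
                () => reject(new Error('Navigation Timeout Exceeded')),
                timeout
            );
            this._navigationListeners.push(() => {
                clearTimeout(timer);
                resolve('navigated');
            });
        });
    }

    // Triggers a navigation and notifies every pending waiter.
    async goto(url) {
        const listeners = this._navigationListeners;
        this._navigationListeners = [];
        listeners.forEach((notify) => notify());
        return url;
    }
}

// Awaiting every instruction in order: goto() is not reached before the timeout.
async function awaitEverything(page) {
    try {
        await page.waitForNavigation();
        await page.goto('https://example.com');
        return 'ok';
    } catch (error) {
        return error.message;
    }
}

// Keeping the first promise pending lets goto() run and resolve it.
async function keepFirstPending(page) {
    const navigation = page.waitForNavigation();
    await page.goto('https://example.com');
    return navigation;
}
```

In this sketch, awaitEverything() ends with the timeout message, while keepFirstPending() resolves normally because the first promise is only awaited after goto() has been called.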

How this can be solved

Instead of letting the implementation decide whether to use await for all of its instructions, Rialto could let the implementation choose a default resolving strategy (await or not) and let the user override this behaviour for individual instructions.

For example, Rialto could provide a useAwaitByDefault() method to let the implementation define the preferred resolving strategy:

async handleInstruction(instruction, responseHandler, errorHandler)
{
    instruction.useAwaitByDefault(true);

    // ...
}

Then the user could simply execute an instruction:

$resource->someMethodReturningAPromise(); // This will return the value of the resolved Promise

Or they could override the resolving strategy:

$resource->lazy->someMethodReturningAPromise(); // This will return the Promise

Of course, the implementation could also choose not to await by default (instruction.useAwaitByDefault(false)), and the user could override this too:

$resource->someMethodReturningAPromise(); // This will return the Promise
$resource->eager->someMethodReturningAPromise(); // This will return the value of the resolved Promise
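
On the Node side, the combination of a default strategy and a per-instruction override could look roughly like the sketch below. Only useAwaitByDefault() comes from the proposal above. The Instruction class shape, the override field and the shouldAwait() and execute() methods are assumptions made for illustration, not existing Rialto API.

```javascript
// Hypothetical sketch; apart from useAwaitByDefault(), these names are assumptions.
class Instruction {
    constructor(callback, override = null) {
        this.callback = callback;     // produces a value or a promise
        this.override = override;     // 'lazy', 'eager', or null (no override)
        this.awaitByDefault = false;
    }

    // Set by the implementation to choose the default resolving strategy.
    useAwaitByDefault(flag) {
        this.awaitByDefault = flag;
    }

    // A user override always wins over the implementation's default.
    shouldAwait() {
        if (this.override === 'lazy') return false;
        if (this.override === 'eager') return true;
        return this.awaitByDefault;
    }

    async execute() {
        const result = this.callback();
        // The wrapper object keeps the async function from flattening
        // a promise that is meant to stay unresolved.
        return this.shouldAwait()
            ? { resolved: true, value: await result }
            : { resolved: false, value: result };
    }
}
```

With this shape, an instruction created with the 'lazy' override hands back the pending promise even when the implementation called useAwaitByDefault(true), and an 'eager' one is awaited even when the default is false.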

Promises

Since an instruction could return a Promise, we should provide some tools to work with them.

A promise in PHP should be a BasicResource with a then() method that accepts a PHP callback receiving the resolved value as its first argument.

A PHP equivalent to Promise.all() should also be provided, to allow waiting for multiple promises. Typically, it would enable parallel calls (see nesk/rialto#9):

$browser = (new Puppeteer)->launch();

$page1 = $browser->newPage();
$page2 = $browser->newPage();

$request1 = $page1->lazy->goto('https://github.com/nesk/');
$request2 = $page2->lazy->goto('https://github.com/-not--a--real--profile-/');

Promise::all([$request1, $request2])->then(function ($responses) {
    echo $responses[0]->status(); // 200
    echo $responses[1]->status(); // 404
});
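
For comparison, here is a minimal sketch of the behaviour a PHP Promise::all() would presumably mirror on the Node side. fakeGoto() is a hypothetical stand-in for page.goto() that resolves with a fake response after a delay. The counters exist only to show that both requests are in flight at the same time.

```javascript
// Hypothetical stand-in for page.goto(); resolves with a fake response after a delay.
let inFlight = 0;
let maxInFlight = 0;

function fakeGoto(status, delayMs) {
    inFlight += 1;
    maxInFlight = Math.max(maxInFlight, inFlight);
    return new Promise((resolve) => {
        setTimeout(() => {
            inFlight -= 1;
            resolve({ status: () => status });
        }, delayMs);
    });
}

async function loadBoth() {
    // Neither call is awaited here, so both requests start immediately.
    const request1 = fakeGoto(200, 30);
    const request2 = fakeGoto(404, 10);

    // Results keep the order of the input array, not the completion order.
    const responses = await Promise.all([request1, request2]);
    return responses.map((response) => response.status());
}
```

Even though the second fake request finishes first, the statuses come back in the order the promises were passed in, which is what lets the PHP callback index into $responses safely.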

billisonline commented 4 years ago

@nesk how feasible do you think it would be for someone to take this on as a first issue? I'm doing heavy PuPHPeteer scraping for a client and would like to have guarantees about when a page is loaded etc. I have strong PHP skills/experience but I'm fair-to-middling when it comes to JS and Node. What do you think?

artemmolotov commented 4 years ago

At the moment, I have some results with asynchronous invocation of operations. I used a slightly different approach. Next week I will try to show what I did.

artemmolotov commented 4 years ago

$request1 = $page1->lazy->goto('https://github.com/nesk/');
$request2 = $page2->lazy->goto('https://github.com/-not--a--real--profile-/');

Won't adding a property named lazy or eager to each PHP resource lead to conflicts if the JS resource has a property with the same name?