jonnnnyw / php-phantomjs

Execute PhantomJS commands through PHP
MIT License
1.44k stars 432 forks

page automation #174

Open Sarfroz opened 7 years ago

Sarfroz commented 7 years ago

Hi, can you guide me on how exactly I can do this using PHP and PhantomJS?

http://phantomjs.org/page-automation.html

yipwt79 commented 7 years ago

hi Sarfroz,

You can do this via custom scripts. I managed to pull it off, but make sure you wrap the script in [% autoescape false %] ... [% endautoescape %] so that you can get the URL passed from the PHP script.

The documentation is here: http://jonnnnyw.github.io/php-phantomjs/4.0/4-custom-scripts/

Example code below:

    [% autoescape false %]

    var page = require('webpage').create();
    var fs = require('fs');
    var url = '{{ input.getUrl() }}';

    page.open(url, 'GET', '', function (status) {
        var content = page.content;
        var path = '/home/steven/Code/phantomjs/logs/log_script11.txt';

        fs.write(path, url, 'w');
        fs.write(path, content, 'w+');
        phantom.exit(1);
    });

    phantom.onError = function(msg, trace) { phantom.exit(1); };

    [% endautoescape %]
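
A rough illustration of why the autoescape block matters, assuming the template engine HTML-escapes interpolated values by default (as Twig-style engines commonly do). The `htmlEscape` helper and the example URL below are hypothetical and only mimic that behaviour; they are not part of php-phantomjs.

```javascript
// Hypothetical helper that mimics default HTML auto-escaping of a template value.
function htmlEscape(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#039;');
}

// Made-up URL with a query string, for illustration only.
var original = 'https://example.com/search?q=phantom&page=2';
var escaped = htmlEscape(original);

// With escaping on, the "&" separating the query parameters becomes "&amp;",
// so the script would try to open a different URL than the one PHP passed in.
console.log(escaped);
console.log(escaped === original);
```

Under that assumption, a URL containing `&` would reach the script in a changed form unless escaping is switched off around the template.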

Sarfroz commented 7 years ago

I tried, sir, but it is not working. I am using partial script injection but no luck. This is my working PhantomJS code:

    var page = require('webpage').create();

    page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';

    page.onInitialized = function() {
        page.evaluate(function() {
            delete window._phantom;
            delete window.callPhantom;
        });
    };

    page.open('https://xxxxxxx', function(status) {
        if (status !== 'success') {
            console.log('Unable to access network');
        } else {
            var ua = page.evaluate(function() {
                return document.getElementById('iddoc').textContent;
            });
            console.log(ua);
        }
        phantom.exit();
    });

If I run it directly via the phantomjs command it works OK, but the problem is that I have to edit the JS code every time I want to change the URL value. I hope you can give an example of this method.


yipwt79 commented 7 years ago

hi Sarfroz,

OK, I've tried out the script and it works. Here is what needs to be done:

  1. Do NOT enable debugging. There are known bugs when it is enabled: the script takes a long time and returns no response. See this issue: https://github.com/jonnnnyw/php-phantomjs/issues/74

     I think debugging is better done via the terminal, e.g.

         phantomjs --debug=true myscript.proc

     That way you can catch any problems there first.

  2. I haven't tried partial scripts, only CUSTOM scripts, and I believe that is what you plan to do. Partial scripts more or less override the partial scripts built into the library's code, so I think you really need to understand what JonnyW did. I didn't spend much time on this.

  3. Make sure that your scripts have the right permissions:

         chmod 755 testing1.proc

     I am running Apache2 on Ubuntu Linux, so I also set:

         chown :www-data testing1.proc

  4. You'll need to be creative when returning data to the calling PHP script. Define and use a response object with a content property in testing1.proc:

         var response = {content: null};          // declare the response object
         response.content = 'my content here';    // assign the result you want to pass back
         console.log(JSON.stringify(response));   // output it as JSON

     You can then get the result in the PHP script via $response->getContent().

     Note that if you don't output a valid JSON string, the library doesn't give you the content you want.

  5. You can create a centralized PhantomJS config file:

         {
             /* Same as: --ignore-ssl-errors=true */
             "ignoreSslErrors": true,

             /* Same as: --max-disk-cache-size=1000 */
             "maxDiskCacheSize": 1000,

             /* Same as: --output-encoding=utf8 */
             "outputEncoding": "utf8",

             "cookiesFile": "/home/steven/Code/phantomjs/cookies/cookies.txt"
         }

OK, that said, here is my full PHP caller script:

    <?php

    // timer
    $start = microtime(true);

    use JonnyW\PhantomJs\Client;
    use JonnyW\PhantomJs\DependencyInjection\ServiceContainer;
    use JonnyW\PhantomJs\Message\Request;

    require_once 'vendor/autoload.php';
    require_once 'config.php';

    error_reporting(E_ALL);

    $client = Client::getInstance();
    //var_dump($client->getCommand());

    $location = '/home/steven/Code/phantomjs/procedures/';

    $serviceContainer = ServiceContainer::getInstance();
    $procedureLoader = $serviceContainer->get('procedure_loader_factory')->createProcedureLoader($location);

    $url = 'https://www.reddit.com/';
    /* the script testing1.proc is located under $location */
    $fileName = 'testing1';

    $client = Client::getInstance();
    //$client->getEngine()->debug(true); // Hangs when enabled!!!
    $client->getEngine()->addOption('--config=/home/steven/Code/phantomjs/phantomjs-config.json');
    $client->getEngine()->addOption("--web-security=no");
    $client->getEngine()->addOption('--ssl-protocol=tlsv1');

    //$client->getProcedureCompiler()->clearCache();
    //$client->getProcedureCompiler()->disableCache(); // enableCache(), clearCache();

    $client->setProcedure($fileName);
    $client->getProcedureLoader()->addLoader($procedureLoader);
    $request = $client->getMessageFactory()->createRequest(); // for custom scripts
    $response = $client->getMessageFactory()->createResponse();

    $request->setMethod('GET');
    $request->setUrl($url);

    try {
        $client->send($request, $response);

        //echo "\n==== log ==== \n" . $client->getLog() . "\n";
        //print_r($response->getConsole()); // Array

        // print_r() prints the array and returns true, so echo also prints "1"
        echo print_r($response->getHeaders());
        echo "status = " . $response->getStatus() . "\n";
        echo "content = " . $response->getContent() . "\n";
    } catch (Exception $e) {
        echo "Error catch\n";
        echo $e->getMessage();
        var_dump($client->getLog());
        //print_r($e->getErrors());
    }

    /* timer end */
    $stop = round(microtime(true) - $start, 5);

    echo "time: {$stop}\n";

    ?>

Here is testing1.proc:

    [% autoescape false %]

    var page = require('webpage').create();
    var url = '{{ input.getUrl() }}';

    page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';

    page.onInitialized = function() {
        page.evaluate(function() {
            delete window._phantom;
            delete window.callPhantom;
        });
    };

    page.open(url, function(status) {
        if (status !== 'success') {
            console.log('Unable to access network');
        } else {
            var ua = page.evaluate(function() {
                return document.getElementById('siteTable').innerHTML;
            });

            //console.log(ua);

            var response = {content: null};
            response.content = ua;
            console.log(JSON.stringify(response));
        }

        phantom.exit();
    });

    [% endautoescape %]
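
One thing to watch in a script like this: it assumes the page has an element with the id `siteTable`. If the id is missing, `getElementById` returns null and reading `innerHTML` from it throws. A minimal sketch of a guard follows; `innerHtmlOrNull` and `fakeDocument` are hypothetical names used only for illustration, and the stand-in document replaces the real page so the snippet runs on its own.

```javascript
// Hypothetical helper: return an element's innerHTML, or null if the id is missing.
// `doc` is anything with a getElementById method (the page's `document` in practice).
function innerHtmlOrNull(doc, id) {
    var el = doc.getElementById(id);
    return el ? el.innerHTML : null;
}

// Stand-in document for illustration only; it knows a single element.
var fakeDocument = {
    getElementById: function (id) {
        return id === 'siteTable' ? { innerHTML: '<p>rows</p>' } : null;
    }
};

console.log(innerHtmlOrNull(fakeDocument, 'siteTable'));
console.log(innerHtmlOrNull(fakeDocument, 'missing'));
```

The function passed to page.evaluate runs in the page context and cannot see helpers defined in the outer script, so in a .proc file the same null check would have to be written inline inside the evaluated function.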

Ok, hope this helps.

Cheers
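
The point above about returning data depends on the script printing one valid JSON string. A minimal sketch of that envelope, using only the standard JSON.stringify and JSON.parse; `buildResponse` is a hypothetical helper name, not part of php-phantomjs, and the sample markup is made up.

```javascript
// Hypothetical helper: wrap a result in the {content: ...} envelope.
// undefined is mapped to null, because JSON.stringify would otherwise
// drop the "content" key from the output entirely.
function buildResponse(content) {
    return JSON.stringify({ content: content === undefined ? null : content });
}

// Quotes and angle brackets in the payload are escaped by JSON.stringify.
var line = buildResponse('<div id="siteTable">hello "world"</div>');
console.log(line);

// The output is only useful to the caller if it parses back as JSON.
var parsed = JSON.parse(line);
console.log(parsed.content);

// A missing result still yields a well-formed envelope.
console.log(buildResponse(undefined));
```

Building the string with JSON.stringify, rather than concatenating it by hand, keeps the output valid even when the page content contains quotes or newlines.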

Sarfroz commented 7 years ago

Works like a charm. Thanks a lot for this kind of support :) I only disabled these lines, and it still worked fine:

    $client->getEngine()->addOption('--config=/home/steven/Code/phantomjs/phantomjs-config.json');
    $client->getEngine()->addOption("--web-security=no");
    $client->getEngine()->addOption('--ssl-protocol=tlsv1');


amhoho commented 7 years ago

@yipwt79 I ran your PHP script and testing1.proc. Result:

    Array ( )
    1status = 0
    content = string(0) ""
    time: 2.886

gpgr888 commented 3 years ago

I tried php-phantomjs and have not enabled debug, but it still freezes on some sites. Any help? I don't have custom scripts, just the default php-phantomjs.