merlinthemagic / MTS

Automation Tools for PHP
GNU Lesser General Public License v3.0

How to get dom element? #33

Closed: plonknimbuzz closed this issue 7 years ago

plonknimbuzz commented 7 years ago

I have been trying to learn this for 4 hours, but I'm stuck on this random website.

How do I get the video from this page? (Note: this website provides a direct download link, so it may be fine to use it as an example.)

https://indoxxi.net/film-seri/stranger-things-season-2-2017-1fhos2/play

Because I found no way to get a specific DOM element, only the whole HTML, I decided to use Simple HTML DOM Parser.

After the page loaded, I ran several commands in the web console:

$('video').attr('src');
undefined
$("[id='ep-1072070']").click()
[a#ep-1072070.btn-eps.active, prevObject: init(1), context: document, selector: "[id='ep-1072070']"]
$('video').attr('src');
"https://lh3.googleusercontent.com/wpbYsMBx7WHFrm_jSjZQrQVlPLusRF0njilMpyTlbg_YVNSffjfufAWht5_kvJu4BUDJRpWkeuw=m22"

From the output above, we know that there is no video src until the episode button is clicked.

OK, let's do this with MTS.

============== header.php

<?php
ini_set('max_execution_time', 120);
require_once "/var/www/html/test/MTS/MTS/EnableMTS.php";
require_once "advanced_html_dom.php"; //same as PHP Simple HTML DOM Parser http://simplehtmldom.sourceforge.net/manual.htm
$url = "https://indoxxi.net/film-seri/stranger-things-season-2-2017-1fhos2/play";

//MTS begin
$windowObj      = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow($url);
$windowObj->setUserAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0");
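//note: setTimeout() does not block, so wait5() returns immediately;
//the sleep(5) calls in the test scripts are what actually wait
$wait   = "function wait5() {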
    setTimeout(function(){}, 5000);
    return 'wait 5sec';
}";
$windowObj->loadJS($wait);
$selector = "[id='ep-1072070']";
$exists     = $windowObj->getSelectorExists($selector);
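Why wait5() does not actually wait: setTimeout() only schedules its callback and returns at once, so the function reaches its return statement immediately. A minimal plain-JavaScript sketch (run with Node.js, not MTS; scheduleOnly and sleep are hypothetical names) contrasting "schedule a timer" with "await a timer":

```javascript
// Like the wait5() helper: schedules an empty callback, then returns at once.
function scheduleOnly(ms) {
  setTimeout(function () {}, ms);
  return 'wait 5sec';
}

// An actual wait: a Promise that resolves after ms milliseconds.
function sleep(ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

async function demo() {
  const start = Date.now();
  scheduleOnly(200);
  console.log('after scheduleOnly:', Date.now() - start, 'ms'); // close to 0
  await sleep(200);
  console.log('after sleep:', Date.now() - start, 'ms');        // roughly 200
}

demo();
```

Inside a page, a synchronous function like wait5() cannot pause the caller; the waiting has to happen on the calling side (here, PHP's sleep()).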

test1.php

<?php
require('header.php');
if($exists)
{
    //using clickElement
    $windowObj->clickElement($selector);
    echo $windowObj->callJSFunction("wait5") ."<br>";
    sleep(5);
    $dom = $windowObj->getDom();
    $html = str_get_html($dom); //from simple dom html parser
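    $video = $html->find('video', 0); //find() without an index returns a list, so take the first match
    var_dump($video ? $video->outertext : null); //still empty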

}

test2.php

<?php
require('header.php');
if($exists)
{
    //using leftclick
    $windowObj->mouseEventOnElement($selector, 'leftclick');
    echo $windowObj->callJSFunction("wait5") ."<br>";
    sleep(5);
    $dom = $windowObj->getDom();
    $html = str_get_html($dom); //from simple dom html parser
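    $video = $html->find('video', 0); //find() without an index returns a list, so take the first match
    var_dump($video ? $video->outertext : null); //still empty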

}

test3.php

<?php
require('header.php');
if($exists)
{
    //using callJSFunction
    $scriptData = "function myHelloWorld() {
        loadEpisode(0,1072070);
        return 'script loaded';
    }"; //loadEpisode() is copied from the onclick attribute of $selector
    $windowObj->loadJS($scriptData);
    echo $windowObj->callJSFunction("myHelloWorld") ."<br>";
    echo $windowObj->callJSFunction("wait5") ."<br>";
    sleep(5);
    $dom = $windowObj->getDom();
    $html = str_get_html($dom); //from simple dom html parser
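    $video = $html->find('video', 0); //find() without an index returns a list, so take the first match
    var_dump($video ? $video->outertext : null); //still empty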

}

But all I get is NULL. :(

How can I get that video src using MTS?

Thanks.

Sorry for the long post, but I'm really struggling with this.

merlinthemagic commented 7 years ago

Look at #5. Use a while loop to test for the availability of a particular selector, then use:

$selector   = "[id=video]";
$eleDetails = $windowObj->getElement($selector);

to get the details of that particular element.
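The "loop until the selector shows up" idea can be sketched generically. This is plain JavaScript, not the MTS API: waitFor and its parameters are hypothetical names, and check() stands in for a call such as getSelectorExists(). Unlike a tight while loop, it pauses between checks and gives up after a timeout:

```javascript
// Hypothetical helper (not part of MTS): resolves true as soon as check()
// returns true, or false once timeoutMs has passed. Between checks it
// sleeps for intervalMs instead of spinning in a tight loop.
function waitFor(check, timeoutMs, intervalMs) {
  const deadline = Date.now() + timeoutMs;
  return new Promise(function (resolve) {
    (function poll() {
      if (check()) return resolve(true);
      if (Date.now() > deadline) return resolve(false);
      setTimeout(poll, intervalMs);
    })();
  });
}

// Usage: a flag that becomes true after 50 ms stands in for the element appearing.
let elementPresent = false;
setTimeout(function () { elementPresent = true; }, 50);
waitFor(function () { return elementPresent; }, 1000, 10).then(function (ok) {
  console.log('element ready:', ok); // true
});
```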


plonknimbuzz commented 7 years ago

Sorry, I just woke up. I'm so tired from trying both MTS and PhantomJS; I went to sleep at 6 A.M.

Here is my code, based on issue #5:

<?php
ini_set('max_execution_time', 120);
set_time_limit(120);
require_once "/var/www/html/test/MTS/MTS/EnableMTS.php";
require_once "advanced_html_dom.php";
$url = "https://indoxxi.net/film-seri/stranger-things-season-2-2017-1fhos2/play";

//MTS begin
$windowObj      = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow($url);
$windowObj->setUserAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0");

$selector = "[id='ep-1072070']";

if($windowObj->getSelectorExists($selector))
{
    echo "episode button exists";
    $timeout    = 20; //in seconds
    $selector1   = "[class^='jw-video']"; 
    $selector2   = "video"; 

    $tTime      = time() + $timeout;
    $pageReady  = null;
    while($pageReady === null) {
        if ($windowObj->getSelectorExists($selector1) === true || $windowObj->getSelectorExists($selector2) === true) {
            $pageReady  = true;
        } elseif (time() > $tTime) {
            //failed to get ready before timeout
            $pageReady  = false;
        }
    }

    if ($pageReady === false) {
        throw new \Exception("Selector: $selector1 or $selector2 , did not load in: " . $timeout . " seconds");
    } else {
        echo "selector ready to click";
    }
}
else
{
    echo "episode button not exists";
}

result:

episode button exists
Fatal error: Uncaught Exception: Selector: [class^='jw-video'] or video , did not load in: 20 seconds in /var/www/html/test/test4.php:33 Stack trace: #0 {main} thrown in /var/www/html/test/test4.php on line 33

I will keep trying to solve this while waiting for your help.

merlinthemagic commented 7 years ago

Hi,

First, in order to have the user agent have effect it must be set BEFORE the page is loaded, like this:

$windowObj      = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow();
$windowObj->setUserAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0");
$windowObj->setUrl($url);

Second, I looked briefly at the source page for the URL and I do not see the elements you are looking for. You might want to open the page in a regular browser and locate the tags/attributes there first.

Third, your class selector might not work as intended: [class^='jw-video']. It appears you are trying to find elements whose class attribute starts with 'jw-video'. I am not sure what effect the quotes (') are having, and furthermore, treating the class attribute as if you were matching a string is unstable, since you cannot guarantee "jw-video" appears first in the list of classes. Why not just use a CSS class selector like this: .jw-video? The other selector looks for elements with the "video" tag. That should be fine.
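To illustrate the difference: [class^='jw-video'] compares against the raw class attribute string (the quotes are just CSS string syntax for the value), while .jw-video checks whether jw-video is one of the whitespace-separated class names. A small plain-JavaScript sketch that emulates both rules on attribute strings; the helper names are made up for illustration:

```javascript
// Emulates the CSS attribute selector [class^='jw-video']:
// the raw attribute string must START with the given text.
function matchesClassPrefix(classAttr, prefix) {
  return classAttr.startsWith(prefix);
}

// Emulates the CSS class selector .jw-video:
// the name must be one of the whitespace-separated class tokens.
function matchesClassToken(classAttr, name) {
  return classAttr.split(/\s+/).filter(Boolean).includes(name);
}

const samples = ['jw-video jw-reset', 'jw-reset jw-video', 'jw-video-container'];
for (const s of samples) {
  console.log(JSON.stringify(s),
    'prefix:', matchesClassPrefix(s, 'jw-video'),
    'token:', matchesClassToken(s, 'jw-video'));
}
// "jw-video jw-reset"  prefix: true  token: true
// "jw-reset jw-video"  prefix: false token: true   (class order matters for ^=)
// "jw-video-container" prefix: true  token: false  (^= also matches longer names)
```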

plonknimbuzz commented 7 years ago

Ahh, I'm happy you are trying to help me. indoxxi.com is just a random website I picked for learning MTS, because after a few tries, MTS turned out to be much easier to use than the other libraries (PhantomJS, CasperJS or SlimerJS). But I don't know why this website is such a struggle.

======== Before we go back to indoxxi.com, I created a similar dummy demo (it creates a new element, and we grab its content)

url: http://creativecoder.xyz/test/

<script src="https://code.jquery.com/jquery-3.2.1.min.js"></script>
<script>
function showtext()
{
    $.ajax({
        url: 'process.php',
        type: 'GET',
        data: {name: $('input').val()},
        success: function(d){
            $('#show').append('<div id="show1">'+ d +'</div>');
        }
    })
}
</script>
<input type="text" value="John">
<button id="clickme" onclick="showtext()">click me</button>
<div id="show" >message will be here</div>

How does this script work?

  1. There is no #show1 element yet.
  2. After we click #clickme, div#show will contain a new element: <div id="show1">welcome John</div>
  3. Our target is to grab #show1.
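The key point of these steps is that #show1 does not exist right after the click; it appears only when the AJAX response arrives. A browser-free sketch of that timing (plain JavaScript; createFakePage is a hypothetical stand-in that keeps a set of element ids instead of a real DOM, and a timer replaces the AJAX call):

```javascript
// Hypothetical stand-in for the demo page: no DOM, just a set of element ids.
function createFakePage(ajaxDelayMs) {
  const ids = new Set(['clickme', 'show']);
  return {
    exists: function (id) { return ids.has(id); },
    // Like showtext(): the new element is added only when the simulated
    // AJAX response arrives, not during click() itself.
    click: function (id) {
      if (id === 'clickme') {
        setTimeout(function () { ids.add('show1'); }, ajaxDelayMs);
      }
    },
  };
}

const page = createFakePage(50);
page.click('clickme');
console.log('right after click:', page.exists('show1')); // false
setTimeout(function () {
  console.log('after 100 ms:', page.exists('show1'));   // true
}, 100);
```

This is why the scripts below poll (MTS) or wait (CasperJS) after clicking instead of checking once.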

Let's automate this. I did it using PhantomJS, SlimerJS, CasperJS and MTS, and the results were perfect. I will post only the CasperJS and MTS scripts for this demo.

CasperJS

var casper = require('casper').create();

casper.start('http://creativecoder.xyz/test/index.php', function() {
    if (this.exists('#clickme')) 
        this.echo('button exists');
    else
        this.echo('button not exists');
});

casper.then(function() {
   this.click('#clickme'); //click the button
});

casper.then(function(){
    casper.wait(3000, //wait
        function(){
            this.echo(this.getHTML('#show'));
            this.echo(this.getHTML('#show1')); //this is the target
        }
    );
});

casper.run();

MTS

<?php
ini_set('max_execution_time', 120);
require_once "/var/www/html/test/MTS/MTS/EnableMTS.php";
$url = "http://creativecoder.xyz/test/index.php";

$windowObj      = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow();
$windowObj->setUserAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0");
$windowObj->setUrl($url);

if($windowObj->getSelectorExists('#clickme'))
{
    echo "button exists". PHP_EOL;
    $windowObj->clickElement('#clickme');
    $timeout    = 3;
    $tTime      = time() + $timeout;
    $pageReady  = null;

    while($pageReady === null) {
        if ($windowObj->getSelectorExists("#show1") === true ) {
            $pageReady  = true;
        } elseif (time() > $tTime) {
            $pageReady  = false;
        }
    }

    if ($pageReady === false)
        echo "fail get #show1";
    else 
        echo $windowObj->getElement("#show1")['innerHTML'];
}
else
{
    echo "button not exists";
}

PhantomJS, CasperJS, SlimerJS and MTS all work perfectly. They return "welcome John".

Now let's move to the real target.

Note:

Back to the topic: if you read my first question, you will know that I explained exactly how that website works. I used the latest Chrome and Firefox:

  1. Open https://en.indoxxi.net/film-seri/stranger-things-season-2-2017-1fhos2/play
  2. View the page source: there is no <video class="jw-video jw-reset">....</video> yet.
  3. Check in the web console: $('video').length and $('.jw-video').length both return 0.
  4. Now click one of the episode buttons. There are 3 ways to do this:
     a. click the "E01" button in the browser (a real mouse click)
     b. run $('#ep-1').click() from the console
     c. run loadEpisode(1,1); from the console
     All of these create the video element.
  5. Check again as in step 3. Now both return 1, which means the video element exists.
  6. Grab the source with $('video').attr('src') and we get "https://lh3.googleusercontent.com/wpbYsMBx7WHFrm_jSjZQrQVlPLusRF0njilMpyTlbg_YVNSffjfufAWht5_kvJu4BUDJRpWkeuw=m18"

Now let's do it programmatically.

==== I did this with PhantomJS, CasperJS, SlimerJS and MTS, but all of them failed.

CasperJS

var casper = require('casper').create();

casper.start('https://en.indoxxi.net/film-seri/stranger-things-season-2-2017-1fhos2/play', function() {
    if (this.exists('#ep-1')) 
        this.echo('episode exists');

    if (this.exists('video')) 
        this.echo('video exists');
    else
        this.echo('video not exists');
});

casper.then(function() {
    if (this.exists('#ep-1'))
        this.click('#ep-1');
    else
        this.echo('episode not exists');
});

casper.then(function(){
    casper.wait(20000, //wait 20s after click
        function(){
            if (this.exists('video')) 
                this.echo(this.getHTML('video'));
            else
                this.echo('video not exists');

            if (this.exists('.jw-video')) 
                this.echo(this.getHTML('.jw-video'));
            else
                this.echo('class jw-video not exists');
        }
    );
});

casper.run();

result

episode exists
video not exists
video not exists
class jw-video not exists

MTS

<?php
ini_set('max_execution_time', 120);
require_once "/var/www/html/test/MTS/MTS/EnableMTS.php";
$url = "https://indoxxi.net/film-seri/stranger-things-season-2-2017-1fhos2/play";

$windowObj      = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow();
$windowObj->setUserAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0");
$windowObj->setUrl($url);

if($windowObj->getSelectorExists("#ep-1"))
{
    echo "episode button exists". PHP_EOL;
    $windowObj->clickElement("#ep-1"); //click it
    $timeout    = 20; //wait 20s
    $selector1   = "[class^='jw-video']"; 
    $selector2   = ".jw-video"; 
    $selector3   = "video"; 

    $tTime      = time() + $timeout;
    $pageReady  = null;
    while($pageReady === null) {
        if ($windowObj->getSelectorExists($selector1) === true || $windowObj->getSelectorExists($selector2) === true || $windowObj->getSelectorExists($selector3) === true) {
            $pageReady  = true;
        } elseif (time() > $tTime) {
            $pageReady  = false;
        }
    }

    if ($pageReady === false) {
        echo "all selector not found the element";
    } else {
        echo "video element found";
    }
}
else
{
    echo "episode button not exists";
}

result

episode button exists
all selector not found the element

None of these tools can find the video tag. WTF is this T_T

plonknimbuzz commented 7 years ago

After exploring several approaches, I figured it out.

I took screenshots before and after the click (including the wait time) and attached both of them. (I used CasperJS and MTS; both return the same result.)

Screenshot 1: page initialized (begin)

Screenshot 2: after the click and a 20-second wait (end)

In the second screenshot, both CasperJS and MTS successfully clicked the button. But because the browser does not support Flash/HTML5 video, there is no playable source.

The likely issue (maybe): PhantomJS does not support Flash/HTML5 video. Now I want to try YouTube to see what happens.

plonknimbuzz commented 7 years ago

I can't grab the video from YouTube either.

$('video').src => before press play button
null
$('video').src => after press play button
"blob:https://www.youtube.com/be3f3460-7d27-4335-9b25-d916d9532c22"

Even though that source is not directly playable, I still can't get it.

Then I tested the browsers at https://html5test.com/ to make sure that the PhantomJS browser does not support HTML5 video.

Screenshot: CasperJS (browser2)

Screenshot: MTS (browser1)

Screenshot: PhantomJS on YouTube

Result: video, audio and streaming are not supported.
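A lighter check than html5test.com is to run a feature test inside the page itself. A minimal sketch, assuming it is passed to CasperJS this.evaluate() or PhantomJS/SlimerJS page.evaluate(); the function name videoSupport is hypothetical. It uses only ES5 syntax because older WebKit engines may not accept newer JavaScript. canPlayType() answers '', 'maybe' or 'probably':

```javascript
// Feature check meant to run inside the page (it reads the page's document).
function videoSupport() {
  var v = document.createElement('video');
  // Without real <video> support the element has no canPlayType() method.
  if (typeof v.canPlayType !== 'function') {
    return 'no <video> support';
  }
  // Ask about H.264 MP4, a common format for hosted video files.
  var mp4 = v.canPlayType('video/mp4; codecs="avc1.42E01E"');
  return mp4 === '' ? '<video> present, but cannot play H.264 MP4' : 'H.264 MP4: ' + mp4;
}
```

In CasperJS this would be something like this.echo(this.evaluate(videoSupport)); the function must not use any outside variables, because it is serialized and run in the page context.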

So this is not an MTS issue, but a PhantomJS issue.

You can close this issue if you agree with me.

merlinthemagic commented 7 years ago

Correct, there is no Flash support in PhantomJS. A WebKit update was scheduled, but since headless Chrome came out, the PhantomJS project has been shut down.

There is a new repo almost ready for release, it’s called MHIT and will among other things allow you to select Chrome instead of PhantomJS (same API).

plonknimbuzz commented 7 years ago

Hi Merlin. I solved my problem using SlimerJS, since it uses real Firefox rather than the PhantomJS browser.

NOTE: this needs Firefox 38.x to 52.0 (newer versions cause an error)

var page = require("webpage").create();
page.open("https://en.indoxxi.net/movie/cars-3-2017-5l0i/play")
    .then(function(status){
        if (status == "success") {
            console.log("The title of the page is: " + page.title);
            slimer.wait(10000); //need to wait for the video to load
            var input = page.evaluate(function(){
                //src of the first <video> element on the page
                return document.getElementsByTagName('video')[0].src;
            });
            console.log('c:' + input);
        }
        else {
            console.log("Sorry, the page is not loaded");
        }
        page.close();
        phantom.exit(); //exit once, after the page is closed
    });

result:

The title of the page is: Nonton Film Cars 3 (2017) Subtitle Indonesia | XX1
c:https://storage.googleapis.com/staging.europe-west-183009.appspot.com/monako/Hot/Cars.3.2017.READNFO.720p.WEB-DL.X264.AC3-EVO.mp4

My real target:

var page = require('webpage').create();
page.open('https://indoxxi.net/film-seri/stranger-things-season-2-2017-1fhos2/play');
slimer.wait(5000); //give the page time to load
page.evaluate(function() {
    $('#ep-1').click(); //click the first episode button (uses the page's own jQuery)
});
slimer.wait(15000); //wait for the player to create the <video> element
var src = page.evaluate(function() {
    return $('video').attr('src');
});
console.log(src);
phantom.exit();

result

https://lh3.googleusercontent.com/wpbYsMBx7WHFrm_jSjZQrQVlPLusRF0njilMpyTlbg_YVNSffjfufAWht5_kvJu4BUDJRpWkeuw=m18

My question is: why don't you use a real browser?

  1. Selenium uses real Chrome and Firefox.
  2. SlimerJS uses real Firefox (and supports not only the video and audio tags; they say it supports Flash too).
  3. If you still want to use PhantomJS, you need to build it manually. Read this: https://github.com/ariya/phantomjs/issues/10839#issuecomment-331457673

I'm really looking forward to MHIT if, as you said, it will use real Chrome. I love MTS, because it is really, really easier to use than the others, especially Selenium, which is the hardest to learn XD