jeanphix / Ghost.py

Webkit based scriptable web browser for python.
http://ghost-py.readthedocs.org/en/latest/

Simple AJAX request #272

Open stdex opened 9 years ago

stdex commented 9 years ago

Hello. I'm trying to emulate a simple AJAX request.

Scheme: [image attached]

When I click the button, it should send a GET request to the server and return JSON, then render the response on the page. I use the wait_for_selector method to wait until the AJAX response should have arrived, but the DOM is not updated. Can someone help me with this?

My code:

from bs4 import BeautifulSoup
from ghost import Ghost

work_url = "https://m.avito.ru/sankt-peterburg/predlozheniya_uslug/almaznoe_burenie_almaznaya_rezka_usilenie_79225740"

ghost = Ghost()
with ghost.start() as session:
    page, extra_resources = session.open(work_url, timeout=100)
    # Click the button that triggers the AJAX request for the phone number.
    session.click(".action-show-number", 0)
    # Wait until the element rendered from the AJAX response appears.
    session.wait_for_selector(".button-green")
    soup = BeautifulSoup(page.content, "lxml")
    phone = soup.find('a', {"class": "action-show-number"}).attrs['href']
    print(phone)
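
One detail that may be relevant here: page.content is the body of the initial HTTP response, whereas later comments in this thread read the live frame HTML through session.content. If the AJAX result only exists in the DOM, parsing that instead might be what's needed:

# Parse the current DOM (after the click), not the initial response body.
soup = BeautifulSoup(session.content, "lxml")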

Also, I have a problem with this code. Sometimes it returns: Segmentation fault (core dumped). Where can I find logs about this fault?

Ubuntu 14.04, x86
Python 3.4
Ghost.py 0.2.3
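
A segmentation fault happens in native code (here, the WebKit bindings), so there is no Python traceback by default. One way to get at least a partial trace is the standard-library faulthandler module (available since Python 3.3); a minimal sketch:

import faulthandler

# Dump the Python-level traceback if the process receives SIGSEGV.
faulthandler.enable()

For the native side, enabling core dumps with `ulimit -c unlimited` and opening the resulting core file in gdb is where the real detail lives.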
yddchsc commented 8 years ago

Hello. I also have this problem, and I have tried to solve it for several hours. Could you tell me how to fix it? Thank you.

stdex commented 8 years ago

I gave up on Ghost.py because it has many problems, and I'm unable to fix them or otherwise help. Recently I've been trying PhantomJS (headless WebKit) with its Python wrapper via Selenium. Example of use:

#!/usr/bin/env python

from time import sleep

from selenium import webdriver
from bs4 import BeautifulSoup

link = 'https://m.avito.ru/sankt-peterburg/predlozheniya_uslug/almaznoe_burenie_almaznaya_rezka_usilenie_79225740'

class AvitoScraper(object):
    def __init__(self):
        # PhantomJS driver; ignore SSL errors so https pages still load.
        self.driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
        self.driver.set_window_size(1120, 550)

    def scrape_phone(self):
        self.driver.get(link)
        sleep(1)

        # Click the button that triggers the AJAX request, then give it time.
        self.driver.find_element_by_class_name("action-show-number").click()
        sleep(1)

        # Parse the rendered page source, which now contains the phone link.
        s = BeautifulSoup(self.driver.page_source, "lxml")
        phone = s.find('a', {"class": "action-show-number"}).attrs['href']
        print(phone)

    def scrape(self):
        self.scrape_phone()
        self.driver.quit()

if __name__ == '__main__':
    scraper = AvitoScraper()
    scraper.scrape()
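
As an aside, the fixed sleep(1) calls can usually be replaced with Selenium's explicit waits, which poll until the element is actually present; a minimal sketch under the same page-structure assumptions:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumes `driver` is the PhantomJS webdriver created in the snippet above.
wait = WebDriverWait(driver, 10)  # poll for up to 10 seconds
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "action-show-number"))).click()
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "button-green")))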
ghost commented 8 years ago

You didn't enumerate those problems; if you had, maybe someone could help you. The only thing I can't do right now is uploading files to inputs that accept multiple files; every other problem I've run into so far, I've surmounted.

I used to work with CasperJS, but I wanted something very cool, and Ghost.py filled the need. As for me, right now, I stand with Ghost.py.

I've also tried Selenium with PhantomJS, but it chokes on some sites. To meet my needs, I've had to patch Ghost.py in a few places; maybe that's what you could have done too.


yddchsc commented 8 years ago

Thank you for your answers.

And I have found a way to solve it. Maybe the code is not very beautiful and the approach is not perfect.

Following are the site and the link to click (screenshots attached). I need to click the link to go to the next page.

The main part of the code is:

# Inside a Scrapy spider; assumes `import ghost` and
# `from scrapy.selector import Selector` at module level.
def parse_geguyanbao(self, response):
    item = response.meta['item']
    day = {}
    g = ghost.Ghost()
    with g.start() as session:
        session.display = True
        session.wait_timeout = 999
        session.download_images = False
        page, extra_resources = session.open(response.url)
        page, extra_resources = session.wait_for_page_loaded()
        response = response.replace(body=session.content)
        lo = 1
        loo = []
        while not loo:
            # Click the pagination link `lo` times by dispatching synthetic
            # mouse events, then collect the dates shown on the current page.
            dates, extra_resources = session.evaluate("""
            (function () {
                var i = 0;
                for (i = 0; i < %s; i++) {
                    var element = document.querySelector(%s);
                    var evt = document.createEvent("MouseEvents");
                    evt.initMouseEvent("click", true, true, window, 1, 1, 1, 1, 1,
                                       false, false, false, false, %s, element);
                    element.dispatchEvent(evt);
                }
                var elems = document.getElementById('dt_1').getElementsByTagName('ul');
                var dates = [];
                for (i = 0; i < elems.length; i++) {
                    dates[i] = elems[i].getElementsByTagName('li')[0].innerText;
                }
                return dates;
            })();
            """ % (str(lo), repr('#PageCont > a:nth-child(9)'), str(0)))
            page, extra_resources = session.wait_for_page_loaded()
            response = response.replace(body=session.content)
            session.show()
            session.sleep(1)
            if dates:
                sStr1 = str(dates[0])
            else:
                sStr1 = "null"
            geguyanbao = 0
            # Resume the count for this date if it was seen before.
            for a in day:
                if a == sStr1:
                    geguyanbao = day[str(sStr1)]
                    break
            # Count consecutive entries per date; stop after 30 distinct days.
            for data in dates:
                if data == sStr1:
                    geguyanbao = geguyanbao + 1
                else:
                    day[str(sStr1)] = geguyanbao
                    if len(day) >= 30:
                        item['geguyanbao'] = day
                        return item
                    geguyanbao = 1
                    sStr1 = data
            day[str(sStr1)] = geguyanbao
            lo = lo + 1
            # Keep looping until the pagination "next" link is disabled.
            if Selector(response).xpath('//*[@id="PageCont"]/a[@class="nolink"]').extract():
                loo = False
            else:
                loo = True
        item['geguyanbao'] = day
        return item

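As an aside, the synthetic initMouseEvent dispatch could likely be replaced with the session.click helper used earlier in this thread; a minimal sketch of that variant, with the same selector:

# Click the pagination link once, then wait for the page to reload,
# using the Ghost.py helpers shown elsewhere in this thread.
session.click('#PageCont > a:nth-child(9)', 0)
page, extra_resources = session.wait_for_page_loaded()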

ghost commented 8 years ago

I do not understand the Russian text, but I cooked up something for you to at least stop the crashing. Please focus on the last two lines and the new line where the Session is created. This is what has worked for me to stop crashes on sites that crash all the time.

Please note that I still got an error, but this time it was:

ghost.ghost.TimeoutError: Can't find element matching ".button-green"

from bs4 import BeautifulSoup
from ghost import Ghost, Session

work_url = "https://m.avito.ru/sankt-peterburg/predlozheniya_uslug/almaznoe_burenie_almaznaya_rezka_usilenie_79225740"
timeout = 100

ghost = Ghost()

with ghost.start():
    # Create the session explicitly so its options can be set up front.
    session = Session(ghost, display=True, wait_timeout=timeout)

    page, extra_resources = session.open(work_url, timeout=timeout)
    assert page.http_status == 200

    session.click(".action-show-number", 0)
    session.wait_for_selector(".button-green")

    soup = BeautifulSoup(page.content, "lxml")
    phone = soup.find('a', {"class": "action-show-number"}).attrs['href']
    print(phone)

    # The last two lines: clear the webview and shut the session down
    # explicitly, which is what stopped the crashes for me.
    session.webview.setHtml('')
    session.exit()
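
If the selector never shows up, wait_for_selector raises the TimeoutError shown above; one way to make sure the explicit cleanup still runs in that case is a try/finally, sketched here under the same assumptions:

try:
    session.click(".action-show-number", 0)
    session.wait_for_selector(".button-green")
finally:
    # Run the crash-avoidance cleanup even when the selector times out.
    session.webview.setHtml('')
    session.exit()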