isearch-gp / gp-flask

Google Proxy Flask (like Googler) but with BeautifulSoup
https://gp-python.herokuapp.com/search?hl=en&gl=us&ie=UTF-8&q=cats+video

gp-flask

Google Proxy Flask API using Python, Requests and BeautifulSoup


I originally tried to "port" Googler to an API but found it much easier to do the web scraping myself. A lot of functionality still needs to be added (see ToDo below).

The proxy can also display the results as a web page, either parsed or as the raw HTML that Google returns (for debugging).

Usage:

lucky.py - Python Web Scraping API in Flask

        Options:
        -h   --help       this message
        -v N --verbose=N  verbose output

                 0 = Info
                 3 = JSON payload counts
                 5 = JSON payload elements
                 6 = raw JSON payload
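
The option handling above could be reproduced with the standard library's argparse module. This is a minimal sketch, not the actual lucky.py code: the `parse_args` and `describe` helpers are hypothetical, and mapping an unlisted level to the nearest lower documented level is an assumption.

```python
import argparse

# Verbosity levels as documented in the usage text above.
LEVELS = {
    0: "Info",
    3: "JSON payload counts",
    5: "JSON payload elements",
    6: "raw JSON payload",
}


def parse_args(argv=None):
    """Parse -v N / --verbose=N (hypothetical helper, not taken from lucky.py)."""
    parser = argparse.ArgumentParser(
        prog="lucky.py",
        description="Python Web Scraping API in Flask",
    )
    # argparse adds -h / --help automatically.
    parser.add_argument("-v", "--verbose", type=int, default=0, metavar="N",
                        help="verbose output (0, 3, 5 or 6)")
    return parser.parse_args(argv)


def describe(level):
    """Return the label of the highest documented level <= level (assumed behavior)."""
    eligible = [n for n in LEVELS if n <= level]
    return LEVELS[max(eligible)] if eligible else LEVELS[0]
```

Calling `parse_args(["--verbose=5"])` yields a namespace whose `verbose` attribute is 5, which `describe` maps to "JSON payload elements".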

Python Dev setup

activate the virtual environment (venv) or use workon hello

C:\Users\x\Documents\GitHub\gp-flask>.\venv\Scripts\activate

C:\Users\x\Documents\GitHub\gp-flask>workon hello

deactivate

(hello) C:\Users\x\Documents\GitHub\gp-flask>deactivate

run the Flask app

(hello) C:\Users\x\Documents\GitHub\gp-flask>python lucky.py

Endpoints to test

Show response as web page (Raw HTML - what Google returns) http://localhost:5000/raw?q=malpractice

Show response as web page (from parsed response data) http://localhost:5000/search?q=malpractice

Send response as JSON (for API) http://localhost:5000/json?q=malpractice
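
Queries containing spaces or special characters need to be URL-encoded before they go into the `q` parameter. The sketch below builds test URLs for the three endpoints listed above; `endpoint_url` is a hypothetical helper, and the base URL assumes the Flask development server is running on its default port.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"     # Flask dev server address used above
ENDPOINTS = ("raw", "search", "json")  # the three routes listed above


def endpoint_url(endpoint, query, base_url=BASE_URL):
    """Build a test URL for one endpoint (hypothetical helper)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    # urlencode percent-encodes special characters and turns spaces into '+'.
    return f"{base_url}/{endpoint}?{urlencode({'q': query})}"


for name in ENDPOINTS:
    print(endpoint_url(name, "cats video"))
```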

This can also be done interactively with Python on the command line:

(hello) C:\Users\x\Documents\GitHub\gp-flask>python

>>> import requests
>>> response = requests.get("http://127.0.0.1:5000/json?q=malpractice")
>>> response.json()

or with cURL:

curl http://127.0.0.1:5000/json?q=malpractice
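
For scripted checks, the same request can be wrapped in a function with a timeout and basic error handling. This sketch uses the standard library's urllib.request in place of requests, purely to keep it dependency-free. `fetch_json` is a hypothetical helper, and it assumes only that the /json endpoint returns a JSON body; the shape of the payload is not specified here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(query, base_url="http://127.0.0.1:5000", timeout=10):
    """Call the /json endpoint and decode the body (hypothetical helper).

    Returns the decoded payload, or None if the server is unreachable,
    answers with an HTTP error status, or sends a body that is not JSON.
    """
    url = f"{base_url}/json?{urlencode({'q': query})}"
    try:
        with urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # JSON and Unicode decoding errors are ValueError subclasses.
        return None
```

Returning None keeps the calling code short for quick smoke tests; a stricter client would probably let the exceptions propagate.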

Check indentation in .py files before check-in

python -m tabnanny lucky.py
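
The same check can be scripted, for example as part of a pre-commit step. The sketch below simply shells out to the tabnanny command shown above; `indentation_is_clean` is a hypothetical helper, not part of this repository.

```python
import subprocess
import sys


def indentation_is_clean(path="."):
    """Run `python -m tabnanny` on a file or directory (hypothetical helper).

    tabnanny reports files with ambiguous indentation and stays silent
    otherwise, so a zero exit status with no output is treated as clean.
    """
    result = subprocess.run(
        [sys.executable, "-m", "tabnanny", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and not (result.stdout + result.stderr).strip()
```

Passing a directory makes tabnanny walk it recursively and check every .py file it finds.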

Advanced Topics (ToDo)

Scraper stuff

Service stuff

Links

http://timmyreilly.azurewebsites.net/python-pip-virtualenv-installation-on-windows/

https://blog.hartleybrody.com/web-scraping-cheat-sheet/

More here: Iterative Search