kadnan / ScrapeGen

A simple Python tool that generates a requests/bs4-based web scraper
MIT License
26 stars, 3 forks

Use jinja2 #1

Open hughdbrown opened 4 years ago

hughdbrown commented 4 years ago

The code presented is a mixture of argument parsing, code generation, and writing to a file. Most of the logic here could be replaced by the Python templating library, jinja2.

kadnan commented 4 years ago

Isn't Jinja2 for the web? I'm not sure why you would recommend it.

hughdbrown commented 4 years ago

Jinja2 is for any text generation, including code generation.

Your README.md is more or less this:

from collections import namedtuple

from jinja2 import Template

# The template body is the scraper source to generate; Jinja2 fills in the placeholders.
template = Template('''
from bs4 import BeautifulSoup
import requests

{% for fn in fns %}
def get_{{ fn.name }}(soup_obj):
    # select() returns a list; take the first match, or None if there is none
    {{ fn.name }}_selection = soup_obj.select("{{ fn.css_sel }}")
    _{{ fn.name }} = next(iter({{ fn.name }}_selection), None)
    return _{{ fn.name }} and _{{ fn.name }}.text.strip()
{% endfor %}

def parse(url):
    r = requests.get(url)
    if r.status_code == 200:
        html = r.text.strip()
        soup = BeautifulSoup(html, "html.parser")
{% for fn in fns %}
        {{ fn.name }} = get_{{ fn.name }}(soup)
{% endfor %}

if __name__ == '__main__':
    parse("{{ url }}")
''')

# One Func per field to scrape: the function-name suffix and its CSS selector.
Func = namedtuple("Func", ["name", "css_sel"])
result = template.render({
    "url": "https://www.olx.com.pk/item/1-kanal-brand-bew-banglow-available-for-sale-in-wapda-town-iid-1009971253",
    "fns": [
        Func("price", "#container > main > div > div > div.rui-2SwH7.rui-m4D6f.rui-1nZcN.rui-3CPXI.rui-3E1c2.rui-1JF_2 > div.rui-2ns2W._2r-Wm > div > section > span._2xKfz"),
        Func("seller", "#container > main > div > div > div.rui-2SwH7.rui-m4D6f.rui-1nZcN.rui-3CPXI.rui-3E1c2.rui-1JF_2 > div.rui-2ns2W.YpyR- > div > div > div._1oSdP > div > a > div"),
    ]
})
print(result)

If you changed the template text from being inline to being in an external file, the entire program would be:
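A minimal sketch of what such a program might look like, assuming the template lives in a file loaded through jinja2's `FileSystemLoader`. The file name `scraper.py.j2`, the helper `render_scraper`, the shortened template text, and the example URL and selectors are all hypothetical stand-ins, not taken from the repository. The template is written to a temporary directory only so that the sketch runs on its own.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

Func = namedtuple("Func", ["name", "css_sel"])

# Hypothetical file name; in a real project this file would sit next to the script.
TEMPLATE_NAME = "scraper.py.j2"

# Shortened stand-in for the external template, written out below so the sketch is runnable.
TEMPLATE_TEXT = '''\
from bs4 import BeautifulSoup
import requests

{% for fn in fns %}
def get_{{ fn.name }}(soup_obj):
    matches = soup_obj.select("{{ fn.css_sel }}")
    return matches[0].text.strip() if matches else None

{% endfor %}
def parse(url):
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    return {
{% for fn in fns %}
        "{{ fn.name }}": get_{{ fn.name }}(soup),
{% endfor %}
    }

if __name__ == "__main__":
    print(parse("{{ url }}"))
'''


def render_scraper(template_dir, url, fns):
    """Load the external template and return the generated scraper source."""
    env = Environment(
        loader=FileSystemLoader(template_dir),
        trim_blocks=True,    # drop the newline after each {% ... %} tag
        lstrip_blocks=True,  # drop indentation in front of each {% ... %} tag
    )
    return env.get_template(TEMPLATE_NAME).render(url=url, fns=fns)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, TEMPLATE_NAME).write_text(TEMPLATE_TEXT)
    source = render_scraper(
        tmp,
        "https://example.com/item",  # placeholder URL
        [Func("price", "span.price"), Func("seller", "a.seller")],
    )

# compile() raises SyntaxError if the generated code is not valid Python.
compile(source, "<generated>", "exec")
print(source)
```

With the template in its own file, the program itself reduces to `render_scraper` plus the list of `Func` entries. Argument parsing and writing the result to disk could then be layered on top without touching the template.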

kadnan commented 4 years ago

Interesting. I have no plans to change this atm. Maybe I will use your approach in future work. You are also welcome to fork it.