scrapehero / selectorlib

A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them
MIT License

Specifying a "type" other than Text, Link, HTML, Attribute or Image (even the same ones in different casing) will yield an UnboundLocalError #84

Open ghost opened 1 year ago

ghost commented 1 year ago

Description

Specifying a "type" in YAML other than Text, Link, HTML, Attribute or Image (including any of those names in different casing) raises an UnboundLocalError for the "content" variable. A quick look at the source code shows that the type check has no "else" branch, so "content" is never assigned when the type is not recognized.
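The failure mode can be reproduced in isolation. The following is a minimal sketch, not selectorlib's actual code: this extract_field is a simplified stand-in, and a plain dict (hypothetical keys "text", "href", "html") plays the role of the element.

```python
# Simplified stand-in for the function named in the traceback.
# Assumption: the real function works on selector objects; a dict is used
# here only so the example runs without any third-party packages.
def extract_field(element, item_type):
    if item_type == "Text":
        content = element["text"]
    elif item_type == "Link":
        content = element["href"]
    elif item_type == "HTML":
        content = element["html"]
    # No else branch: any other item_type leaves `content` unassigned,
    # so the return statement below raises UnboundLocalError.
    return content


element = {"text": "Hello", "href": "/about", "html": "<a href='/about'>Hello</a>"}

# A recognized type returns normally.
print(extract_field(element, "HTML"))

# The same name in different casing falls through every branch.
try:
    extract_field(element, "Html")
except UnboundLocalError as exc:
    print(type(exc).__name__, exc)
```

The exact wording of the error message depends on the Python version, but the exception type is the same.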

What I Did

Set type: "Html" on a selector in the YAML file and ran extractor.extract(), which produced the following traceback:

Traceback (most recent call last):
  File "scraper.py", line 18, in <module>
    print(extractor.extract(r.text))
  File "/home/kartik/dev/selectorlib-projects/demo/venv/lib/python3.8/site-packages/selectorlib/selectorlib.py", line 74, in extract
    fields_data[selector_name] = self._extract_selector(selector_config, sel)
  File "/home/kartik/dev/selectorlib-projects/demo/venv/lib/python3.8/site-packages/selectorlib/selectorlib.py", line 93, in _extract_selector
    value = self._get_child_item(field_config, element)
  File "/home/kartik/dev/selectorlib-projects/demo/venv/lib/python3.8/site-packages/selectorlib/selectorlib.py", line 113, in _get_child_item
    child_value = self._extract_selector(children_config[field], element)
  File "/home/kartik/dev/selectorlib-projects/demo/venv/lib/python3.8/site-packages/selectorlib/selectorlib.py", line 100, in _extract_selector
    value = extract_field(element, item_type, **kwargs)
  File "/home/kartik/dev/selectorlib-projects/demo/venv/lib/python3.8/site-packages/selectorlib/selectorlib.py", line 21, in extract_field
    return content
UnboundLocalError: local variable 'content' referenced before assignment

I would be more than happy to submit a PR to fix this.
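One possible shape for such a fix, sketched under assumptions: the dict-based element, the EXTRACTORS table and the error message are all hypothetical, and this is not necessarily what any actual PR against selectorlib does. The idea is to reject an unknown type with an explicit error instead of falling through.

```python
# Hypothetical extractors keyed by the five type names listed in this issue.
# Assumption: elements are plain dicts here; the real library uses selectors.
EXTRACTORS = {
    "Text": lambda el, **kw: el["text"],
    "Link": lambda el, **kw: el["href"],
    "HTML": lambda el, **kw: el["html"],
    "Attribute": lambda el, attribute=None, **kw: el["attrs"].get(attribute),
    "Image": lambda el, **kw: el["attrs"].get("src"),
}


def extract_field(element, item_type, **kwargs):
    """Dispatch on item_type and fail loudly when the type is unknown."""
    try:
        extractor = EXTRACTORS[item_type]
    except KeyError:
        supported = ", ".join(EXTRACTORS)
        raise ValueError(
            f"Unsupported type {item_type!r}; expected one of: {supported}"
        ) from None
    return extractor(element, **kwargs)


element = {
    "text": "Hello",
    "href": "/about",
    "html": "<a class='nav' href='/about'>Hello</a>",
    "attrs": {"class": "nav", "src": None},
}

print(extract_field(element, "Attribute", attribute="class"))

try:
    extract_field(element, "Html")
except ValueError as exc:
    print(exc)
```

Whether type names should also be matched case-insensitively is a separate design decision. An explicit error at least tells the user which values are accepted.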

sumeshmurali commented 1 year ago

Hi @ashwinrajeev, I have fixed this issue and created the following PR - https://github.com/scrapehero/selectorlib/pull/86. Please verify and let me know if any changes need to be made.