meyt / linkpreview

Get link preview in python
MIT License
46 stars 9 forks

Outdated example on README #8

Closed: rickerp closed this issue 3 years ago

rickerp commented 3 years ago

When I upgraded to the latest version, 0.2.0, it broke my function. It can easily be reproduced by running the Advanced example in README.md.

Reproduce

from linkpreview import Link, LinkPreview, LinkGrabber

url = "http://github.com"
grabber = LinkGrabber(
    initial_timeout=20, maxsize=1048576, receive_timeout=10, chunk_size=1024,
)
content = grabber.get_content(url)
link = Link(url, content)
preview = LinkPreview(link, parser="lxml")

Error (example)

Backend couldn't get the preview for https://linkedin.com, fetching from linkpreviewer
Traceback (most recent call last):
  File "/backend/app/api/v1/endpoints/misc.py", line 26, in url_preview
    preview = get_link_preview(url, request.headers)
  File "/backend/app/helpers/link_preview.py", line 31, in get_link_preview
    preview = LinkPreview(link, parser="lxml")
  File "/usr/local/lib/python3.8/site-packages/linkpreview/preview.py", line 164, in __init__
    self.generic = Generic(link, parser)
  File "/usr/local/lib/python3.8/site-packages/linkpreview/preview.py", line 14, in __init__
    self._soup = BeautifulSoup(self.link.content, parser)
  File "/usr/local/lib/python3.8/site-packages/bs4/__init__.py", line 342, in __init__
    for (self.markup, self.original_encoding, self.declared_html_encoding,
  File "/usr/local/lib/python3.8/site-packages/bs4/builder/_lxml.py", line 186, in prepare_markup
    for encoding in detector.encodings:
  File "/usr/local/lib/python3.8/site-packages/bs4/dammit.py", line 301, in encodings
    self.declared_encoding = self.find_declared_encoding(
  File "/usr/local/lib/python3.8/site-packages/bs4/dammit.py", line 378, in find_declared_encoding
    declared_encoding_match = xml_re.search(markup, endpos=xml_endpos)
TypeError: expected string or bytes-like object
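For context, the TypeError is raised inside BeautifulSoup's encoding detection, which runs a regex search over the markup. In 0.2.0, `get_content` returns a `(content, url)` tuple, so the tuple ends up where a string is expected. A minimal sketch of the failure mode (the tuple value below is illustrative):

```python
import re

# bs4's dammit.py searches the markup for a declared encoding with
# re.search; handing it a tuple instead of a string raises the same
# TypeError shown in the traceback above.
markup = ("<html></html>", "http://github.com")  # shape of get_content's 0.2.0 return value
try:
    re.search(r"encoding", markup)
except TypeError as e:
    print(type(e).__name__)  # TypeError
```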

Versions

meyt commented 3 years ago

@rickerp We introduced a breaking change in the latest version (0.2.0) that, it seems, wasn't mentioned clearly. Sorry for that.

Updated:

from linkpreview import Link, LinkPreview, LinkGrabber

url = "http://github.com"
grabber = LinkGrabber(
    initial_timeout=20, maxsize=1048576, receive_timeout=10, chunk_size=1024,
)
content, url = grabber.get_content(url)  # get_content now returns (content, url)
link = Link(url, content)
preview = LinkPreview(link, parser="lxml")
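If you need to support both versions of the library at once, a small compatibility shim can normalize the return value. This is a sketch, not part of linkpreview itself; the stub grabber classes below are hypothetical stand-ins for `LinkGrabber` so the example runs without network access:

```python
# Compatibility shim for the linkpreview 0.2.0 API change:
# grabber.get_content(url) used to return the page content alone,
# but from 0.2.0 it returns a (content, url) tuple, where url may
# reflect redirects.

def fetch_content(grabber, url):
    """Return (content, url) regardless of linkpreview version."""
    result = grabber.get_content(url)
    if isinstance(result, tuple):
        return result  # 0.2.0+: already (content, final_url)
    return result, url  # pre-0.2.0: content only, keep the original url


class OldGrabber:  # hypothetical stand-in for linkpreview < 0.2.0
    def get_content(self, url):
        return "<html></html>"


class NewGrabber:  # hypothetical stand-in for linkpreview >= 0.2.0
    def get_content(self, url):
        return "<html></html>", "https://github.com/"


print(fetch_content(OldGrabber(), "http://github.com"))
print(fetch_content(NewGrabber(), "http://github.com"))
```

The tuple check keeps the call site unchanged: `content, url = fetch_content(grabber, url)` works on either version.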