Closed asmaier closed 3 years ago
@asmaier
You can access the domain through preview.link.netloc. It's not necessary to add a new property to the LinkPreview object.
As for short URLs, you need to extend the LinkGrabber and pass the final redirected URI to LinkPreview; there is no need for an extra request.
Fetching a URL has many more scenarios to handle. LinkPreview focuses on parsing the result, and the grabber part is just a helper for common use cases. (I need to mention this in the README 😄)
You may need something like this for now:
import requests
from linkpreview import Link, LinkPreview

url = 'http://g.co/blob-opera'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
# requests follows redirects on GET by default, so resp.url is the final URL
resp = requests.get(url, headers=headers)
# Pass the final URL and the fetched HTML straight to the parser
preview = LinkPreview(Link(resp.url, resp.text))
print(preview.link.netloc)  # output: artsandculture.google.com
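For reference, here is a minimal standard-library sketch of what "netloc" means for the final URL. It assumes that preview.link.netloc matches what urllib.parse computes for the same URL, which I have not confirmed against the library source. The helper name domain_of and the example path are hypothetical.

```python
from urllib.parse import urlsplit


def domain_of(final_url: str) -> str:
    """Return the network location (host, plus port if present) of a URL.

    Hypothetical helper for illustration; not part of linkpreview.
    """
    return urlsplit(final_url).netloc


# The path below is made up; only the host matters here.
print(domain_of("https://artsandculture.google.com/story/example"))
```

Applied to the short URL itself, this would only give g.co, which is why the final redirected URL has to be the one handed to the parser.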
At the moment, Linkpreview returns a preview object with information about the title, description and image. I suggest also returning the real domain name. This is useful information, especially when getting a link preview for short URLs.
I have found a workaround for now. The disadvantage, however, is that one has to make two requests to a URL to get all the information:
It would be much nicer if the preview object held the information about the domain directly, e.g. in the field preview.domain.