Closed yaseenox-personal closed 8 years ago
Which url are you trying to parse?
I tried more than one article, this one for example: 'http://www.bbc.com/news/business-37618618'
Hm, it looks like they're setting meta description as well as og:description and twitter:description.
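To make the three tags concrete, here is a minimal sketch that collects them from a page using only the Python standard library. It is not lassie's implementation: the `DescriptionParser` class, the `pick_description` helper, the sample HTML, and the precedence order (Open Graph first, then Twitter, then plain meta) are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect description-like <meta> tags from an HTML document."""

    # Assumed precedence order; a real library may choose differently.
    WANTED = ("og:description", "twitter:description", "description")

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Sites put the key in either property= or name=.
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        # Keep only the first non-empty value seen for each wanted key.
        if key in self.WANTED and content and key not in self.found:
            self.found[key] = content.strip()


def pick_description(html):
    """Return the first available description in WANTED order, or None."""
    parser = DescriptionParser()
    parser.feed(html)
    for key in DescriptionParser.WANTED:
        if key in parser.found:
            return parser.found[key]
    return None


# Hypothetical page that sets all three tags, as described above.
sample = """
<html><head>
<meta name="description" content="Plain meta description.">
<meta property="og:description" content="Open Graph description.">
<meta name="twitter:description" content="Twitter card description.">
</head><body></body></html>
"""

print(pick_description(sample))
```

When a page sets all three tags, something has to decide which one wins; the order in `WANTED` is one possible choice.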
You won't be able to extract the article with lassie, but you will be able to get the description. I'll see if I can fix this.
This is fixed; version 0.8.3 is now available on PyPI!
The site you posted now returns:
{
    'site_name': u'BBC News',
    'description': u'Samsung has ceased production of its Galaxy Note 7 smartphones after reports of devices it had deemed safe catching fire.',
    'videos': [],
    'title': u'Samsung permanently stops Galaxy Note 7 production',
    'url': u'http://www.bbc.com/news/business-37618618',
    'status_code': 200,
    'locale': u'en_GB',
    'images': [{
        'src': 'http://www.bbc.com/news/business-37618618',
        'height': None,
        'width': None
    }, {
        'src': u'http://ichef.bbci.co.uk/news/1024/cpsprodpb/577B/production/_91759322_h2h1xuaj.jpg',
        'type': u'og:image'
    }]
}
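As a small illustration of how a result shaped like the one above could be consumed, the sketch below pulls out the title, the description, and any image tagged as `og:image`. The `og_images` helper is hypothetical and not part of lassie; the dictionary is a trimmed copy of the output shown, written without the `u` prefixes.

```python
# Trimmed copy of the result shown above (values taken from the thread).
result = {
    'title': 'Samsung permanently stops Galaxy Note 7 production',
    'description': 'Samsung has ceased production of its Galaxy Note 7 '
                   'smartphones after reports of devices it had deemed '
                   'safe catching fire.',
    'images': [
        {'src': 'http://www.bbc.com/news/business-37618618',
         'height': None, 'width': None},
        {'src': 'http://ichef.bbci.co.uk/news/1024/cpsprodpb/577B/'
                'production/_91759322_h2h1xuaj.jpg',
         'type': 'og:image'},
    ],
}


def og_images(data):
    """Return the src of every image entry whose type is 'og:image'.

    Hypothetical helper: uses .get() because not every image entry
    carries a 'type' key (the first one above does not).
    """
    return [img['src'] for img in data.get('images', [])
            if img.get('type') == 'og:image']


print(result['title'])
print(result['description'])
print(og_images(result))
```

Note that the description is a one-sentence summary supplied by the page's meta tags, not the article body.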
Hi, I want to extract the article from the source URL. I only get the title of the article and a small part of its text under the "description" key.