jmcarp / robobrowser

Is there any way to set the page encoding manually? #72

Open ghostku opened 7 years ago

ghostku commented 7 years ago

Sometimes RoboBrowser detects the wrong encoding for my page. I know that BeautifulSoup supports specifying the encoding manually. Can I set the encoding myself and have it passed to BeautifulSoup? In my case it detects windows-1252 instead of UTF-8.

If I use plain BeautifulSoup with requests, it works fine:

>>> from bs4 import BeautifulSoup
>>> from robobrowser import RoboBrowser
>>> import requests
>>> url = 'http://10x10.com.ua/televizor-bravis-led-32d3000-smart-t2-black-v-dnepropetrovske.html'
>>> rb = RoboBrowser(parser='lxml')
>>> rb.open(url)
>>> rb.select('.product-name h1').pop()
<h1>\xd1\u201a\xd0\xb5\xd0\xbb\xd0\xb5\xd0\xb2\xd0\xb8\xd0·\xd0\xbe\xd1\u20ac Bravis LED-32D3000 Smart +T2 black \xd0\xb2 \xd0\u201d\xd0\xbd\xd0\xb5\xd0\xbf\xd1\u20ac\xd0\xbe\xd0\xbf\xd0\xb5\xd1\u201a\xd1\u20ac\xd0\xbe\xd0\xb2\xd1\x81\xd0\xba\xd0\xb5</h1>
>>> bs = BeautifulSoup(requests.get(url).text, 'lxml')
>>> bs.select('.product-name h1').pop()
<h1>телевизор Bravis LED-32D3000 Smart +T2 black в Днепропетровске</h1>
>>>
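
The garbled heading above looks like UTF-8 bytes decoded as windows-1252. The sketch below reproduces that effect with the standard library only and shows that decoding the same bytes with a manually chosen encoding recovers the text. The sample string and the `decode_page` helper are hypothetical, for illustration only, and are not part of RoboBrowser or BeautifulSoup.

```python
# Hypothetical sample text standing in for the page heading.
original = "телевизор"

# The bytes a server would send for a UTF-8 encoded page.
raw = original.encode("utf-8")

# Decoding with the wrong codec produces mojibake like the RoboBrowser output.
garbled = raw.decode("cp1252")

# Decoding with the manually chosen encoding recovers the text.
fixed = raw.decode("utf-8")


def decode_page(raw_bytes, encoding="utf-8"):
    """Hypothetical helper: decode raw response bytes with an explicit encoding."""
    return raw_bytes.decode(encoding, errors="replace")


print(repr(garbled))
print(fixed)
print(decode_page(raw) == original)
```

With BeautifulSoup itself, the documented way to force an encoding when parsing bytes is the `from_encoding` constructor argument, for example `BeautifulSoup(response.content, 'lxml', from_encoding='utf-8')`. Whether RoboBrowser offers a way to pass that argument through is the open question here.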