thoughtbot / capybara-webkit

A Capybara driver for headless WebKit to test JavaScript web apps
https://thoughtbot.com/open-source
MIT License
1.97k stars 428 forks

Broken charset on page without meta charset #1024

rustamakhmetov closed 6 years ago

rustamakhmetov commented 7 years ago

Sorry for my English.

I am testing an external Cyrillic site that has no meta charset in its header. The response body contains characters in a broken charset, e.g. "Список по" instead of "Список покупок". test code

jferris commented 7 years ago

We try to force the response encoding to UTF-8 in Ruby: https://github.com/thoughtbot/capybara-webkit/blob/1617ee424c6b62177ad836126f83041999d6e986/lib/capybara/webkit/browser.rb#L339
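For context, here is a minimal sketch of what forcing an encoding does in plain Ruby. It is not the driver's actual code; the variable names are made up for illustration. `String#force_encoding` only changes the encoding tag on the string and never rewrites the bytes, so it helps only when the bytes handed to Ruby are already valid UTF-8.

```ruby
# Raw UTF-8 bytes as they might arrive from the driver, tagged as binary
# (ASCII-8BIT). `raw` and `relabelled` are hypothetical names.
raw = "Список покупок".b

# force_encoding changes only the encoding tag; the bytes are untouched.
relabelled = raw.dup.force_encoding(Encoding::UTF_8)

same_bytes = relabelled.bytes == raw.bytes
valid      = relabelled.valid_encoding?
matches    = relabelled == "Список покупок"

puts "same bytes: #{same_bytes}, valid UTF-8: #{valid}, matches: #{matches}"
```

If the text was already decoded with the wrong charset before reaching Ruby, relabelling the resulting bytes cannot undo that.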

It's possible that we also need to do something on the QtWebKit side - it may be guessing a different charset. My best guess is that we could use this: http://doc.qt.io/archives/qt-5.5/qwebsettings.html#setDefaultTextEncoding

twalpole commented 6 years ago

This behavior is correct, and the same thing happens if you load a page with your test case's source in Chrome or Firefox. When no charset is specified, it's up to the browser to pick one, and both Chrome and Firefox (set to US English; other language versions may default differently) don't default to UTF-8 for your document. You need to either include the meta charset tag or specify the charset in the Content-Type header returned with the document (or escape all those characters, I guess, which sounds ridiculous):

'Content-Type' => 'text/html; charset=utf-8'
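As a rough illustration of where that header would go, here is a sketch of a Rack-style test app, assuming the page under test is served by something you control. The `app` lambda and its HTML body are hypothetical; it follows the Rack convention of returning a status, a headers hash, and a body, and needs no gems to run.

```ruby
# Hypothetical Rack-style app: a callable that takes the request
# environment and returns [status, headers, body].
app = lambda do |_env|
  html = "<html><body>Список покупок</body></html>"
  # Declaring the charset here means the browser does not have to guess.
  [200, { 'Content-Type' => 'text/html; charset=utf-8' }, [html]]
end

# Call it directly to inspect the response without starting a server.
status, headers, body = app.call({})
puts status
puts headers['Content-Type']
puts body.join
```

For an external site whose headers you cannot change, this does not apply directly; the page itself would need the meta charset tag.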