Broken charset on page without meta charset

rustamakhmetov commented 7 years ago

Sorry for my English.

I am testing an external Cyrillic site that has no meta charset in the header. The response body comes back with characters in a broken charset, e.g. "Ð¡Ð¿Ð¸ÑÐ¾Ðº Ð¿Ð¾" instead of "Список покупок". test code

jferris commented 7 years ago

We try to force the response as UTF-8 in Ruby: https://github.com/thoughtbot/capybara-webkit/blob/1617ee424c6b62177ad836126f83041999d6e986/lib/capybara/webkit/browser.rb#L339
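As a rough illustration of what forcing UTF-8 means here (a sketch only, not the actual capybara-webkit code), the snippet below relabels raw bytes as UTF-8 and compares that with reading the same bytes as ISO-8859-1. The variable names are made up for the example, and ISO-8859-1 stands in for whatever legacy charset a browser might guess.

```ruby
# Raw bytes as they might arrive with no encoding information attached.
# String#b returns a binary (ASCII-8BIT) copy of the UTF-8 bytes.
raw = "Список покупок".b

# Relabel the bytes as UTF-8. No transcoding happens; only the label changes.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)

# Misread the same bytes as ISO-8859-1 (an assumed stand-in for a guessed
# legacy charset), then transcode to UTF-8 so the result can be printed.
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts as_utf8
puts as_latin1
```

The second line of output is garbled in the same way as the text in the report, which suggests the bytes themselves are fine and only the charset guess is off.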

It's possible that we also need to do something on the QtWebKit side - it may be guessing a different charset. My best guess is that we could use this: http://doc.qt.io/archives/qt-5.5/qwebsettings.html#setDefaultTextEncoding

twalpole commented 6 years ago

This behavior is correct; the same thing happens if you load your test case's source in Chrome or Firefox. When no charset is specified, it's up to the browser to pick one, and neither Chrome nor Firefox (set to US English; other language versions may default differently) defaults to UTF-8 for your document. You need to either include the meta charset tag or specify the charset in the Content-Type header returned with the document (or escape all those characters, I guess, which sounds ridiculous):

'Content-Type' => 'text/html; charset=utf-8'
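For a test app you control, a minimal sketch of returning that header could look like the Rack-style lambda below. The page body and variable names are hypothetical, and no gem is loaded; note that Rack 3 expects lowercase header names, so the key may need to be 'content-type' there.

```ruby
# Hypothetical page with Cyrillic text and no meta charset tag.
page = "<html><body>Список покупок</body></html>"

# A Rack-style app is any object responding to call(env) that returns
# [status, headers, body]. The charset is declared in the Content-Type header.
app = lambda do |_env|
  headers = { 'Content-Type' => 'text/html; charset=utf-8' }
  [200, headers, [page]]
end

# Call the app directly with an empty env to inspect the response triple.
status, headers, body = app.call({})
puts status
puts headers['Content-Type']
```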

thoughtbot / capybara-webkit

Broken charset on page without meta charset #1024