[BUG-227992] http_response requires "charset=utf-8" #6163

Open sl-service-account opened 4 years ago

sl-service-account commented 4 years ago

What just happened?

As per the wiki: "When serving content with UTF-8 characters be sure your server sets the outgoing Content-Type header so that it includes charset=utf-8 otherwise it will be interpreted incorrectly. See W3C:Setting the HTTP charset parameter for further details."

This means that "&" will be returned as "&" in the response body if charset=utf8 is not specified in the content header. This is fine if you own the server providing the response, but not if you don't, so a lot of external data (from Japanese servers, for example) comes out as rubbish. No other web consumers seem to be this fussy about response headers, so neither should SL. There is no way to unencode all these characters within LSL, so this is a MAJOR issue for those trying to get external information from servers that do not routinely provide the charset. LSL should accept UTF-8 by default instead of assuming an archaic ANSI encoding.
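To illustrate the kind of garbling described above, here is a minimal Python sketch (not LSL, and not how the simulator is known to work). It assumes the fallback is a single-byte encoding such as Latin-1; the sample string is made up for the example.

```python
# Hypothetical illustration: the same UTF-8 bytes decoded two ways.
# The sample text and the Latin-1 fallback are assumptions, not taken
# from the simulator's actual behaviour.
body = "日本語 café".encode("utf-8")   # bytes as a server might send them

as_utf8 = body.decode("utf-8")      # what the reporter expects to receive
as_latin1 = body.decode("latin-1")  # one character per byte: garbled text

print(as_utf8)
print(as_latin1)
```

Every multi-byte UTF-8 character turns into two or three unrelated characters under the single-byte reading, which is why the result cannot be tidied up with a simple find-and-replace.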

What were you doing when it happened?

standard http_response behaviour

What were you expecting to happen instead?

http_response should default to utf-8
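The requested behaviour can be sketched in Python as follows. This is only an illustration of "use the declared charset if there is one, otherwise assume UTF-8"; the function name `decode_body` and its parameters are hypothetical and do not correspond to anything in LSL or the simulator.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode a response body, falling back to `default` when the
    Content-Type header carries no usable charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    charset = msg.get_content_charset() or default
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The header named a charset Python does not know; use the default.
        return body.decode(default, errors="replace")


# Header without a charset: the UTF-8 default applies.
print(decode_body("café".encode("utf-8"), "text/html"))
# Header with an explicit charset: it is honoured.
print(decode_body("café".encode("latin-1"), "text/html; charset=iso-8859-1"))
```

A server that does declare a different charset is still respected; only the missing-charset case changes.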

Other information

Original Jira Fields

| Field | Value |
| ------------- | ------------- |
| Issue | BUG-227992 |
| Summary | http_response requires "charset=utf-8" |
| Type | Bug |
| Priority | Unset |
| Status | Accepted |
| Resolution | Accepted |
| Reporter | Dancing Lemon (dancing.lemon) |
| Created at | 2019-12-09T16:54:39Z |
| Updated at | 2019-12-11T11:22:03Z |

```
{
  'Build Id': 'unset',
  'Business Unit': ['Platform'],
  'Date of First Response': '2019-12-09T12:26:33.688-0600',
  'ReOpened Count': 0.0,
  'Severity': 'Unset',
  'System': 'SL Simulator',
  'Target Viewer Version': 'viewer-development',
  'What just happened?': 'As per the wiki () "When serving content with UTF-8 characters be sure your server sets the outgoing Content-Type header so that it includes charset=utf-8 otherwise it will be interpreted incorrectly. See W3C:Setting the HTTP charset parameter for further details."\r\n\r\nThis means that "&" will be returned as "&" in the response body if charset=utf8 is not specified in the content header. This is fine if you are the owner of the server providing the response but not if you aren\'t, meaning a lot of external data (from for example Japanese servers) output rubbish.\r\nNo other web consumers seem to be this fussy about return headers, so neither should SL. There is no way to unencode all these characters within LSL so this is a MAJOR issue for those trying to get external information from servers that do not routinely provide the charset.\r\nLSL should accept uft-8 by default instead or assuming an archaic ANSII encoding.',
  'What were you doing when it happened?': 'standard http_response behaviour',
  'What were you expecting to happen instead?': 'http_response should default to utf-8',
  'Where': 'LSL scripting',
}
```

sl-service-account commented 4 years ago

Dancing Lemon commented at 2019-12-09T17:05:06Z

Just a point to note - even your own website doesn't include this charset in its responses (example: http://world.secondlife.com/group/5d015d53-090c-f50d-b8a8-8895c533175b ). This group name contains an ampersand, but is not returned correctly to http_response, meaning LSL scripters have to muck about trying to replace these characters once retrieved.

(◔_◔) 

sl-service-account commented 4 years ago

Caleb Linden commented at 2019-12-09T18:26:34Z

Thanks for the report, we are looking into this jira.

sl-service-account commented 4 years ago

Dancing Lemon commented at 2019-12-11T11:22:03Z

Hi Caleb - many thanks for giving this your attention.  This table appears to contain a correct list of how the characters are appearing, and also three possible causes for the problem:  https://www.i18nqa.com/debug/utf8-debug.html
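For anyone checking garbled output against a table like that one, here is a rough Python sketch of reversing the damage. It assumes the specific case where UTF-8 bytes were read as Windows-1252; other causes will not round-trip this way, and the helper name `repair` is made up for the example. Nothing equivalent is claimed to exist in LSL.

```python
def repair(garbled: str) -> str:
    """Try to undo UTF-8-read-as-Windows-1252 garbling.

    Returns the input unchanged if it does not fit that pattern.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


# Build a garbled sample the same way the bug would: UTF-8 bytes
# misread as Windows-1252.
garbled = "é€".encode("utf-8").decode("cp1252")
print(garbled)
print(repair(garbled))
```

Text that was never garbled, or that contains characters outside Windows-1252, is passed through untouched rather than being damaged further.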