mdsol / rwslib

Provide a (programmer) friendly client library to Rave Web Services (RWS).
MIT License
31 stars 13 forks source link

Consider improving the default encoding in rwslib library #115

Closed halbow closed 3 months ago

halbow commented 3 years ago

Hello :wave:

First of all, thanks for publishing this library on Github, it's very helpful ! :+1:

I saw the work here to add the possibility to specify the encoding of the response. Do you think the default encoding could be specified in rwslib ?

XML encoding should specified in the body of the XML (and not in the content type) but some response (such as AuditRecordsRequest) are missing the XML declaration. In this case, the requests library used in rwslib is decoding "text/xml" without encoding as "ISO-8859-1" in order to follow RFC2616 which has been made obsolete by RFC 7231.

This leads to wrongly decoded character in our case and there's no real way for the client to know which encoding to use (in our case using utf-8 is solving the issue, but I think we could improve the default in rwslib as it is targeting the Rave Web Services which probably used the same encoding for most response). My suggestion would be to try to read the encoding in the XML declaration, and fallback to utf-8 (as it's the most widely used ) or Response.appareant-encoding to try to detect it automatically.

I can propose a pull requests with the changes if needed

Regards

iansparks commented 3 years ago

Personally I'm open to this but my impression is that the encoding you get back is a bit of a gamble. I think it's fixed for each endpoint but, for example, Subject listings are utf-8-sig (i.e. with a BOM). Reading from the XML is probably a good approach but I'm not using this library daily any more so hesitant to agree changes that could burn people who are.

iansparks commented 3 years ago

@halbow Thanks for this offer. I have had a chance to read your proposal again and I think it makes sense. @glow-mdsol any thoughts?

halbow commented 3 years ago

I just read a bit more about the XML specification here and considering utf-8 as the default when there's no BOM or info in the XML declaration seems safe: In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Now from a library maintenance POV, I understand that you don't want to change the behavior silently, what we could do is to create Another type of Request, something like AuditRecordsRequestUTF8, and deprecate the previous one and only do the change when the library is bumped to a major version.

glow-mdsol commented 3 years ago

I think the issue we've historically had has been down to inconsistencies in how RWS generates the responses (it's not done universally and it's probably not changing anytime soon). Please do raise a PR and we can work off that. I can also see the RWS code so add a look behind the curtain, so to speak.