Open codehead opened 10 months ago
I'm out of town and reading on my phone, so I'm not going to do a full review, but I think I agree with @kraih: there are things in there that don't pass the smell test.
Use of ->json is optional. In my opinion, if you have a legacy system with nonstandard JSON encoding, you should just use from_json(decode($charset, $msg->body)).
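For illustration, here is that two-step approach as a runnable sketch. It uses the core modules JSON::PP and Encode in place of Mojo::JSON::from_json and Mojo::Util::decode (same idea: decode the bytes from their charset first, then parse the character string). The charset and body are made up; in practice the charset would come from the Content-Type header.

```perl
use strict;
use warnings;
use JSON::PP ();
use Encode qw(decode encode);

binmode STDOUT, ':encoding(UTF-8)';

# Stand-ins for a real message: a Latin-1 encoded JSON body.
my $charset = 'ISO-8859-1';                                # hypothetical charset
my $body    = encode($charset, qq({"name":"caf\x{e9}"}));  # stand-in for $msg->body

# The same two-step idea as from_json(decode($charset, $msg->body)):
# first decode the bytes from their charset, then parse the characters as JSON.
my $data = JSON::PP->new->decode(decode($charset, $body));
print $data->{name}, "\n";    # café
```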
Thank you @Grinnz. With all due respect, use of ->text is also optional, and it does take into account the charset specified in the message, falling back to a reasonable value should decoding fail. That syntactic sugar is very, very welcome. I hope the same can be achieved with ->json, in the name of cross-framework, cross-language, cross-charset interoperability.
The thing is, you don't know you are dealing with this issue (I still refuse to call it a bug) until it bites you. On a random message. That might be irreproducible. From a system you don't control. IMHO it would be more useful to have json() die loudly upon trying to decode a non-UTF-8 message than to fail silently.
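A loud-failure wrapper along those lines could look like this sketch. json_or_die is a hypothetical helper, shown with the core modules JSON::PP and Encode rather than Mojo's internals; Encode::FB_CROAK is what turns a malformed-byte sequence into an exception instead of a quiet substitution.

```perl
use strict;
use warnings;
use JSON::PP ();
use Encode qw(decode encode);

# Hypothetical strict helper: die loudly on non-UTF-8 input instead of
# silently yielding undef.
sub json_or_die {
    my ($bytes) = @_;
    # Encode::FB_CROAK makes decode() die on malformed UTF-8 bytes.
    my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK);
    return JSON::PP->new->decode($chars);
}

# Valid UTF-8 JSON decodes normally:
my $ok = json_or_die(encode('UTF-8', qq({"k":"caf\x{e9}"})));

# The same text in ISO-8859-1 now fails loudly, not silently:
my $bad = eval { json_or_die(encode('ISO-8859-1', qq({"k":"caf\x{e9}"}))) };
print $@ ? "died loudly\n" : "decoded\n";
```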
I don't see the comparison. The only purpose of ->text is to decode from the specified charset. The only purpose of ->json is to decode from spec-compliant JSON.
There are plenty of issues one may run into in real-world use cases; the framework has to balance anticipating these against keeping the common, compliant case simple and efficient. Here, calling two functions seems like a simple option compared to complicating and slowing down the ->json method for all users in the common case of receiving spec-compliant JSON.
I concur with @Grinnz; there is a difference in that "text" is specified to be in a known (configured) charset, whereas JSON is by definition UTF-8 encoded. As such, bytes that are JSON-like but encoded in some other charset are technically not JSON at all. I know that's not a very satisfactory answer, and it sounds dismissive, but given that difference it seems not unreasonable to make the spec-compliant mechanism as simple and fast as possible while still keeping your use case possible, if not quite as convenient. It seems that is already the case.
I'm afraid it looks like this PR will not pass the vote.
Summary
This PR modifies Mojo::Message to allow JSON decoding when a charset is specified. Currently all messages are decoded as UTF-8, so JSON messages with other encodings may fail silently.
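The silent failure is easy to reproduce with the core modules JSON::PP and Encode, which behave like Mojo::JSON here: strict UTF-8 JSON decoding wrapped in an eval yields undef for non-UTF-8 bytes, but only when non-ASCII characters are actually present. This is a sketch of the failure mode, not Mojolicious's actual code.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);
use Encode qw(encode);

# An ISO-8859-1 body with an accented character: byte 0xFC is invalid UTF-8.
my $latin1 = encode('ISO-8859-1', qq({"city":"M\x{fc}nchen"}));
my $data   = eval { decode_json($latin1) };
print defined $data ? "decoded\n" : "silently undef\n";

# The same charset without accents is a valid UTF-8 subset, so it decodes,
# which is why such messages seem to come up empty "at random".
my $ascii = encode('ISO-8859-1', '{"city":"Ulm"}');
my $plain = eval { decode_json($ascii) };
print defined $plain ? "decoded\n" : "silently undef\n";
```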
Motivation
Mojo::UserAgent always tries to decode messages as UTF-8, ignoring the message character set. When a message does not meet the UTF-8 specification, Mojo::JSON::decode_json() fails silently and Mojo::Message::json() returns undef. In character sets with a large overlap with UTF-8, such as ISO-8859-1, message decoding fails only when accented characters are present, so JSON messages might seem empty at random. This is critical for interfacing with legacy systems that expose JSON messages in charsets other than UTF-8.
References
No public issues or PRs for Mojolicious as far as I'm aware. Other Perl frameworks have allowed non-UTF-8 charsets for JSON messages from very early on -- e.g. Catalyst::View::JSON. Some Java frameworks defaulted to ISO-8859-1 encoding as recently as 2020: Content Type being append with charset=ISO-8859-1 #1428. Even though RFC 4627 tried to standardize UTF-8 encoding for JSON messages as far back as 2006 (at around the same time some frameworks implemented charset specification to allow non-ASCII characters in JSON), RFC 8259's language leaves enough slack for systems that are part of a closed ecosystem.