mysqljs / mysql

A pure node.js JavaScript Client implementing the MySQL protocol.
MIT License
18.22k stars 2.53k forks source link

Throwing on an invalid charset seems a little much for a minor release #807

Closed tgriesser closed 10 years ago

tgriesser commented 10 years ago

Ticket #789 added an error thrown when an invalid charset is specified.

Is this a feature to throw? Would doing a a console.error with a message or something along those lines make more sense?

I had for awhile assumed that utf8 was a valid choice for a charset since it's what I would have used in mysql, the docs even have > charset utf8; in multiple examples.

So I'd advertised it in the docs for knex.js as the default for connection settings example, and I'm sure there are a lot of people who still have that as a config option, for example here in the ghost blogging platform.

I'm just afraid this might cause more breakage than it's worth for a minor release, particularly now that the ^ is the default for npm install --save, and since as far as I can tell from mysql docs, utf8 makes sense as a valid value for a charset.

dougwilson commented 10 years ago

Would doing a a console.error with a message or something along those lines make more sense?

We really won't want to spew out random to people's stderr/stdout pipes.

Is this a feature to throw?

The "feature" is that people had no idea what the valid values were, so it is very dangerous to specify an unknown value, as it would do something you didn't intend silently. Since throwing is a "feature" it was introduced in 2.2, which is according to semver conventions.

I had for awhile assumed that utf8 was a valid choice for a charset since it's what I would have used in mysql

That may be, but the docs for this module showed the valid was "UTF8_GENERAL_CI" and required to be in caps since v2.0.0-alpha2 and clarified it needed to be in caps (which is now no longer the case) since v2.0.0-alpha9 (this is all before v2 was even released). It was documented correctly, but the code was incorrectly ignoring bad values.

the docs even have > charset utf8; in multiple examples.

Those docs are for SET calls in MySQL queries, not for the protocol. The MySQL docs for the actual protocol charset is here: http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html

This client calls it "charset" because that it what it is called on the protocol level (http://dev.mysql.com/doc/internals/en/connection-phase-packets.html#packet-Protocol::Handshake), but it is technically the collation that the protocol is using.

particularly now that the ^ is the default for npm install --save

Perhaps, but not throwing was a bug in the prior releases; all bug fixes are technically breaking, which does suck.

dougwilson commented 10 years ago

Forgot to address this line:

as far as I can tell from mysql docs, utf8 makes sense as a valid value for a charset.

Remember that "charset" is what the protocol calls it, but what is negotiated is the collation, as in there is no binary value that corresponds to "utf8"; you can see the full list here: https://github.com/felixge/node-mysql/blob/v2.2.0/lib/protocol/constants/charsets.js

tgriesser commented 10 years ago

We really won't want to spew out random to people's stderr/stdout pipes.

Fair

The "feature" is that people had no idea what the valid values were, so it is very dangerous to specify an unknown value, as it would do something you didn't intend silently. Since throwing is a "feature" it was introduced in 2.2, which is according to semver conventions.

I guess feature depends a lot on one's interpretation. I see it as a breaking change

Remember that "charset" is what the protocol calls it, but what is negotiated is the collation

Right, so if really what you're requesting there is the collation name, not the charset name, then it's the API that's unclear/incorrect.

Would it be possible to add a lookup table between the actual character set names and the collation names, so as to allow a more flexible charset api? Or to instead change charset to collation in the config, as that's really what you're asking for/providing.

tgriesser commented 10 years ago

If so, I'd gladly work on it and open a pull request

tgriesser commented 10 years ago

So also, looking at http://dev.mysql.com/doc/internals/en/charsets.html it really looks like the charset value should correspond to things like utf8 latin1, it's just always been that the collation was mislabeled as charset in node-mysql.

So anyway, still think this should be addressed differently than throwing for those putting a valid charset value in a config slot that asks for a charset, not a collation.

dougwilson commented 10 years ago

I guess feature depends a lot on one's interpretation. I see it as a breaking change

That's fine. It won't be going away, though. 2.2+ will throw on unknown charset settings. Commit 09e3343a508602611c14ea152fa9236c285b6be3 is also a bug fix, but will suddenly make stuff error when before it silently did nothing. Adding a throw to fix a bug is a standard thing. Node.js core has added throws when the type to an API in patch releases many times.

An API that, given an unknown value, changes that value to a default value is very clearly broken. Like I said, this is not going to change. You're welcome to spec >= 2 < 2.2 if you like :)

Would it be possible to add a lookup table between the actual character set names and the collation names, so as to allow a more flexible charset api?

Not particularly. If you give "utf8" how would it choose between the following values?

UTF8_GENERAL_CI UTF8_BIN UTF8_UNICODE_CI UTF8_ICELANDIC_CI UTF8_LATVIAN_CI UTF8_ROMANIAN_CI UTF8_SLOVENIAN_CI UTF8_POLISH_CI UTF8_ESTONIAN_CI UTF8_SPANISH_CI UTF8_SWEDISH_CI UTF8_TURKISH_CI UTF8_CZECH_CI UTF8_DANISH_CI UTF8_LITHUANIAN_CI UTF8_SLOVAK_CI UTF8_SPANISH2_CI UTF8_ROMAN_CI UTF8_PERSIAN_CI UTF8_ESPERANTO_CI UTF8_HUNGARIAN_CI UTF8_SINHALA_CI UTF8_GERMAN2_CI UTF8_CROATIAN_MYSQL561_CI UTF8_UNICODE_520_CI UTF8_VIETNAMESE_CI UTF8_GENERAL_MYSQL500_CI UTF8_GENERAL50_CI

If so, I'd gladly work on it and open a pull request

See above.

So anyway, still think this should be addressed differently than throwing for those putting a valid charset value in a config slot that asks for a charset, not a collation.

Yes, the naming is confusing, I agree, but http://dev.mysql.com/doc/internals/en/connection-phase-packets.html#packet-Protocol::Handshake calls it "charset" in the protocol, rather than collation for whatever reason. I don't want to have both a charset and a collation configuration value as aliases, but I have slated to renaming the config to collation in v3.

TL;DR I'm glad this fix has brought to your attention the invalid protocol charset option you were supplying.

dougwilson commented 10 years ago

@tgriesser feel free to email me (in my GitHub profile) if you are interested in discussing a solution over the phone.

tgriesser commented 10 years ago

Following up with an email.