Closed jquast closed 9 years ago
Hmm, interesting.
I don't remember why/where the keys become utf8. But it certainly makes sense if what you get out of the dictionary is what you put in in the first place (unicode => unicode, rather than unicode => str, as you say).
I'd be a little careful with changes that break backward compatibility for str
keys, but if you have a way of fixing the unicode keys only, that would be perfect. Does that make sense?
Makes sense. You're right we need to be very careful for compatibility. This is actually quite complicated in such respects. You're looking for:
Thinking some more, downstream, I'm always explicitly decoding as utf-8, and such code would fail after the change: https://github.com/jquast/x84/blob/140f480b956cdadc742801e3ff57f23a34cf8816/x84/bbs/msgbase.py#L74 https://github.com/jquast/x84/blob/b1c2ef927af95214db3081fe8d3cebb71958e1a7/x84/default/tetris.py#L58 https://github.com/jquast/x84/blob/34cd65c1de666bcf5e9e50501d83d87367915ac6/x84/default/lc.py#L47 https://github.com/jquast/x84/blob/3ee80262a33a9ef2fb497c210c969574b6cb9223/x84/bbs/userbase.py#L16
With a version
spec, existing databases would continue to succeed with such downstream workarounds (item 3), but if i were to instantiate new databases, and their version
was implicitly set to 2, then it would fail ...
I've been in these situations a few times before, open source api libraries are the hardest... here are some suggestions.
A. provide an entirely new package
I think this is the only way not to cause anybody trouble, and new users of the library get the "unicode in, unicode out" behavior without having to research using any toggles or parameters to make it so.
This is nice, because the code change is minimal, it can simply naturally store by type, and return the same type. Easy to test and no magic about backwards compatibility.
B. do not break compatibility
do_unicode_keys=True
to return as a unicode. utf8-encoded bytestrings are always returned for unicode types by default for backwards compatibility with version X.Y."do_unicode_keys=False
by default argument for SqliteDict.C. break compatibility
I won't go into the details much more, because this one is probably the least likely. But basically document the change in Changelog, provide example error messages caused by the change which users might run into to help search engines find the cause, and the workaround (set do_unicode_keys=False
or some such).
I'm -1 on A, +1 on B or C.
And if C: just mention the change in Changelog, bump up the version to 2.0.0 and call it a day (no new parameters). So that's basically A, but discerned by package version rather than new package name.
In the end, the choice between B and C is yours Jeff, since you're the one who found the issue. Whatever suits you best!
Storing the type means sqlitedict
will suddenly need more memory for the same amount of keys, right?
Closing -- subsumed by #25.
I know sqlite is utf-8 internally, that's fine.
But I end up with boiler plate code that says:
This seems wrong. Shouldn't sqlitedict do this for us?
If you agree, that the .keys() and .items() and so on should automatically decode str-type as utf-8 into unicode-type, then I will be happy to submit a PR.