Closed trondhindenes closed 6 years ago
Thanks for opening an issue. To make sure I understand, what would the equivalent call be in boto3 to either PutItem or UpdateItem?
I remember DynamoDB didn't support empty strings for a while, so I'd even less expect a typed {"S": null}
to be supported:
In that last link, there's this comment on the String type:
Strings are Unicode with UTF-8 binary encoding. The length of a string must be greater than zero and is constrained by the maximum DynamoDB item size limit of 400 KB.
We solve it using a custom NullableString type:
class NullableString(String):
def __init__(self):
super().__init__()
def dynamo_dump(self, value, *, context, **kwargs):
if value is None:
return None
elif value == '':
return None
return value
def dynamo_load(self, value, *, context, **kwargs):
if value is None:
return None
elif value == '':
return None
value = super().dynamo_load(value, context=context, **kwargs)
return value
This seems to work, but I have to admit I haven't really read up on the underlying DynamoDB reference, and we haven't done extensive testing of the above code yet either.
That's pretty good, and it works as long as you are comfortable treating empty string ""
and lack of value None
as the same. In some cases, this isn't a safe assumption (eg. migrating from a nullable column to non-null user-provided field that may be an empty string).
Bloop handles the distinction by treating anything that's set to None
as an explicit instruction to clear that column in DynamoDB during the UpdateItem call (engine.save
) and other types, like Set
, use this to coerce an empty set to None
when saving the item. There's a brief description of this here but that section could stand to be expanded.
Right now I believe Bloop will fail if you try to save an empty string (throwing a boto3 or botocore exception) because the String type won't automatically translate ""
to None
. If it did that, Bloop would do what you're expecting and clear the column when the string value is empty locally.
I'd recommend the following definition for NullableString
:
class NullableString(bloop.String):
def dynamo_dump(self, value, *, context, **kwargs):
if value:
return value
return None
This should be enough since empty strings are Falsey, and dynamo_load will take care of the rest automatically (empty strings will never be returned from DynamoDB, so the elif
branch of your dynamo_load
function won't be hit).
good point, I'll adjust my code to match.
Quick update: I've got a few days of time and decided to sort through the issue backlog. It's been long enough since I made the decision to preserve the distinction between Null
and ""
that I thought it worth revisiting. tl;dr I'm going to make the code sample above the default implementation for String in 2.2, and deal with the fallout in 3.0 should DynamoDB ever support empty strings.
A few points surfaced that I didn't bring up in the last comment from May:
None
differently from empty values for all types. Since 1.0 however, []
, {}
, and set()
are all transparently treated as None
for the purposes of existence checks. That leaves String as the only non-conforming type.Given the probability that empty strings will be supported in the future, I'm going to incorporate the code sample above into Bloop directly, so that an empty string won't try to be saved to DynamoDB (which would fail) and will instead emit the correct "delete" instruction during UpdateItem. At the same time, any column with the String type that isn't present will load in as ""
in the same way that a Set
type that has no value loads in as set()
.
Applying the criteria from the bug fix appendix this change can be made in a minor version, since (1) a user has noticed the unintended behavior (empty string currently throws exceptions from botocore) and (2) fixing the behavior in bloop by no longer passing the empty strings forward will not break a user's reasonable workaround since the only possible workarounds would cause the exceptions to stop (which this fix would also accomplish).
If the comment here comes to pass and DynamoDB does start supporting empty string, I expect enough cascading changes between the implementation and the SDKs which now have to reflect those changes that Bloop can either handle the distinction in a flag, or migrate in 3.0 with a deprecation notice. Reverting the change won't pass criteria (2) for the bug fix appendix, which is why it would have to come with 3.0 and not 2.x.
Reproduction of https://news.ycombinator.com/item?id=13173927:
I used to work on the DynamoDB team. Throwaway account because my normal account can be tied back to my real name.
Each hash key resolves to a number of possible servers the data can be on. Data is replicated across several of these servers. For redundancy. The hash key determines which shard to use.
On individual machines, each set of data is stored by a compound key of hash key and sort key (if there is a sort key). The data is probably stored on disk sequentially by sort key or close to it.
This is pretty much exactly correct. The hash key maps to a quorum group of 3 servers (the key is hashed, with each quorum group owning a range of the hash space). One of those 3 is the master and coordinates writes as well as answering strongly consistent queries; eventually consistent queries can be answered by any of the 3 replicas.
They possibly use something like LevelDB for this.[1]
Sigh...if only. I don't remember the exact timeline but LevelDB either didn't exist when we started development or wasn't stable enough to be on our radar.
DynamoDB is this very elegant system of highly-available replication, millisecond latencies, Paxos distributed state machines, etc. Then at the very bottom of it all there's a big pile of MySQL instances. Plus some ugly JNI/C++ code allowing the Java parts to come in through a side door of the MySQL interface, bypassing most of the query optimizer (since none of the queries are complex) and hitting InnoDB almost directly.
There was a push to implement WiredTiger as an alternative storage engine, and migrate data transparently over time as it proved to be more performant. However, 10gen bought WiredTiger and their incentive to work with us vanished, as MongoDB was and is one of Dynamo's biggest competitors.
Hey @trondhindenes,
With the release of 2.2 (#123) I believe this can be closed.
String and Binary will use empty values for None, so you should be able to remove the NullableString
type above and everything will just work. Here are the new and updated docs:
If you think anything's missing or could be improved feel free to comment or re-open. Thanks for your feedback!
Thank you again for your detailed responses and continous updates @numberoverzero - I'm learning lots by reading your replies :-). The docs look great, I'm fine with closing this. Thank you again for this!
As far as I understand, supporting
None
(empty) strings currently requires creating a custom type class. It would be awesome if bloop supported this natively. Feels to me that nulling a string is such a common thing that while Bloop is a super-flexible library it would be awesome to have it builtin.I'm afraid I don't understand enough of the type system in bloop to be able to properly define a nullable type.