serverless-seoul / dynamorm

AWS DynamoDB ORM in Typescript
Apache License 2.0

Support for Set data types #12

Open · dankochetov opened this issue 4 years ago

dankochetov commented 4 years ago

Are you planning to support Set data types (SS, NS, BS)? If there is no current progress in that direction, I'm willing to work on this.

breath103 commented 4 years ago

Hi, so this contains two different requests, I think:

  1. conversion between NS <-> Set, SS <-> Set, BS <-> Set
  2. supporting DynamoDB operations like ADD on a Set.

So for 1, we haven't talked about it yet, but we think it completely makes sense. The actual implementation would be something like:

@attribute()
public numberSet: NumberSet;

(because of a reflect-metadata limitation, we have no choice but to use a custom class like NumberSet instead of a plain Set)

For 2, we don't really have a plan as of now, since it consumes the same write capacity anyway, though it does have some beneficial uses.

For 1, I'd love to get a PR and review it. @dankochetov thanks.

benhutchins commented 3 years ago

@breath103 Within Dyngoose, we dropped the use of reflect-metadata in favor of more explicit specifications, and there you can use sets and maps, although it doesn't optimize updates to perform ADD operations. @dankochetov if you're still looking for this support, you can see how Dyngoose implemented it if you want to consider porting it back to Dynamorm.

breath103 commented 3 years ago

I'm not sure you understand what I meant by the limit of reflect-metadata. This is an issue with the TypeScript metadata system itself: reflect-metadata has no concept of generics.

So NS should be typed as Set<number> and SS as Set<string>, but there is no way to pass this information to Dynamorm as metadata. You can verify this simply by checking the compiled JS file:

@Attribute("score")
public score: Set<number>;

compiled as

__decorate([
    dynamorm_1.Decorator.Attribute({ name: "score" }),
    __metadata("design:type", Set)
], Model.prototype, "score", void 0);

In short, we just need to create

class NumberSet extends Set<number> {}
class StringSet extends Set<string> {}

and handle those in the ORM query layer. This is completely possible; it just needs work.
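
A minimal sketch of how that could look (purely illustrative; the helper names are not part of Dynamorm's actual API), converting between the custom set classes and DynamoDB's NS / SS attribute-value shapes:

class NumberSet extends Set<number> {}
class StringSet extends Set<string> {}

// Serialize to the raw DynamoDB attribute-value shape (NS values are numeric strings).
function setToAttributeValue(value: NumberSet | StringSet) {
  return value instanceof NumberSet
    ? { NS: Array.from(value).map((n) => n.toString()) }
    : { SS: Array.from(value) };
}

// Deserialize back into the custom set classes.
function setFromAttributeValue(av: { NS?: string[]; SS?: string[] }): NumberSet | StringSet {
  if (av.NS) return new NumberSet(av.NS.map(Number));
  if (av.SS) return new StringSet(av.SS);
  throw new Error("Attribute is not a set type");
}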

benhutchins commented 3 years ago

@breath103 You could also address this by creating more explicit decorators. You could maintain the existing support for @Attribute(…), but you could similarly create:

@Attribute.NumberSet()
public score: number[]

This would then allow you to avoid using reflect-metadata and you'd know what the value format should be.

That is how Dyngoose implemented this. While it maintains @Attribute({…}) support, that form isn't mentioned anywhere in the examples or docs, because every attribute type has explicit options that can affect how the data is formatted and handled.

An example from string.metadata.ts:

@Dyngoose.Attribute.String({ trim: true })
public automaticallyTrimmedString: string

@Dyngoose.Attribute.Date({ nowOnCreate: true })
createdAt: Date

This could be added to Dynamorm without losing the existing reflect-metadata usage. See more examples of how it can work in the Dyngoose Attributes doc.
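
As a rough illustration of that idea (hypothetical names, not the actual Dyngoose or Dynamorm implementation), the decorator itself can record the DynamoDB type in a plain registry, so the emitted design:type metadata (which only ever sees Set or Array, never the generic) is not needed at all:

// The decorator carries the type explicitly, so no reflect-metadata is required.
type SetAttributeSpec = { property: string | symbol; name: string; type: "NS" | "SS" | "BS" };

const attributeRegistry = new Map<Function, SetAttributeSpec[]>();

function NumberSetAttribute(name: string): PropertyDecorator {
  return (target, propertyKey) => {
    const specs = attributeRegistry.get(target.constructor) ?? [];
    specs.push({ property: propertyKey, name, type: "NS" });
    attributeRegistry.set(target.constructor, specs);
  };
}

class GameRecord {
  @NumberSetAttribute("score")
  public score: number[] = [];
}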

breath103 commented 3 years ago

Oh, you mean not using reflect-metadata at all. That's technically a much easier way; it just trades away the developer's convenience by requiring types to be mapped manually, which was a design decision in this system. I'm not sure why you would want to use SS or NS if it doesn't support ADD / REMOVE and isn't mapped to the "Set" JavaScript class, but that's up to the original questioner. Anyway, the idea of implementing this feature in Dynamorm is completely valid; we just haven't invested time in it.

benhutchins commented 3 years ago

@breath103 With sets, you can query using the contains query operator to determine if a set or list contains the specified value. That querying support can be extremely useful.
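
For example, with the plain AWS SDK (outside any ORM; the table and attribute names below are made up), set membership can be filtered server-side with contains():

import { DynamoDB } from "aws-sdk";

const documentClient = new DynamoDB.DocumentClient();

// Return only the items in this partition whose "roles" set contains "admin".
async function findAdmins(accountId: string) {
  const result = await documentClient
    .query({
      TableName: "users",
      KeyConditionExpression: "#account = :account",
      FilterExpression: "contains(#roles, :role)",
      ExpressionAttributeNames: { "#account": "accountId", "#roles": "roles" },
      ExpressionAttributeValues: { ":account": accountId, ":role": "admin" },
    })
    .promise();
  return result.Items;
}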

I actually only see the ADD/REMOVE utilities as useful if the set is massive. If it were a massive set, you'd want to avoid loading it most of the time, and you might not know what it contains unless you perform a query using the contains operator. If the set is not so large that you need to avoid loading it, then it likely isn't too large to send as a complete value when calling UpdateItem.

Except Dynamorm currently does not use UpdateItem; it always uses PutItem, which replaces the entire item rather than saving only the updated values. See Table.save, which just calls Writer.put.

If you're going to attempt to optimize the performance of sets, it might first be worth optimizing every update operation in Dynamorm. Dyngoose does this by determining the best operation given the state and changes of the instance; see Table.forceSave.
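
As a sketch of that idea (not the actual Dyngoose or Dynamorm code), a dirty-check can collect only the attributes that changed since load and build a SET-only UpdateExpression from them:

// Build the pieces of an UpdateItem call from a map of changed attributes.
function buildUpdateExpression(changed: Record<string, unknown>) {
  const names: Record<string, string> = {};
  const values: Record<string, unknown> = {};
  const assignments: string[] = [];

  Object.entries(changed).forEach(([attribute, value], i) => {
    names[`#a${i}`] = attribute;
    values[`:v${i}`] = value;
    assignments.push(`#a${i} = :v${i}`);
  });

  return {
    UpdateExpression: `SET ${assignments.join(", ")}`,
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}

// buildUpdateExpression({ title: "new title", updatedAt: Date.now() })
// => { UpdateExpression: "SET #a0 = :v0, #a1 = :v1", ... }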

That said, the limitations in Dynamorm are why I ended up forking dynamo-types into Dyngoose. I've ended up adding a lot of functionality to Dyngoose that did not exist in dynamo-types. I'd eagerly merge the two projects back together if possible.

benhutchins commented 3 years ago

Just another thought: I think developers would find it more convenient to change the decorator than to use a special set class. A special NumberSet class could have utilities on it for .add and .remove; however, if the value were a native JavaScript array it might be easier to consume and manipulate, and you could then perform a diff during the save to determine which items were removed and which were added in order to build your ADD and REMOVE operations.
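
A rough sketch of that diff (a hypothetical helper, not an existing Dynamorm API); note that DynamoDB's element-removal action for sets is DELETE, while REMOVE deletes whole attributes:

// Compare the set as loaded with its current value to find what to ADD and what to DELETE.
function diffSet<T>(loaded: T[], current: T[]) {
  const before = new Set(loaded);
  const after = new Set(current);
  return {
    toAdd: [...after].filter((v) => !before.has(v)),
    toDelete: [...before].filter((v) => !after.has(v)),
  };
}

const { toAdd, toDelete } = diffSet([1, 2, 3], [2, 3, 4]);
// toAdd = [4], toDelete = [1]
// -> "ADD #scores :toAdd" and "DELETE #scores :toDelete"; these would likely need to be
//    issued as separate UpdateItem calls, since DynamoDB rejects update expressions where
//    the same attribute path appears in more than one clause.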

breath103 commented 3 years ago

Well, those are very frequently discussed issues:

  1. Why not partial updates?
     a. By design, DynamoDB consumes exactly the same amount of write capacity whether you call UpdateItem or PutItem.
     b. Implementing a dirty check and updating only the delta is actually very complicated; that is one of the reasons a full-fledged ORM like https://github.com/typeorm/typeorm becomes so massive.

So in summary, it doesn't really have a material performance (or DynamoDB capacity) impact unless the record is crazy big. But the size of a single DynamoDB record is limited (400 KB), and updating such a record (whether partially or fully) consumes hundreds of write capacity units, which is why I think that's an anti-pattern regardless of the ORM implementation. So it's a risky feature to build and most of the time makes no difference anyway, which is why we designed it the way it is now.

  2. You can still use a partial update for certain use cases (https://github.com/balmbees/dynamo-types/issues/30). There are some valid use cases, such as conditional updates, which is why we implemented this.

  3. Contains query: again, this doesn't matter when you do a Query or Scan, since unless the attribute is the HashKey or SortKey, DynamoDB consumes exactly the same read capacity anyway. The only case where this would have a meaningful impact is on a conditional update, as I mentioned above (unless the record is very huge, which again should be an anti-pattern anyway).

  4. Developer convenience:

// option 1
@Decorator.Attribute("set")
public setAttribute: NumberSet;

// option 2
@Decorator.NumberSetAttribute("set")
public setAttribute: number[];

// or
@Decorator.NumberSetAttribute("set")
public setAttribute: Set<number>;

I don't think the second one is better, not only because the definition depends on the decorator, but also because it leaves an implicit choice between Array and Set. Also, this is a design decision; you can build it in Dyngoose if you want.

benhutchins commented 3 years ago

  1. TypeORM supports many databases, so it was bound to become massive. Unfortunately though, write capacity units aren't really the reason why UpdateItem is beneficial. Besides the better performance you get from reduced data transmission, performing a PutItem replaces the entire document every time you save. When using projections (which Dynamorm doesn't really allow for) or indexes whose ProjectionType is not ALL (such as INCLUDE, which only contains specific attributes), Dynamorm would wipe out the unloaded data and lose attributes of documents.

When I was evaluating dynamo-types, that limitation was probably the single largest reason I couldn't convert to it. We have some massive tables, and cloning the entire table into an index would be wasteful; loading data from such an index with dynamo-types and then saving would have corrupted our tables.

But all of Dynamorm is designed around the assumption that GSIs will have a ProjectionType of ALL, and it provides no way to specify a ProjectionExpression when loading records, so it assumes it will always load the entire record. In that case it would be fine to use PutItem when saving, since you have the entire record.

Unfortunately, at scale, there are too many times when loading the entire record is wasteful. The ability to perform a partial update to documents, without losing attributes that were not loaded, is necessary in those situations.
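
To make the hazard concrete (plain AWS SDK, made-up table and attributes): load only a projection of an item, then save it with PutItem, and every attribute that was not loaded disappears from the stored item.

import { DynamoDB } from "aws-sdk";

const documentClient = new DynamoDB.DocumentClient();

async function editTitle() {
  const { Item } = await documentClient
    .get({
      TableName: "posts",
      Key: { id: "post-1" },
      ProjectionExpression: "#id, #title", // body, tags, etc. are never loaded
      ExpressionAttributeNames: { "#id": "id", "#title": "title" },
    })
    .promise();
  if (!Item) return;

  await documentClient
    .put({ TableName: "posts", Item: { ...Item, title: "edited" } })
    .promise();
  // The stored item now contains only id and title; an UpdateItem that SETs just the
  // changed attribute would have preserved everything that was not loaded.
}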

  2. Conditional saving is not the same as UpdateItem; both are important, and I implemented both in Dyngoose. I was involved in that PR over at dynamo-types and pointed out then that Dyngoose had already implemented conditional saving, although with a very different approach to typing the conditions.

  3. But again, it's not just about the DynamoDB capacity utilized; it's about how much data you are loading. If an application needs to query for records that match specific filters, it only wants to retrieve those records for processing. In that case, it's acceptable to spend the read capacity units in DynamoDB as long as the application doesn't wastefully load records it will simply ignore. You can still avoid wasted resources by asking DynamoDB to do the filtering, and the contains operator is there for that purpose.

  4. No problem. At this point I don't think merging the two projects would be feasible. There were a lot of things that led to the need to create Dyngoose. Still, feel free to pull in some of the improvements if you find them helpful and can port them. Some of the features so far include development database seeding, CloudFormation template resource creation, the ability to add filters to queries, powerful call-chain querying, typed creation of new instances, transactions and batch operations across multiple tables, native JavaScript array output from queries, support for Dates, JSON objects, Binary attributes, BigInts, and String, Number, and Binary Sets, table migration, default attribute values, encrypted tables, pay-per-request and provisioned-throughput tables, and DynamoDB Streams… Hopefully some of the work can be helpful to Dynamorm in the future.

breath103 commented 3 years ago

@benhutchins

Yes, indeed this assumes ProjectionType=ALL, since this is an "ORM". Technically, to keep the ORM consistent, you are required to create a "separate" model if the data schema is different; otherwise developers are always at risk of using a non-existent attribute, since there is no explicit and safe way of knowing which attributes exist or not.

Again, these are simply system design trade-offs. I mean, I'm literally an AWS-certified Serverless Hero. My team and I didn't implement those because we decided it was a better trade-off not to have them, both productivity- and performance-wise.

I'm not sure about your intention here, precisely because of that. If you don't agree with the views of this repo and its maintainers, you can just fork the repository and create your own. Not to mention that since your point of view on designing an ORM is completely opposite to ours, I've never had any intention of merging with Dyngoose anyway.

Instead you just copy-and-pasted a lot of code from this repository, which is against the ISC license. I'll let that slide, but I won't tolerate any more pointless discussion here. If you just want to advertise your repository without making productive suggestions for this one, keep that to yourself. I'm blocking you from here.