realm / realm-object-server

Tracking of issues related to the Realm Object Server and other general issues not related to the specific SDKs
https://realm.io

How to avoid duplicate items #340

Closed ivnsch closed 6 years ago

ivnsch commented 6 years ago

Goals

Add items with the same semantic unique (edit: i.e. not the primary key) on 2 different devices at the same time.

Expected Results

Only 1 of them is added; the other is dismissed (or deleted) as a duplicate.

Actual Results

I get duplicates.

Description

This is rather a general question - what can I do to avoid or fix this?

The uniques are composed of several fields, some of which are in nested objects.

Since Realm doesn't seem to support this kind of uniques, the unique isn't recognized and the remote database just adds 2x the item.

I tried implementing a consistency check in the notification block, so that when it detects a duplicate it removes it. The problem with this was that it ended up removing both items, as each device deleted the item of the other device!

In my particular case, for various reasons, this is not a rare case, so I have to handle it somehow. In more detail: users can swipe items from a todo list to a done list, and it can happen that they swipe the same item at the same time. My current logic (which I can't change at this point) checks locally whether the done list already has an item with the same semantic unique; if yes, it increments it, otherwise it creates a new item. So when multiple users swipe the element at the same time and it's not yet in the done list, each will create a new one locally and upload it to the ROS.

bigfish24 commented 6 years ago

If your objects can have primary keys then you can utilize Realm's merge logic to achieve the goal.

Another neat feature of Realm is that you can create objects on two devices with the same primary key and they will ultimately merge once synced into just having one object.

The only thing to keep in mind is how the merge of these two objects happens. Take for example a Settings object that stores the user's preferences, which you want to sync across devices. If one device creates it and then adjusts a setting, while at the same time another device creates the object but doesn't yet change anything, you would want the sync to occur such that the object that has user-driven changes "wins". The merge algorithm supports this if you make sure to create the objects with default values and then apply changes to them if the user does so. The default values will be overridden by other changes even if those changes occurred earlier.

For Swift this can be a little tricky because you declare default values in the schema in such a way we can't know that they are default:

@objc dynamic var name = "" // default empty string

As a result, with Swift you need to specifically use the realm.create API:

You can also partially update objects with primary keys by passing just a subset of the values you wish to update, along with the primary key:

// Assuming a "Book" with a primary key of `1` already exists.
try! realm.write {
    realm.create(Book.self, value: ["id": 1, "price": 9000.0], update: true)
    // the book's `title` property will remain unchanged.
}

Any property you leave out from the dictionary you pass in will then use the default value and we know it is default, thus the merge semantics are as described above.

Hope this helps!

ivnsch commented 6 years ago

Are those primary keys you refer to usable in my case? I wrote that the semantic uniques I use are deeply nested in the fields of the objects. Additionally (I didn't write this, probably I should have) I do have primary keys, which are uuids, but these are simply assigned when the objects are created and of course don't help for my use case (since the new item on each device will have a different uuid). The semantic uniques are what I use ("manually") to ensure my records are, well, unique. But Realm doesn't know them. As far as I know, in Realm it's not possible to have uniques in addition to the primary key (as is the case e.g. in an SQL database). Is that still correct? If yes, how would I approach this?

bigfish24 commented 6 years ago

Why can’t your PK be a string that is a concatenation of the fields that define the item's uniqueness? We don’t have true composite key support, but this might be a way to mimic it?
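
The concatenation idea above could look roughly like the following. This is a minimal sketch in plain Swift without the Realm dependency; `DoneItem`, `Product`, `listId` and the separator are hypothetical names chosen for illustration, since the thread does not show the real schema. In a Realm model the key would have to be a stored property written when the object is created, while a computed property is used here only to keep the example short.

```swift
import Foundation

// Hypothetical nested object; not taken from the thread.
struct Product {
    var name: String
    var brand: String
}

// Hypothetical item whose uniqueness depends on nested fields.
struct DoneItem {
    var uuid: String      // per-device identifier, differs on each device
    var listId: String
    var product: Product

    // Key derived only from the fields that define uniqueness.
    // The separator is assumed never to occur inside the fields; otherwise
    // ("a-b", "c") and ("a", "b-c") would produce the same key.
    var compoundKey: String {
        return [listId, product.name, product.brand].joined(separator: "\u{1F}")
    }
}

// Two devices create "the same" item independently.
let a = DoneItem(uuid: UUID().uuidString, listId: "list-1",
                 product: Product(name: "Milk", brand: "Acme"))
let b = DoneItem(uuid: UUID().uuidString, listId: "list-1",
                 product: Product(name: "Milk", brand: "Acme"))

// The uuids differ, but the derived keys match, so a primary key built
// this way would identify both objects as the same record.
print(a.compoundKey == b.compoundKey)
```
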

ivnsch commented 6 years ago

I don't remember right now why I added a UUID to this object. I think it was to make certain queries easier and to be consistent with the rest, which also use uuids. If I remember something else I'll add it here.

In any case now it's a bit difficult to change this as I've a lot of queries that involve the uuids, and would have to change everything to this unique...

Is it the only way to solve this problem?

bigfish24 commented 6 years ago

Right now Realm’s merge algorithm can deliver the behavior you want if you use primary keys like I described. If not then you will have to check for duplicates and remove them manually.

ivnsch commented 6 years ago

@bigfish24 Ok, thanks. The problem with removing the duplicates manually is that the devices will delete the item from each other, so at the end, there are no items.
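
One possible way around the mutual deletion described above, not suggested anywhere in the thread, is to make every device pick the same survivor, for example the duplicate with the smallest uuid. The sketch below is plain Swift with hypothetical `Item` and `semanticKey` names, and it assumes each device eventually sees the same set of duplicates. It only decides which records to delete; merging counters or other fields into the survivor is left out.

```swift
import Foundation

// Hypothetical record: `uuid` is the per-device primary key,
// `semanticKey` is a precomputed semantic unique.
struct Item {
    let uuid: String
    let semanticKey: String
}

/// Returns the uuids to delete: for each semantic key, every item
/// except the one with the smallest uuid.
func duplicatesToDelete(in items: [Item]) -> Set<String> {
    var survivor: [String: String] = [:] // semanticKey -> smallest uuid seen
    for item in items {
        if let current = survivor[item.semanticKey] {
            survivor[item.semanticKey] = min(current, item.uuid)
        } else {
            survivor[item.semanticKey] = item.uuid
        }
    }
    var toDelete = Set<String>()
    for item in items where survivor[item.semanticKey] != item.uuid {
        toDelete.insert(item.uuid)
    }
    return toDelete
}

let fromDeviceA = Item(uuid: "a-111", semanticKey: "milk|acme")
let fromDeviceB = Item(uuid: "b-222", semanticKey: "milk|acme")

// After sync both devices hold both items, possibly in a different order.
let onDeviceA = duplicatesToDelete(in: [fromDeviceA, fromDeviceB])
let onDeviceB = duplicatesToDelete(in: [fromDeviceB, fromDeviceA])

// Both devices choose the same record to delete, so one copy survives.
print(onDeviceA == onDeviceB)
```

Because the choice depends only on the data and not on which device runs the check, repeating the deletion on several devices targets the same record instead of each device removing the other's item.
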

I was changing my items now to use compound keys as described in https://github.com/realm/realm-cocoa/issues/1192 - is there still no way other than storing the compound key as an additional field and having to update it manually on each field update? This is very cumbersome...

At least in my current case it clearly is, I have several fields in deeply nested objects as part of the compound key, so each time I update any of those nested objects (in a transaction not related at all with the object where I have the compound key), I have to do a query on the objects with the compound key to update it. I don't want to have to add code all over the app to keep primary keys up to date. This is ugly and obviously very error prone. The database should do this.

bigfish24 commented 6 years ago

@i-schuetz yeah sorry we don't have better support for compound keys right now. We definitely want to improve this, but I will be upfront and say it hasn't been very high on the priority list. Thanks for bearing with us for now.

ivnsch commented 6 years ago

@bigfish24 okay, well... please improve this soon, given that uniques are essential for the real time sync - a core feature - to work properly, it seems crucial to support them.

ivnsch commented 6 years ago

Ok, I solved one case by adding a manually maintained compound unique, as suggested.

I now have another case: a Realm.List. When adding items on multiple devices, at the same time, to one of these lists, there will be duplicates again, and in this case the uniques don't help.

I'm trying now the workaround used in RealmTasks here: https://github.com/realm/realm-tasks/blob/master/RealmTasks%20Android/app/src/main/java/io/realm/realmtasks/TaskListActivity.java#L94 of manually removing duplicates, however one of the devices crashes because it tries to remove a row which is apparently not there anymore (e.g. "attempt to delete row 4 from section 0 which only contains 4 rows before the update") - any ideas?

Edit: It crashes sometimes on both devices at the same time as well.

In more detail:

Edit 2: This particular problem was solved. I was doing a reloadData() on the sender after removing the duplicate, so when the removal arrived from the other device the row was already gone, and this caused the crash. However, I'm now getting duplicates somewhere else. Will investigate and post again if necessary.

Edit 3: Ok, solved the other duplication... also needed to remove something manually.
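
For reference, manually removing duplicates from an ordered list, as discussed in this comment, might be sketched as follows. This is plain Swift rather than Realm code, `ListEntry` and `semanticKey` are hypothetical names, and it assumes that all devices converge on the same list order so that "first occurrence wins" gives the same result everywhere.

```swift
import Foundation

// Hypothetical list entry; `semanticKey` identifies logical duplicates.
struct ListEntry {
    let uuid: String
    let semanticKey: String
}

/// Indexes of entries whose semantic key already appeared earlier in the list.
func duplicateIndexes(in entries: [ListEntry]) -> [Int] {
    var seen = Set<String>()
    var duplicates: [Int] = []
    for (index, entry) in entries.enumerated() {
        // `insert` reports `inserted == false` when the key was already present.
        if !seen.insert(entry.semanticKey).inserted {
            duplicates.append(index)
        }
    }
    return duplicates
}

var list = [
    ListEntry(uuid: "1", semanticKey: "milk"),
    ListEntry(uuid: "2", semanticKey: "eggs"),
    ListEntry(uuid: "3", semanticKey: "milk"),
    ListEntry(uuid: "4", semanticKey: "eggs"),
]

// Remove from the back so that the remaining indexes stay valid.
for index in duplicateIndexes(in: list).reversed() {
    list.remove(at: index)
}

print(list.map { $0.uuid })
```

Removing in descending index order avoids shifting the positions of entries that still have to be removed.
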

TLDR: Workarounds seem to solve the problems so far, in my particular case (though haven't done proper QA yet). In any case, please release mechanisms soon to solve these problems properly... i.e. proper compound uniques and sorted sets. The workarounds are very finicky, error prone and time consuming. It's disconcerting that a sync platform leaves unique guarantees to the end users... this should be in the core of the platform from day one...

trant commented 6 years ago

It is very sad to have to use workarounds to solve the compound primary key problems. My database just doubled in size just for adding that extra field. I have duplicated data there because of that.

| siteId | userId | compoundKey |
| --- | --- | --- |
| 1 | 2 | 1-2 |
| 2 | 2 | 2-2 |
| 3 | 2 | 3-2 |
| 4 | 2 | 4-2 |

astigsen commented 6 years ago

Closing this for now as support for compound primary keys is tracked in this issue: https://github.com/realm/realm-core/issues/1370