realm / realm-swift

Realm is a mobile database: a replacement for Core Data & SQLite
https://realm.io
Apache License 2.0

Support "Cascading/Inheritable" Writes #8650

Closed bdkjones closed 2 months ago

bdkjones commented 2 months ago

Problem

Consider a tree of Object subclasses:

Client
    Project
        Job
            Version

Each object is complex, with many properties and relationships. The user can delete any object at any time, and all descendant objects should be deleted as well.

Because each object's deletion flow must be carefully customized to clean up all side effects, we would logically have methods that specialize in each operation:

func deleteClient(_: Client) {...}
func deleteProject(_: Project) {...}
func deleteJob(_: Job) {...}

When we delete, say, a Client, we also need to delete all the Project, Job, and Version objects underneath that client. The problem is that we can't simply re-use these methods because we can't open a write transaction while we're inside another write transaction—we can't "inherit" the open write transaction.
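
For illustration, here is roughly what the naive approach looks like (a sketch only; the versions/jobs list properties are assumed names, not an actual model). Nesting realm.write calls like this raises an exception at runtime:

// Sketch of the naive approach, assuming each parent holds a List of its children.
func deleteJob(_ job: Job) throws {
    guard let realm = job.realm else { return }
    try realm.write {
        // ...clean up the job's side effects, then...
        realm.delete(job.versions)
        realm.delete(job)
    }
}

func deleteProject(_ project: Project) throws {
    guard let realm = project.realm else { return }
    try realm.write {
        for job in project.jobs {
            try deleteJob(job)   // raises an exception: already in a write transaction
        }
        realm.delete(project)
    }
}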

Solution

I'd like the ability to piggyback on an open write transaction, if there is one, and create a new transaction otherwise. Something like this:

@discardableResult
public func cascadingWrite<Result>(withoutNotifying tokens: [NotificationToken] = [], _ block: (() throws -> Result)) throws -> Result
{
    var didOpenWrite: Bool = false
    if !isInWriteTransaction {
        beginWrite()
        didOpenWrite = true
    }

    let ret: Result
    do {
        ret = try block()
    } catch let error {
        if isInWriteTransaction && didOpenWrite { cancelWrite() }
        throw error
    }

    if isInWriteTransaction && didOpenWrite { try commitWrite(withoutNotifying: tokens) }
    return ret
}

This API would grant a lot of flexibility for situations where cascades are necessary.
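
To show how it would be used, here is a sketch of the per-type delete functions built on cascadingWrite (again, the child list properties are assumed names):

// Each function works standalone (opens its own write) or inside an outer
// transaction (re-uses it), so a full cascade stays in one write transaction.
func deleteJob(_ job: Job) throws {
    guard let realm = job.realm else { return }
    try realm.cascadingWrite {
        realm.delete(job.versions)
        realm.delete(job)
    }
}

func deleteProject(_ project: Project) throws {
    guard let realm = project.realm else { return }
    try realm.cascadingWrite {
        for job in project.jobs {
            try deleteJob(job)   // piggybacks on the transaction opened here
        }
        realm.delete(project)
    }
}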

Alternatives

There are two alternatives:

1) Repeat the deletion logic for each lower-level object in the deletion function for the higher-level objects. Disaster waiting to happen.

2) Manually re-implement what I put above, either with an extension on Realm or by sticking the same code around each write in the deletion functions for everything but top-level objects.

How important is this improvement for you?

I would like to have it but have a workaround

Feature would mainly be used with

Local Database only

sync-by-unito[bot] commented 2 months ago

➤ PM Bot commented:

Jira ticket: RCOCOA-2409

bdkjones commented 2 months ago

Alternately, if there's something that's going to blow up about my approach above, I'd love to know that as well. There's a little extra overhead involved in capturing all the blocks, and you could certainly end up with a write transaction that has 700 billion operations in it if you abused this, but that's no different than abusing any other API.

Jaycyn commented 2 months ago

I am probably misunderstanding the intent here, so a question: this seems more like a relationship than subclassing to me. In the setup "Consider a tree of Object subclasses:", is each object an actual subclass of the object above it?

class Client: Object {
... some client properties
}

class Project: Client {
... some project properties
}

class Job: Project {
... some job properties
}

class Version: Job {
... some version properties
}

so Version inherits properties from Job and Project and Client and Object?

bdkjones commented 2 months ago

@Jaycyn No. Each of the tree levels is a direct subclass of Object.

Jaycyn commented 2 months ago

Ah! Makes way more sense then. Thanks for the clarification on that part.

When we delete, say, a Client, we also need to delete all the Project, Job, and Version objects underneath that client.

What does 'underneath' mean in this use case - as in related objects? For example, is this how the client model is set up?

class Client: Object {
   @Persisted var relatedProject: Project!
}

If so, when the client object is deleted, you want to cascade delete the relatedProject object as well. Correct?

bdkjones commented 2 months ago

@jaycyn that's correct. The client owns projects, which own jobs, which own versions.

Any one level can be deleted directly and, when that happens, all lower levels need to be deleted as well. But because each level is complex, with many side-effects to clean up (e.g. jobs hold other objects that must be deleted), I have a dedicated function to delete each type.

I want to re-use those functions when I delete higher-level objects. For example, when deleting a Client I would simply invoke my deleteProject function, which in turn would invoke deleteJob, and so forth. This lets me re-use the deletion code for each level.

When the deletion functions are called as part of the "cascade", they will use the existing open write transaction. When I call the deletion function directly and there is no open write transaction, they'll open a new one.

Jaycyn commented 2 months ago

We've explored the cascade delete process a number of times, but I don't think there's a single feature that could be added that would work across the board.

In one of our larger projects we have something similar where removing an object should also remove corresponding objects as one should not exist without the other. These objects are complex with dozens of properties and relationships.

For our case we simply pass the realm that's in a write 'state' to a function on the parent object that cleans up its child objects, before then deleting the parent object. I'm sure you have worked through this, but it wasn't really a lot of code, so I thought I'd throw it out there. This is kind of like the alternatives you mentioned.

Here are 4 sample objects where each relies on the existence of its "parent" object, e.g. a Version cannot exist without its "parent" Job, etc.

class Client: Object {
    @Persisted var clientName = ""
    @Persisted var relatedProject: Project!

    //this code could just live within the calling realm.write, but I wanted to put it here
    //to have the parent object responsible for cascade deleting its "child" objects
    func cascadeDelete(fromRealm: Realm) {
        fromRealm.delete(relatedProject.relatedJob.relatedVersion)
        fromRealm.delete(relatedProject.relatedJob)
        fromRealm.delete(relatedProject)
    }
}

class Project: Object {
    @Persisted var projectNumber = ""
    @Persisted var relatedJob: Job!
}

class Job: Object {
    @Persisted var jobLocation = ""
    @Persisted var relatedVersion: Version!
}

class Version: Object {
    @Persisted var versionInfo = ""
}

then the code to delete the client (and in turn delete all of the objects relying on that parent object) is

let client = realm.objects(Client.self).first! //get the client
try! realm.write {
    client.cascadeDelete(fromRealm: realm)
    realm.delete(client)
}

One of the points in the question was:

we can't open a write transaction while we're inside another write transaction

but doesn't doing it like the above avoid that issue altogether?

Why code for opening and committing write transactions when the delete can be accomplished within a single write? Or does your use case prevent that?

On that note, per your feature request, you could just use an extension and roll all the cascade delete code into that - that makes deleting a client a one-liner.

extension Realm {
    func cascadeDelete(client: Client) {
        try! self.write {
            self.delete(client.relatedProject.relatedJob.relatedVersion)
            self.delete(client.relatedProject.relatedJob)
            self.delete(client.relatedProject)
            self.delete(client)
        }
    }
}

call with

let client = realm.objects(Client.self).first!
realm.cascadeDelete(client: client)

bdkjones commented 2 months ago

The point of my suggestion is to let the deleteJob() function be flexible. I can call it DIRECTLY and it'll open a new write transaction to delete a single job. Or, I can call it INDIRECTLY (from within the deleteProject() function, where I need to delete all the jobs in that project) and it will use the existing open write transaction that deleteProject() opened.

Rationale:

I suggested the API addition for two reasons:

1) to make sure there's not a footgun here that I don't see.

2) I don't like extending third-party APIs if I can avoid it; I prefer to use officially supported approaches.

tgoyne commented 2 months ago

There are two core reasons why we stubbornly require explicit write transactions rather than just letting you write at any time:

  1. Beginning a write transaction has to refresh the Realm. Doing this implicitly in the middle of your code would be surprising, and more importantly would make a lot of code subtly incorrect. A very simple example would be that running obj.foo = obj.foo + 1 on multiple threads at once could lose writes due to incrementing from stale reads, while realm.write { obj.foo = obj.foo + 1 } guarantees that the read is always of the latest value (see the sketch after this list).
  2. We really want you to do batched writes. Our writes have enough per-transaction overhead that we really want to discourage putting a write inside a loop, or even just doing several writes in a row without a loop which could be combined. We want to encourage people to think about how to batch their writes from the very beginning to save them from a very bad time down the road. In addition, atomic transactions make a lot of correctness problems around seeing partially written state go away. In an app using a local Realm this is mostly a problem if your app happens to crash after committing an inconsistent state before it can then commit the final state, but when sync gets involved it gets spicy.
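
A minimal sketch of the stale-read hazard from point 1, using a hypothetical Counter model (each function opens its own Realm instance, since Realm objects are thread-confined):

import RealmSwift

class Counter: Object {
    @Persisted(primaryKey: true) var id: ObjectId
    @Persisted var value = 0
}

// Unsafe: the read happens before the write begins. beginWrite refreshes the
// Realm, so an increment committed by another thread in between can be lost.
func incrementUnsafely(_ counterID: ObjectId) throws {
    let realm = try Realm()
    guard let counter = realm.object(ofType: Counter.self, forPrimaryKey: counterID) else { return }
    let stale = counter.value
    try realm.write {
        counter.value = stale + 1
    }
}

// Safe: the read happens inside the write, after the implicit refresh and while
// holding the write lock, so concurrent increments are never overwritten.
func incrementSafely(_ counterID: ObjectId) throws {
    let realm = try Realm()
    guard let counter = realm.object(ofType: Counter.self, forPrimaryKey: counterID) else { return }
    try realm.write {
        counter.value = counter.value + 1
    }
}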

The approach you're taking is the sort of thing we're trying to discourage for the second reason. If you have two Jobs, it'd be perfectly natural to just write deleteJob(job1); deleteJob(job2) and perform an extra write transaction. If deleteJob() instead required you to call it in a write transaction, you'd probably write realm.write { deleteJob(job1); deleteJob(job2) } instead of realm.write { deleteJob(job1) }; realm.write { deleteJob(job2) }. We originally didn't even have isInWriteTransaction because we had the idea that there should be a total division between the code managing transaction state and the code doing actual mutations; if you structure your code that way, you'll never need to check it.

Ultimately though, this is all just recommendations from our perspective and not rules that you must follow or the Realm Police will arrest you. If you don't have trouble remembering that two adjacent calls to a delete function should be manually wrapped in a write block even though it'll work without it, then you should feel free to write the more elegant code, especially if having one of the deletions run but not the other isn't a meaningful correctness problem in your specific scenario and it's just some slightly suboptimal code. Our SwiftUI stuff even has a similar function, although that should not necessarily be taken as an endorsement of it being a good idea.

bdkjones commented 2 months ago

@tgoyne I may not be explaining my approach clearly. Batching operations in a single write transaction is EXACTLY what I'm doing here.

I'm trying to make the function flexible enough that it can either be called directly to delete a single object, OR it can be called during the deletion routine of a parent object and inherit that parent function's already-open write transaction.

I'm explicitly minimizing the number of write transactions. I'm trying to do it ALL in one transaction.

bdkjones commented 2 months ago

[Screenshot 2024-07-22 at 14:13:02]

Here's an example of where I'm using this technique. When I delete ScopedCueFlag objects from my Realm, there is a side-effect I must manage: Map is the only Realm collection that doesn't auto-clean itself, so I must go find all Map<String, ScopedCueFlag> collections that contained this flag and remove the keys that were assigned to the value.

This logic is encapsulated in deleteCueFlags().

Now, sometimes I need to just delete flags directly: the user clicked a "delete flag" UI command. Other times, (such as in the code pane on the left in my screenshot) I need to delete the flags as part of a larger operation where I'm modifying/deleting the objects that "own" the flag objects.

I don't want to repeat the deletion logic; I want to be able to call deleteFlags() and have it ask, "Is there a write transaction already open? Yes? Okay, use that transaction. No? Okay, open a write transaction."
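
In code, the pattern looks roughly like this (a sketch: the names here are guesses based on the description above, the Map-scrubbing helper is hypothetical, and cascadingWrite is the extension from the original post):

// deleteFlags works standalone or as part of a larger operation.
func deleteFlags(_ flags: [ScopedCueFlag], in realm: Realm) throws {
    try realm.cascadingWrite {
        for flag in flags {
            // Side effect: Map doesn't clean itself up, so remove every key in any
            // Map<String, ScopedCueFlag> that still points at this flag.
            removeMapEntries(pointingTo: flag, in: realm)   // hypothetical helper
            realm.delete(flag)
        }
    }
}

// Called directly (e.g. the "delete flag" UI command), it opens its own write:
try deleteFlags([flag], in: realm)

// Called from a larger operation, it re-uses the already-open transaction:
try realm.write {
    // ...modify/delete the objects that own the flags...
    try deleteFlags(flagsToDelete, in: realm)
}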

Bottom Line:

This "cascading write" approach is working well for me and I think it ought to be an official part of the SDK:

[Screenshot 2024-07-22 at 14:24:14]

bdkjones commented 2 months ago

Another Way To Think About It

Right now, if you call write() while in an open write transaction, Realm throws an exception instead of just making it a no-op and piggybacking on the write transaction that's already open. That seems...weird:

"You can't possibly write right now! We have to be PREPARED to write!" "Aren't you already prepared to write from this other thing I'm writing?" "Yes." "Well...could you maybe just write this stuff, too, without pitching a fit about it?" "No."

(I'm sure there's edge cases.)

nirinchev commented 2 months ago

The reason for the behavior above is to ensure the durability aspect of ACID. Imagine the following code:

func updateName() {
    realm.write {
        person.name = "Peter"
    }

    // We've committed a write - we should be confident this is persisted
}

func updateAddress() {
    guard addressNeedsUpdate else {
        fatalError("updateAddress was called when update is not needed")
    }

    realm.write {
        person.address = "some new address"
    }
}

// somewhere
realm.write {
    updateName()
    updateAddress()
}

Since updateName is now called inside another write transaction, it will not actually do a commit at the end of the write, meaning that if you then triggered an update to an external system, you'd be in an inconsistent state, since updateAddress has the potential to crash the process and revert the name change. While there are scenarios where this may be fine, we can't make that decision in a general-purpose way. If the cascadingWrite extension fits the needs of your application, that's great and you should by all means continue to use it, but we don't plan to add it as a first-party method at this point because we feel the complexity involved in making it general-purpose is too much right now. We do have this (fairly stale) ticket tracking adding support for checkpoints/subtransactions. The folks on the Core team lay out some of the considerations around this problem, but it hasn't garnered sufficient interest to justify prioritizing it.

I'm going to close this issue as the approach here is similar to that in https://github.com/realm/realm-swift/issues/2502 and in both cases, real support for nested transactions will require a fair amount of Core work, which is tracked in https://github.com/realm/realm-core/issues/1149.

bdkjones commented 2 months ago

@nirinchev That makes sense! To confirm: is there a point at which a single write transaction becomes too large?

Obviously there's some point where we run out of RAM, etc. but is there a practical guideline for how many changes should be included in a single write before it's better to batch that write into several pieces? With my approach here, I can end up with 10,000+ changes in a single write. I'm unclear how that translates to sync changesets and whether there's a point where I might create a changeset that's too large to function performantly if I use these cascading writes.

nirinchev commented 2 months ago

We don't have very strict guidance here, but as a rule of thumb, you probably want to avoid making changes to more than a few thousand objects in a single transaction when using sync. I'm not sure how easy that would be to do with the cascading write approach short of adding some change-tracking logic to it - perhaps if you're looping over objects at the top level, you could inject code that does commitWrite+beginWrite every x iterations.
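
For example, the top-level loop could look something like this (a sketch, not official guidance; the batch size is arbitrary, and jobsToDelete/deleteJob are assumed names from earlier in the thread):

// Split one huge cascade into several commits to keep each changeset small.
let batchSize = 1_000
let jobs = Array(jobsToDelete)   // snapshot so committing mid-loop doesn't disturb iteration
var processed = 0

try realm.write {
    for job in jobs {
        try deleteJob(job)               // re-uses the open transaction via cascadingWrite
        processed += 1
        if processed.isMultiple(of: batchSize) {
            try realm.commitWrite()      // flush the current batch
            realm.beginWrite()           // start a fresh transaction for the next batch
        }
    }
}   // realm.write commits whatever remains in the final partial batch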

bdkjones commented 2 months ago

@nirinchev got it, thanks! Just to clarify: is there a difference between "changes" and "deletes"? The only time my app can potentially touch a huge swath of Objects in one transaction is if a user deletes a top-level model entity that has many descendent children, grandchildren, etc. (The root object in a branch of the OutlineView, essentially.)

Do deletes count as "changes" for the purposes of this advice?