ravendb / ravendb-php-client

MIT License
6 stars 4 forks source link

DeleteByQueryOperation does not clear unitOfWork entities #58

Closed Geolim4 closed 6 months ago

Geolim4 commented 6 months ago

Hello,

I have this operation:

    /**
     * @param string[] $keys
     * @return bool
     */
    protected function driverDeleteMultiple(array $keys): bool
    {
        $this->documentStorage->operations()->send(
            new DeleteByQueryOperation(
                new IndexQuery(
                    sprintf(
                        "from %s where id() IN ('%s')",
                        $this->getConfig()->getCollectionName(),
                        implode("', '", array_map(fn (string $key) => $this->documentPrefix . $key, $keys))
                    )
                )
            )
        );

        return true;
    }

Unfortunatelly UnitOfWork is not synchronized and require me to add this code after the DeleteByQueryOperation:

        foreach ($keys as $key) {
            // $this->instance is a DocumentSession object
            $this->instance->documentsById->remove($this->documentPrefix . $key);
        }

So, when you're calling \RavenDB\Documents\Session\DocumentSession::load this method keeps returning you entities that does no longer exist in Ravendb.

This issue is related to an implementation in progress of Ravendb in Phpfastcache: https://github.com/PHPSocialNetwork/ravendb-extension/blob/master/lib/Phpfastcache/Extensions/Drivers/Ravendb/Driver.php

ayende commented 6 months ago

That is by design, and in general, you shouldn't be accessing the members of the session directly, only through the API.

What is going on? The session implements a unit of work, usually for the scope of a single request. The query, on the other hand, is being handled by the server, and the session has no actual idea what you are doing there.

For example, you may do things like from Users where Age < 32 to delete. No way for us to correlate the values.

That said, for this specific scenario, you are far better off with just calling to $session->delete($id) on all of those. That would:

If you already know the ids, it is easiest to skip that.

Geolim4 commented 6 months ago

That said, for this specific scenario, you are far better off with just calling to $session->delete($id) on all of those. That would:

I know, but when you have multiple documents to delete, its way faster to use a DeleteByQueryOperation that iterating on key to $session->delete($id). That's why I choose the DeleteByQueryOperation when there's at least 2 or more keys to delete :)

ayende commented 6 months ago

Can you explain why? You are expected to call:

for (var id in ids) {
   $session->delete(id)
}
$session->saveChanges()

That would result in a single server call. Are you calling saveChanges per id?

Geolim4 commented 6 months ago

Are you calling saveChanges per id?

No, I tested over a deletion of 1000 items in my tests and DeleteByQueryOperation was slightly faster than $session->delete(id) iterated 1000 times followed by a single $session->saveChanges() call.

But it's micro-optimization, I'm trying to provide as much optimized implementation as possible since Phpfastcache can be used in big project with a lot of objects.

However if you tell me its by design, its fine, I'll handle the UnitOfWork behavior on my side if it allows me to grab some ms of performance improvements. :)

ayende commented 6 months ago

I'm fairly certain that the issue is that you are just starting the process of deletion, not actually waiting for it. This code does this with a wait. Note the call to waitForCompletion

  $indexQuery = new IndexQuery();
  $indexQuery->setQuery("from users where age == 5");
  $operation = new DeleteByQueryOperation($indexQuery);
  $asyncOp = $store->operations()->sendAsync($operation);
  $asyncOp->waitForCompletion();

So your original code just starts the async process and doesn't wait for it to complete.

As an aside, what are the timing differences that you are observing?

Geolim4 commented 6 months ago

I'm fairly certain that the issue is that you are just starting the process of deletion, not actually waiting for it.

To be honest, I don't need to wait for it. Deletion can be asynchronous, I'm absolutely fine with it.

So your original code just starts the async process and doesn't wait for it to complete.

That's indeed the desired behavior. I don't really need for a guaranteed results when more than one object is deleted.

As an aside, what are the timing differences that you are observing?

Over 1000 calls, it's pretty variable but I noticed from 0.050s to 0.80s. This may seems trivial, but from what I know, Phpfastcache is used in really huge projects that can make over multiple thousands of calls per request, so if I can optimistically make the least optimization, I do it without hesitation :P

ayende commented 6 months ago

The problem is that you are looking at a single call. Not total system load. When you issue a DeleteByQuery, we need to execute the query, then start deleting the results. With delete, we can just delete the results.

I would recommend that you go with the delete instead. Since you are not comparing apples to apples and in the case of many such calls, there is a server side load to consider.

Geolim4 commented 6 months ago

From your POV DeleteByQuery is using internally more CPU/ressources than delete ?

Because from my strictly neutral POV, DeleteByQuery performs better using a IN( ... ) "delete request" than delete called 1000 times followed by a $session->saveChanges() call. That's where I started to encounter UnitOfWorks synchronization.

ayende commented 6 months ago

You are comparing:

Note that both aren't really expensive, but given that you mentioned that you have a high potential load and care about micro optimizations, it matters

In particular, a single one doesn't matter, but if you have a 100 going at once, that can be a waste of resources.

Geolim4 commented 6 months ago

So what's your advice to make the single and multiple deletion as much optimized as possible ?

(No matter the complexity of the implementation).

ayende commented 6 months ago

$session->delete() in all cases

Geolim4 commented 6 months ago

Okay then. Thanks.

Geolim4 commented 6 months ago

I finished my implementation, see "Run Tests": https://github.com/PHPSocialNetwork/ravendb-extension/actions/runs/7490306797/job/20388839683

It's working smoothly and I should be able to release a stable version by the end of the week, thanks :)