neos / flow-development-collection

The unified repository containing the Flow core packages, used for Flow development.
https://flow.neos.io/
MIT License
134 stars 187 forks source link

BUGFIX: Fix count for query across OneToMany joins #3342

Closed tvt closed 2 months ago

tvt commented 2 months ago

The Query->count now returns the correct count when a criterion is added on a OneToMany relation.

Review instructions

The problem is described in details in #3331.

Checklist

kitsunet commented 2 months ago

IMHO we need to make this an option to be decided in userland code, the question is rather what should be the default.

kdambekalns commented 2 months ago

But that option would be making the query distinct, no? So, do we "just" need to document it?

Or do we "play it safe" by keepig true unless switched off somehow (to "enable performance"?)

But AFAIK we cannot decide between "distinct flag not set" and "distinct flag set to false", can we? So that would mean adding (yet) something Flow-specific.

Sebobo commented 2 months ago

I wrote in Slack back then an assumption: The problem is basically whether we join on a 1-1 or many-1 relationship which doesn’t need to be distinct, or we join on a 1-many right? Which could be a basis for an auto detection.

But having two methods like doctrine does it in https://www.doctrine-project.org/projects/doctrine-orm/en/3.0/reference/query-builder.html#the-expr-class with count and countDistinct would be the another option.

bwaidelich commented 2 months ago

My 2ct: Please lets not make it a configuration option or "solve" it via documentation. Both are just coward ways to push the burden of API design to the userland :) I think two distinct (no pun intended) methods make the most sense, and personally I would prefer the default (count()) to behave like I would expect in most cases (i.e. returning the same result as if I would count the items of the iterated resultset)

tvt commented 2 months ago

@kdambekalns sure, it reverts the incorrect code but also adds a test case, so the same bug can't be reintroduced.

I understand there's a performance consideration as well, but I would expect that correctness overrules performance.

The performance can probably be fixed without reintroducing bugs, but I'm not deep into the Flow Doctrine code, so I don't know how.

Also if you're having performance issues with count, why not write DQL or native SQL instead of introducing bugs to the framework?

Sebobo commented 2 months ago

Also if you're having performance issues with count, why not write DQL or native SQL instead of introducing bugs to the framework?

Nobody wanted to introduce a bug.

The problem is that the performance issues are also triggered by using the official APIs of Flow and Neos. Going around that and using custom queries is not an easy task and creates issues when upgrading Flow/Neos. I know that because I spent a ton of time on performance stuff in the last years. And initially you don't always know what's going on. So one can spend days debugging after a project has been running perfectly fine for a long time and at some point reached the critical mass and doesn't scale anymore

Similar issue with our caching. Many Neos projects get super slow after some weeks when integrators don't know that the filesystem caches will kill their app at some point. Now you can say: "Hey integrators, properly read the docs", or we provide defaults that scale until you do your super fancy thing, because that means, its now your job to also care about the performance of it.

I expect a framework with its APIs to handle a million db entries. Which is a laughable amount actually.

Now if we adjust all the public APIs to use native queries and make that safe, the next dev will again use the query builder and run into the same issue at some point.

Don't get me wrong, I really want to have a proper fix for this, that fulfils all requirements.

tvt commented 2 months ago

Don't get me wrong, I really want to have a proper fix for this, that fulfils all requirements.

I agree that would be the best solution, but until then I suggest that the count function returns the correct result. Why would you even need a count if whether it's correct or not doesn't matter?

[...] and creates issues when upgrading Flow/Neos.

Tell me about it. We're stuck on 7.3.19 because of this bug being introduced. It wasn't easy to track down.

kdambekalns commented 2 months ago

I think two distinct (no pun intended) methods make the most sense

Did someone already try to find out how "pure Doctrine" users usually handle this? Or is this somehow a problem that we have created ourselves by adding to many layers around Doctrine in Flow?

tvt commented 2 months ago

What about this then @kdambekalns?

bwaidelich commented 2 months ago

How shall we proceed with this one?

My gut feeling is that whatever we decide/detect is to some extend intransparent and potentially leads to errors.

@Sebobo you have valid performance concerns and those have to be addressed.

What do you folks think of adding that second count method like doctrine does? Wouldn't that solve the issue?

tvt commented 2 months ago

I just want to make sure we're all on the same page @bwaidelich. As I see it this fixes a bug without introducing any performance problems, except for situations that currently have the possibility of returning wrong counts.

If this fix causes performance issues for anybody, I think the bigger issue is that they are currently getting wrong counts.

bwaidelich commented 2 months ago

As I see it this fixes a bug without introducing any performance problems

I'm not into ORM at all (and try not to use it if possible) so I can't judge whether the fix catches all cases or might even introduce new, different, bugs (no allegation I simply can't tell out of ignorance to the topic)

I think the bigger issue is that they are currently getting wrong counts

I agree. But we have to take those decisions carefully nevertheless, because it means that very specific patch level versions of Flow behave differently.

If your fix solves both, performance and inconsistency issues, it should be merged IMO but if @Sebobo or @kdambekalns or anyone else still has concerns, those should be taken seriously

tvt commented 2 months ago

Any feedback @kdambekalns and @Sebobo? I don't think I can do much more here.

This should only have an impact on performance on the queries that currently returns wrong counts.

Sebobo commented 2 months ago

@tvt thanks for the change, I will test it in the use case where I hat the biggest issues and give feedback.

One thing: according to our roadmap this change should target 8.3 and not 7.3.

tvt commented 2 months ago

Hmm, what's the easiest way to target 8.3 instead? Should I create a new PR?

bwaidelich commented 2 months ago

Hmm, what's the easiest way to target 8.3 instead? Should I create a new PR?

You could change the target branch in this PR, but that usually leads to a mess. A new PR would be better, you can probably just cherry-pick your commits:

git cherry-pick da13be260fce86d03d6d42c5157e7b6a785297b9
git cherry-pick eea4d2b90d123238790a537cfaa003d8b4763c14
git cherry-pick 521440396b3695396251ed332acd2060102ad71d
tvt commented 2 months ago

I've recreated the PR for 8.3 here: #3345.

kdambekalns commented 2 months ago

Thanks for pushing this through!

Hint: When going "up", changing the base of a PR works fine more often than not.