Closed BFergerson closed 2 years ago
This issue will be solved by https://github.com/graknlabs/grakn/issues/6194. I will work on it next week, and have it out in the subsequent release.
@haikalpribadi, thanks for seeing this issue through. I've got a bit of catchup to do, but I'm excited to see what's now possible with the latest release. I remember this being one of the final issues I needed to solve to release new software.
The first two runs, labeled `15`, are Grakn 2.0.0-alpha-8. The next three runs, labeled `upgrade to 2.4.0`, `16`, and `17`, are TypeDB 2.4.0. And the final run, `downgrade to 2.3.0 for good measure`, is TypeDB 2.3.0 (which timed out).
Description
I'm experiencing a high amount of volatility when executing a match query on a database with no inserted data. I believe this means that my issue exists within the scope of the query planner/type resolver. Regardless, I have a query that, when executed, finishes anywhere between 6 seconds and 30 minutes.
I've duplicated this issue to a degree in the following project: https://github.com/BFergerson/grakn2-volatility
If you check the build log for the above project, you will see the following:
Build 15 took 3 minutes to execute while build 6 took a full hour. There is no difference between the code used to execute these two jobs apart from an updated comment used to trigger another build. This is also not the result of computer/network latency in the build process, as is evident from all the runs between 15 and 6, which also executed the exact same code with the only change being a modified comment.
The query I'm running is the following:
I've re-run this query on Grakn 1.7 and found it to take consistently ~5 seconds.
What is the query planner doing that would take 30 minutes to answer this question? Naively, it seems to me that the first thing it should do, given the query contains no disjunctions, is try to resolve `$function`. In trying to resolve `$function`, it should be looking for something which is a `SourceArtifact` and finding that 0 such entities exist. It would seem that at that point it could simply infer, from the fact that it could find no `$function`, that it couldn't possibly find `$functionName`, as there is no `$function` which it could relate to via `(is_parent, is_child)`.

What would really help my understanding of Grakn is how the query planner chooses plans to execute. I believe there was a presentation the Grakn engineers created (@flyingsilverfin, I believe). I think it would be beneficial to see, even if not up to date, as it is difficult for me to debug this issue with my current understanding.
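To make the naive short-circuit I have in mind for `$function` concrete, here is a toy Python sketch. It is not how Grakn's planner works, as far as I know; `Constraint`, `first_unsatisfiable`, and the `instance_counts` dictionary are hypothetical names I made up for illustration, and the sketch ignores subtyping and rule inference entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Constraint:
    """Hypothetical stand-in for one 'variable isa type' pattern in a conjunction."""
    variable: str    # e.g. "$function"
    type_label: str  # e.g. "SourceArtifact"


def first_unsatisfiable(
    constraints: List[Constraint],
    instance_counts: Dict[str, int],
) -> Optional[Constraint]:
    """Return the first constraint whose type has zero instances, else None.

    In a pure conjunction (no disjunctions), a single constraint with no
    candidate instances means the whole query has no answers, so every
    variable related to it can be skipped as well.
    """
    for constraint in constraints:
        # Assumption: a missing type label is treated as having zero instances.
        if instance_counts.get(constraint.type_label, 0) == 0:
            return constraint
    return None


# Empty database: no SourceArtifact instances have been inserted.
blocked = first_unsatisfiable(
    [Constraint("$function", "SourceArtifact")],
    {"SourceArtifact": 0},
)
print(blocked.variable if blocked else "needs full traversal")
```

With an empty database, the check stops at `$function` without ever considering `$functionName`, which is the behavior I would naively expect.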
Environment
Reproducible Steps
Fork https://github.com/BFergerson/grakn2-volatility and trigger a build
Expected Output
I'd expect the build time to remain consistent and I'd hope it would be under a few minutes each time.
Actual Output
Execution time ranges widely
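To put numbers on "ranges widely", a small timing harness along these lines could be wrapped around the query. This is only a sketch: `time_runs` and `summarize` are hypothetical helpers, and `run_query` is a placeholder callable standing in for whatever actually executes the match query (no Grakn client calls are shown or assumed).

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call run_query `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # placeholder: the real query execution would go here
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Summarize the spread of a non-empty list of durations."""
    fastest = min(durations)
    slowest = max(durations)
    return {
        "min": fastest,
        "max": slowest,
        "median": statistics.median(durations),
        # Ratio of slowest to fastest run; a stable query should stay near 1.
        "spread_ratio": slowest / fastest if fastest > 0 else float("inf"),
    }


# The two extremes reported in this issue: 6 seconds and 30 minutes.
print(summarize([6.0, 1800.0]))
```

For the 6 second and 30 minute extremes above, the slowest run is 300 times the fastest, which is the kind of spread I would not expect from identical code on an empty database.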
Additional information
I'll add information as I can. I'm still learning how to effectively debug Grakn performance problems. A better understanding of the internal query planner may help push a resolution faster.
Possibly related to: #6151 cc: @lriuui0x0