vaticle / typedb

TypeDB: the polymorphic database powered by types
https://typedb.com
Mozilla Public License 2.0

Match query volatility #6155

Closed: BFergerson closed this issue 2 years ago

BFergerson commented 3 years ago

Description

I'm seeing highly volatile execution times for a match query run against a database that contains no inserted data. Because there is no data, I believe the issue lies within the query planner/type resolver. Either way, I have a query that, when executed, finishes anywhere between 6 seconds and 30 minutes.

I've partially reproduced this issue in the following project: https://github.com/BFergerson/grakn2-volatility

The build log for that project shows the following:

[Screenshot from 2021-02-09 21-22-32: build log]

Build 15 took 3 minutes, while build 6 took a full hour. The only code difference between these two jobs is an edited comment used to trigger another build. Nor is this caused by machine or network latency in the build process, as the runs between builds 6 and 15 show: they all executed exactly the same code, again differing only by a modified comment.
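To put a number on the spread between otherwise identical runs, a small timing harness like the one below could run the same query repeatedly and report the fastest, median, and slowest durations plus the slowest-to-fastest ratio. This is only a sketch in Python: `time_runs` and `summarize` are hypothetical helpers, and the query callable passed in is assumed to wrap whatever client call executes the match query; none of these are part of any Grakn client API.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Run `fn` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Report min/median/max and the max/min ratio ("spread") of the durations."""
    lo, hi = min(durations), max(durations)
    return {
        "min": lo,
        "median": statistics.median(durations),
        "max": hi,
        "spread": hi / lo if lo > 0 else float("inf"),
    }


# Hypothetical usage, where run_query wraps the match query above:
# print(summarize(time_runs(run_query, runs=10)))
```

Fed the two build durations quoted above (180 s and 3600 s), `summarize` reports a spread of 20.0, i.e. the slowest build took twenty times as long as the fastest.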

The query I'm running is the following:

match
$function isa SourceArtifact;
(is_parent: $function, is_child: $functionName);
($functionName) isa IDENTIFIER;
$functionName has token "main";
get $function; count;

When I re-run this query on Grakn 1.7, it consistently takes about 5 seconds.

What is the query planner doing that takes 30 minutes to answer this question? Naively, since the query contains no disjunctions, it seems to me that the first thing it should do is resolve $function: look for instances of SourceArtifact and find that none exist. At that point it could simply stop, because with no $function there is nothing that could relate to a $functionName via (is_parent, is_child), so no $functionName can be found either.
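The strategy suggested above can be sketched in Python under heavy simplifying assumptions. Each variable's own constraints (`isa SourceArtifact` for `$function`; the IDENTIFIER relation and `has token "main"` for `$functionName`) are assumed to be pre-folded into a candidate set. The `(is_parent, is_child)` relation is modeled as a set of allowed pairs, and the count is taken over distinct bindings of the target variable. `plan_order` and `count_matches` are hypothetical names. The sketch shows greedy "fewest candidates first" ordering with an empty-set short-circuit; it is not a description of how Grakn's planner actually works.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model: every variable maps to the set of instances that
# already satisfy its own type/attribute constraints.
Candidates = Dict[str, Set[str]]
# A binary constraint between two variables, as the set of allowed (a, b) pairs,
# e.g. (is_parent: $function, is_child: $functionName).
Edge = Tuple[str, str, Set[Tuple[str, str]]]


def plan_order(cands: Candidates) -> List[str]:
    """Greedy heuristic: bind the variable with the fewest candidates first."""
    return sorted(cands, key=lambda v: len(cands[v]))


def count_matches(cands: Candidates, edges: List[Edge], target: str) -> int:
    """Count distinct bindings of `target` that satisfy every constraint."""
    # Short-circuit: in a conjunction with no disjunctions, a single variable
    # with no candidates means the whole pattern has no answers.
    if any(not s for s in cands.values()):
        return 0

    order = plan_order(cands)
    found: Set[str] = set()

    def consistent(binding: Dict[str, str]) -> bool:
        # Check only the binary constraints whose variables are both bound.
        return all(
            (binding[a], binding[b]) in pairs
            for a, b, pairs in edges
            if a in binding and b in binding
        )

    def search(i: int, binding: Dict[str, str]) -> None:
        # Depth-first backtracking over variables in planned order.
        if i == len(order):
            found.add(binding[target])
            return
        var = order[i]
        for candidate in sorted(cands[var]):
            binding[var] = candidate
            if consistent(binding):
                search(i + 1, binding)
            del binding[var]

    search(0, {})
    return len(found)
```

With zero SourceArtifact instances, `count_matches({"function": set(), "functionName": {"n1"}}, edges, "function")` returns 0 before any `$functionName` candidate is enumerated, which is the behaviour the question above expects of an empty database.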


What would really help my understanding of Grakn is knowing how the query planner chooses which plan to execute. I believe the Grakn engineers created a presentation on this (@flyingsilverfin, I believe). It would be helpful to see, even if it is out of date, as it is difficult for me to debug this issue with my current understanding.

Environment

  1. OS (where Grakn server runs): Ubuntu 20, GitHub Actions
  2. Grakn version (and platform): Grakn 2.0.0-alpha-6
  3. Grakn client: client-java 2.0.0-alpha-8

Reproducible Steps

Fork https://github.com/BFergerson/grakn2-volatility and trigger a build

Expected Output

I'd expect the build time to remain consistent and I'd hope it would be under a few minutes each time.

Actual Output

Execution time varies widely between identical runs

Additional information

I'll add more information as I can. I'm still learning how to debug Grakn performance problems effectively; a better understanding of the internal query planner might help reach a resolution faster.

Possibly related to #6151. cc: @lriuui0x0

haikalpribadi commented 3 years ago

This issue will be solved by https://github.com/graknlabs/grakn/issues/6194. I will work on it next week and have it out in the following release.

BFergerson commented 2 years ago

@haikalpribadi, thanks for seeing this issue through. I've got a bit of catching up to do, but I'm excited to see what's now possible with the latest release. I remember this being one of the last issues I needed solved before releasing new software.

Proof of performance stability

[Screenshot from 2021-09-30 19-25-50: build run times across versions]

The first two runs, labeled "15", use Grakn 2.0.0-alpha-8. The next three runs, labeled "upgrade to 2.4.0", "16", and "17", use TypeDB 2.4.0. The final run, "downgrade to 2.3.0", added for good measure, uses TypeDB 2.3.0 (and timed out).