Inconsistent behaviour between CLI and Explorer when specifying equality predicate on `WHERE` clause

prrao87 commented 5 months ago

This is likely important functionality that's missing (or a bug) in the explorer.

Kùzu and explorer version: 0.3.2.
Tested on the demo dataset

The following query works in the CLI and in the Python client (i.e., the core is behaving as expected):

MATCH (a:User)-[e:Follows]->(:User {name: "Karissa"})
RETURN a.name AS follower, e.since AS follows_since;

Returns a valid result in Python:

┌──────────┬───────────────┐
│ follower ┆ follows_since │
│ ---      ┆ ---           │
│ str      ┆ i64           │
╞══════════╪═══════════════╡
│ Adam     ┆ 2020          │
└──────────┴───────────────┘

And also in the CLI:

kuzu>  MATCH (a:User)-[e:Follows ]->(:User {name: "Karissa"})
..>        RETURN a.name AS follower, e.since AS follows_since;
----------------------------
| follower | follows_since |
----------------------------
| Adam     | 2020          |
----------------------------
(1 tuple)
(2 columns)
Time: 0.67ms (compiling), 2.68ms (executing)

However, when the same query is run in Kùzu Explorer, it doesn't return anything.

I also tested the equivalent query with an explicit `WHERE` clause, and the results are likewise inconsistent between the CLI/client API and Explorer:

MATCH (a:User)-[e:Follows]->(b:User)
WHERE b.name = "Karissa"
RETURN a.name AS follower, e.since AS follows_since;

Considering that the explorer uses the Node.js client under the hood, do you think something is off with the query parsing when there are predicates on properties there?

I've only tested via the CLI and the Python client, and both work as expected, so the problem appears to be specific to Explorer.

mewim commented 5 months ago

It seems I cannot reproduce this bug.

Steps I tried:

1. Launch Explorer without a database: `docker run -p 8000:8000 --pull=always --rm kuzudb/explorer:latest`
2. Load the demo data with Explorer:

Query 1:

Query 2:

mewim commented 5 months ago

If you loaded the dataset on macOS with the CLI or Python API and then launched Explorer against that database, this could be due to https://github.com/kuzudb/kuzu/issues/2943, which has been fixed in https://github.com/kuzudb/kuzu/pull/2952 but is not included in the 0.3.x releases because the fix changes the storage format.

prrao87 commented 5 months ago

Ah, that issue might be the reason. Let's keep this open until we release the next version with the storage format change.

mewim commented 3 months ago

@benjaminwinger I revisited this issue with the latest release of Kùzu (v0.4.1), and it still exists. I loaded demo-db on macOS (attached), and the query returns an empty result in a Linux Docker container. The issue does not seem to have much to do with Explorer, as it is reproducible with the CLI as well. Could you please look into it? test_db.zip

prrao87 commented 3 months ago

Should we maybe create this as an issue on the Kùzu repo instead?

mewim commented 3 months ago

Update: this issue does not seem to be reproducible on Linux with the x86-64 architecture. On one of our Linux servers, the query works fine both locally and in Docker, but with aarch64 Docker images on a Mac with an Apple M-series CPU, the issue occurs.

mewim commented 3 months ago

I was also able to reproduce this on an Oracle Cloud server with an Ampere A1 processor, so this seems to be specific to the aarch64 architecture.
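When comparing reports like the ones above, it helps to record which architecture each environment actually runs on, since the same CPU family is reported under different names (macOS on Apple Silicon reports `arm64`, Linux reports `aarch64`). The following is a minimal sketch using only the Python standard library; `normalize_arch` is a hypothetical helper name, not part of Kùzu.

```python
import platform


def normalize_arch(machine: str) -> str:
    """Map common platform.machine() spellings onto one canonical name."""
    m = machine.strip().lower()
    if m in {"arm64", "aarch64"}:
        return "aarch64"
    if m in {"x86_64", "amd64"}:
        return "x86_64"
    return m  # unknown architectures are passed through unchanged


# Print the OS and the normalized architecture of the current environment.
print(platform.system(), normalize_arch(platform.machine()))
```

Running this both on the host and inside the Docker container shows whether the container uses an aarch64 or an x86_64 image.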

benjaminwinger commented 3 months ago

I don't think this is limited to ARM architectures. I created a branch off v0.4.1 that validates the hash index entries (and also fixes some uninitialized data that made direct comparisons of the hash index files more difficult): https://github.com/kuzudb/kuzu/commit/8cfcbe5547b0d17e69a0e252ed2de87e9d01f8b3 and ended up with the following issues when opening the above database on x86_64 Linux:

Fingerprint for key Zhang in slot 28 was 3 but 166 was expected!
Primary slot ID for key Zhang was calculated to be 16 but the key was stored in slot 28
Fingerprint for key Noura in slot 30 was 112 but 114 was expected!
Primary slot ID for key Noura was calculated to be 9 but the key was stored in slot 30
Fingerprint for key Karissa in slot 14 was 25 but 187 was expected!
Primary slot ID for key Karissa was calculated to be 27 but the key was stored in slot 14
Fingerprint for key Guelph in slot 20 was 60 but 3 was expected!
Primary slot ID for key Guelph was calculated to be 6 but the key was stored in slot 20

~~Notice that Adam is not mentioned, so it's pure coincidence that the query is succeeding for that key on Linux. It does look like the hashes still aren't functioning the same on the different platforms.~~ Edit: it's Karissa that the query should be looking up using the hash table, and now I'm reproducing it on Linux x86_64. (I'm fairly sure the query succeeded the first time I tried, but regardless, there's clearly something wrong with the hashing on macOS.)

Update: I can reproduce this with the pre-built 0.4.1 binaries, but not when compiling from scratch. This makes debugging somewhat difficult.
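To illustrate why mismatched fingerprints and slot IDs lead to an empty result, here is a toy sketch of a hash index in Python. It is not Kùzu's implementation: the slot count, the fingerprint bits, the FNV-1a hash, and all function names are assumptions chosen for illustration. The index is built with one hash function and then read with a deliberately different one (`mismatched_hash`, a bitwise complement), standing in for a hash that behaves differently on another platform or build.

```python
from typing import Callable, Dict, List, Tuple

NUM_SLOTS = 32            # hypothetical slot count (power of two)
MASK64 = (1 << 64) - 1

Index = Dict[int, List[Tuple[int, str]]]  # slot id -> [(fingerprint, key)]


def fnv1a_64(key: str) -> int:
    """64-bit FNV-1a; stands in for the hash used when writing the index."""
    h = 0xCBF29CE484222325
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 0x100000001B3) & MASK64
    return h


def mismatched_hash(key: str) -> int:
    """Deliberately different hash (assumption): complement of fnv1a_64."""
    return fnv1a_64(key) ^ MASK64


def primary_slot(h: int) -> int:
    return h & (NUM_SLOTS - 1)


def fingerprint(h: int) -> int:
    return (h >> 48) & 0xFF  # hypothetical choice of 8 fingerprint bits


def build_index(keys: List[str], hash_fn: Callable[[str], int]) -> Index:
    index: Index = {}
    for key in keys:
        h = hash_fn(key)
        index.setdefault(primary_slot(h), []).append((fingerprint(h), key))
    return index


def lookup(index: Index, key: str, hash_fn: Callable[[str], int]) -> bool:
    """Probe only the slot the reader's hash points to."""
    h = hash_fn(key)
    return (fingerprint(h), key) in index.get(primary_slot(h), [])


def validate(index: Index, hash_fn: Callable[[str], int]) -> List[str]:
    """Recompute fingerprint and slot for every stored key; report mismatches."""
    problems = []
    for slot, entries in index.items():
        for stored_fp, key in entries:
            h = hash_fn(key)
            if stored_fp != fingerprint(h):
                problems.append(f"Fingerprint for key {key} in slot {slot} "
                                f"was {stored_fp} but {fingerprint(h)} was expected!")
            if slot != primary_slot(h):
                problems.append(f"Primary slot ID for key {key} was calculated to be "
                                f"{primary_slot(h)} but the key was stored in slot {slot}")
    return problems


KEYS = ["Adam", "Karissa", "Zhang", "Noura"]
index = build_index(KEYS, fnv1a_64)

print(lookup(index, "Karissa", fnv1a_64))         # same hash: key is found
print(lookup(index, "Karissa", mismatched_hash))  # different hash: key is missed
for line in validate(index, mismatched_hash):
    print(line)
```

In this toy model the key is physically present in the index, yet a reader whose hash disagrees with the writer's probes the wrong slot and reports no match, which is consistent with the query returning an empty result rather than raising an error.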

kuzudb / explorer

Inconsistent behaviour between CLI and Explorer when specifying equality predicate on `WHERE` clause #129