Closed DeepRecursion closed 1 week ago
Assigned to @acquamarin who should look into this.
Can you provide more information so I can better look into the issue? And may I know how you made it public? I just tried creating a dummy database in S3 and attaching it in kuzu:
➜ kuzu git:(master) ✗ ./build/release/tools/shell/kuzu tinysnbdd
Opened the database at path: tinysnbdd in read-write mode.
Enter ":help" for usage hints.
kuzu> LOAD EXTENSION httpfs;
┌────────────────────────────────────┐
│ result │
│ STRING │
├────────────────────────────────────┤
│ Extension: httpfs has been loaded. │
└────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.06ms (compiling), 3.47ms (executing)
kuzu> load from 's3://kuzu-dataset-test/tinysnb/vPerson.csv' return *;
kuzu> CALL s3_access_key_id='xxxxxxx'
..> ;
(0 tuples)
(0 columns)
Time: 0.48ms (compiling), 0.12ms (executing)
kuzu> CALL s3_secret_access_key='xxxxxxx';
(0 tuples)
(0 columns)
Time: 0.02ms (compiling), 0.08ms (executing)
kuzu> attach 's3://xxxxx/tinysnb' as tinysnb (dbtype kuzu);
┌─────────────────────────────────┐
│ result │
│ STRING │
├─────────────────────────────────┤
│ Attached database successfully. │
└─────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.04ms (compiling), 18412.74ms (executing)
Running into similar difficulties while loading a cypher file from CLI. Perhaps this might shed light on what is going on.
I have a file remote-db.cypher, per below. I'm obviously using a fictitious S3 path, but this is against a private bucket.
INSTALL httpfs;
LOAD EXTENSION httpfs;
CALL s3_access_key_id='xxx';
CALL s3_secret_access_key='xxx';
CALL s3_region='us-east-1';
ATTACH "s3://my-bucket/path/to/database/kuzu-test." AS meta (dbtype kuzu);
CALL SHOW_ATTACHED_DATABASES() RETURN *;
MATCH (a:Document)-[f:authored_by]->(b:Author) RETURN a.Title,f,b.Author;
DETACH meta;
I run kuzu < remote-db.cypher, which should run the above Cypher script against an in-memory kuzu db. It fails.
kuzu < remote-db.cypher
Opened the database under in-memory mode.
Enter ":help" for usage hints.
┌───────────────────────────────────────┐
│ result │
│ STRING │
├───────────────────────────────────────┤
│ Extension: httpfs has been installed. │
└───────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.18ms (compiling), 1708.04ms (executing)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ result │
│ STRING │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Extension: /home/user/.kuzu/extension/0.5.1.3/linux_amd64/libhttpfs.kuzu_extension has been loaded. │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.02ms (compiling), 143.41ms (executing)
(0 tuples)
(0 columns)
Time: 12.97ms (compiling), 21.70ms (executing)
(0 tuples)
(0 columns)
Time: 0.02ms (compiling), 0.03ms (executing)
(0 tuples)
(0 columns)
Time: 8.17ms (compiling), 0.05ms (executing)
Error: Runtime exception: Cannot attach a remote kuzu database due to invalid path: s3://my-bucket/path/to/database/kuzu-test.
┌────────┬───────────────┐
│ name │ database type │
│ STRING │ STRING │
├────────┼───────────────┤
└────────┴───────────────┘
(0 tuples)
(2 columns)
Time: 0.46ms (compiling), 0.20ms (executing)
Error: Binder exception: Table Document does not exist.
Error: Runtime exception: Database: meta doesn't exist.
If I run the same steps through the CLI directly, then it works. I'm using a general-purpose rather than directory type bucket. I realise the latter are intended for more performant, low-latency use cases. When doing this via the CLI, I noticed it takes circa 30 seconds for the ATTACH to complete, even though this is a tiny database. I'm wondering if that delay is somehow timing out when run via a Cypher file or using the Python SDK, per the original poster.
kuzu test-s3
Opened the database at path: test-s3 in read-write mode.
Enter ":help" for usage hints.
kuzu> INSTALL httpfs;
┌───────────────────────────────────────┐
│ result │
│ STRING │
├───────────────────────────────────────┤
│ Extension: httpfs has been installed. │
└───────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.07ms (compiling), 1130.87ms (executing)
kuzu> LOAD EXTENSION httpfs;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ result │
│ STRING │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Extension: /home/user/.kuzu/extension/0.5.1.3/linux_amd64/libhttpfs.kuzu_extension has been loaded. │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.02ms (compiling), 3.72ms (executing)
kuzu> CALL s3_access_key_id='xxx';
(0 tuples)
(0 columns)
Time: 0.06ms (compiling), 0.10ms (executing)
kuzu> CALL s3_secret_access_key='xxx';
(0 tuples)
(0 columns)
Time: 0.02ms (compiling), 0.35ms (executing)
kuzu> CALL s3_region='us-east-1';
(0 tuples)
(0 columns)
Time: 0.03ms (compiling), 0.56ms (executing)
kuzu> ATTACH "s3://my-bucket/path/to/database/kuzu-test" AS meta (dbtype kuzu);
┌─────────────────────────────────┐
│ result │
│ STRING │
├─────────────────────────────────┤
│ Attached database successfully. │
└─────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.04ms (compiling), 29931.47ms (executing)
kuzu> CALL SHOW_ATTACHED_DATABASES() RETURN *;
┌────────┬───────────────┐
│ name │ database type │
│ STRING │ STRING │
├────────┼───────────────┤
│ meta │ KUZU │
└────────┴───────────────┘
(1 tuple)
(2 columns)
Time: 0.19ms (compiling), 0.34ms (executing)
kuzu> MATCH (a:Document)-[f:authored_by]->(b:Author) RETURN a.Title,f,b.Author;
┌─────────────────────────────────────────┬─────────────────────────────────────────┬─────────────────────────────────────────┐
│ a.Title │ f │ b.Author │
│ STRING │ REL │ STRING │
├─────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┤
│ Title │ (0:0)-{_LABEL: authored_by, _ID: 5:0... │ Author │
...
(11 tuples)
(3 columns)
Time: 107.06ms (compiling), 3929.30ms (executing)
kuzu> DETACH meta;
┌─────────────────────────────────┐
│ result │
│ STRING │
├─────────────────────────────────┤
│ Detached database successfully. │
└─────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.02ms (compiling), 3.12ms (executing)
Hi @Analect For the first test:
kuzu < remote-db.cypher
Opened the database under in-memory mode.
It looks like you are trying to attach a remote kuzu database to an in-memory database, which is not supported yet.
For the second test:
Opened the database at path: test-s3 in read-write mode.
Here you are attaching the remote kuzu database to an on-disk kuzu database, which is supported right now.
I would recommend using the on-disk version of kuzu when attaching a remote kuzu database.
For the performance issue you reported, can you try turning the file cache on with CALL HTTP_CACHE_FILE=TRUE?
It should improve the attach performance a lot.
Let me know if it works.
Thanks, Ziyi
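For anyone doing this from the Python SDK rather than the shell, the recommendation above might be sketched roughly as follows. This is only a sketch under assumptions: the helper names build_attach_statements and attach_remote are hypothetical, the statements are the ones shown in this thread, and the bucket path, alias and credentials are placeholders. The kuzu import is kept inside attach_remote so the statement builder can be inspected without the package installed.

```python
from typing import List


def build_attach_statements(
    s3_path: str,
    alias: str,
    access_key_id: str,
    secret_access_key: str,
    region: str,
    use_file_cache: bool = True,
) -> List[str]:
    """Return the statements used in this thread, in the order they were run.

    Hypothetical helper: it only formats strings and does not validate them.
    """
    statements = [
        "INSTALL httpfs;",
        "LOAD EXTENSION httpfs;",
        f"CALL s3_access_key_id='{access_key_id}';",
        f"CALL s3_secret_access_key='{secret_access_key}';",
        f"CALL s3_region='{region}';",
    ]
    if use_file_cache:
        # Suggested above to speed up ATTACH on remote databases.
        statements.append("CALL HTTP_CACHE_FILE=TRUE;")
    statements.append(f"ATTACH '{s3_path}' AS {alias} (dbtype kuzu);")
    return statements


def attach_remote(db_path: str, statements: List[str]):
    """Open an on-disk database at db_path and run each statement in turn.

    An on-disk path is used because, per the reply above, attaching a remote
    database to an in-memory database was not supported at the time.
    """
    import kuzu  # imported lazily; requires the kuzu package

    db = kuzu.Database(db_path)
    conn = kuzu.Connection(db)
    for stmt in statements:
        conn.execute(stmt)
    return db, conn


if __name__ == "__main__":
    # Placeholder values only; nothing is executed against S3 here.
    for line in build_attach_statements(
        "s3://my-bucket/path/to/database/kuzu-test",
        "meta", "xxx", "xxx", "us-east-1",
    ):
        print(line)
```

Calling attach_remote("test-s3", statements) would then mirror the working on-disk session shown earlier; whether the cache setting helps in a given setup is something to measure rather than assume.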
@acquamarin ... my mistake ref. trying to get this working with an in-memory database. I can confirm it works fine with an on-disk version and yes, that cache setting appears to speed things up.
@acquamarin I made the folder public just for troubleshooting. It shouldn't need to be public since I'm providing my AWS keys, right?
I tested using cli by following your example above and get the same error:
What other information can I provide?
Hi @DeepRecursion Yes, you don't have to make it public since you are providing the keys. This looks weird. Do you mind joining our Discord channel so I can help you better?
Hi @Analect We now support attaching a remote kuzu database in in-memory mode, added in this PR: https://github.com/kuzudb/kuzu/pull/4177. Let me know if this fixes your issue.
Thanks
I discovered this was an SSL certificate issue on my machine.
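For readers who hit the same "invalid path" error and suspect certificates, one rough way to check the machine is sketched below. It is an assumption-laden diagnostic, not the fix used in this thread: describe_trust_store and can_verify are hypothetical helper names, and the check uses Python's own trust store, which may differ from whatever the httpfs extension uses internally. A failure here only hints that certificate verification on the machine deserves a closer look.

```python
import os
import socket
import ssl


def describe_trust_store() -> dict:
    """Report where Python's default SSL context looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "cafile_exists": bool(paths.cafile and os.path.exists(paths.cafile)),
        "capath": paths.capath,
        "capath_exists": bool(paths.capath and os.path.isdir(paths.capath)),
    }


def can_verify(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with host passes certificate checks.

    Returns False only on a certificate verification failure; other errors
    (DNS, timeouts, refused connections) are raised to the caller.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False


if __name__ == "__main__":
    print(describe_trust_store())
    # Example endpoint only; substitute the endpoint for your bucket's region.
    print(can_verify("s3.us-east-1.amazonaws.com"))
```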
Kùzu version
v0.6.0
What operating system are you using?
MacOSX Sonoma 14.5
What happened?
I'm running the below commands using the Python SDK and getting the error:
Cannot attach a remote kuzu database due to invalid path: s3://{my_bucket}/graph_db/
The bucket and folder exist, and I've tried making the bucket public, but I get this error every time.
Are there known steps to reproduce?
No response