kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

Bug: Cannot attach a remote kuzu database due to invalid path #4165

Closed: DeepRecursion closed this issue 1 week ago

DeepRecursion commented 2 weeks ago

Kùzu version

v0.6.0

What operating system are you using?

macOS Sonoma 14.5

What happened?

I'm running the commands below using the Python SDK:

INSTALL httpfs;
LOAD EXTENSION httpfs;
CALL s3_access_key_id='{AWS_ACCESS_KEY_ID}';
CALL s3_secret_access_key='{AWS_SECRET_ACCESS_KEY}';
CALL s3_region='us-west-1';
ATTACH 's3://{bucket}/graph_db/' AS uw (dbtype kuzu);

and I get the error: Cannot attach a remote kuzu database due to invalid path: s3://{my_bucket}/graph_db/.

The bucket and folder exist, and I've tried making the bucket public, but I get this error every time.
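For reference, the same sequence maps onto the Python SDK as one statement per call. This is a minimal sketch, assuming the kuzu package's Database and Connection classes and their execute method; attach_statements and run_statements are hypothetical helpers, and credentials are passed in rather than hard-coded.

```python
from __future__ import annotations


def attach_statements(s3_uri: str, alias: str, region: str,
                      key_id: str, secret: str) -> list[str]:
    """Return, in order, the Cypher statements that load httpfs,
    set S3 credentials and attach a remote Kùzu database."""
    return [
        "INSTALL httpfs;",
        "LOAD EXTENSION httpfs;",
        f"CALL s3_access_key_id='{key_id}';",
        f"CALL s3_secret_access_key='{secret}';",
        f"CALL s3_region='{region}';",
        f"ATTACH '{s3_uri}' AS {alias} (dbtype kuzu);",
    ]


def run_statements(db_path: str, statements: list[str]) -> None:
    """Execute each statement, one per call, on the database at db_path."""
    import kuzu  # deferred import: attach_statements works without kuzu installed

    db = kuzu.Database(db_path)
    conn = kuzu.Connection(db)
    for stmt in statements:
        conn.execute(stmt)  # an exception is raised if the statement fails
```

For example, run_statements("local_db", attach_statements("s3://my-bucket/graph_db/", "uw", "us-west-1", os.environ["AWS_ACCESS_KEY_ID"], os.environ["AWS_SECRET_ACCESS_KEY"])) would open an on-disk database named local_db (a placeholder, as is the bucket path) and attach the remote one as uw.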

Are there known steps to reproduce?

No response

semihsalihoglu-uw commented 2 weeks ago

Assigned to @acquamarin who should look into this.

acquamarin commented 2 weeks ago

Can you provide more information so I can look into the issue? Also, how did you make it public? I just tried creating a dummy database in S3 and attaching it in Kùzu:

➜  kuzu git:(master) ✗ ./build/release/tools/shell/kuzu tinysnbdd
Opened the database at path: tinysnbdd in read-write mode.
Enter ":help" for usage hints.
kuzu> LOAD EXTENSION httpfs;
┌────────────────────────────────────┐
│ result                             │
│ STRING                             │
├────────────────────────────────────┤
│ Extension: httpfs has been loaded. │
└────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.06ms (compiling), 3.47ms (executing)
kuzu> load from 's3://kuzu-dataset-test/tinysnb/vPerson.csv' return *;
kuzu> CALL s3_access_key_id='xxxxxxx'
..> ;
(0 tuples)
(0 columns)
Time: 0.48ms (compiling), 0.12ms (executing)
kuzu> CALL s3_secret_access_key='xxxxxxx';
(0 tuples)
(0 columns)
Time: 0.02ms (compiling), 0.08ms (executing)
kuzu> attach 's3://xxxxx/tinysnb' as tinysnb (dbtype kuzu);
┌─────────────────────────────────┐
│ result                          │
│ STRING                          │
├─────────────────────────────────┤
│ Attached database successfully. │
└─────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.04ms (compiling), 18412.74ms (executing)

Analect commented 2 weeks ago

I'm running into similar difficulties when loading a Cypher file from the CLI. Perhaps this will shed light on what is going on.

I have a file, remote-db.cypher, shown below. The S3 path is obviously fictitious, but the real one points to a private bucket.

INSTALL httpfs;
LOAD EXTENSION httpfs;
CALL s3_access_key_id='xxx';
CALL s3_secret_access_key='xxx';
CALL s3_region='us-east-1';
ATTACH "s3://my-bucket/path/to/database/kuzu-test." AS meta (dbtype kuzu);
CALL SHOW_ATTACHED_DATABASES() RETURN *;
MATCH (a:Document)-[f:authored_by]->(b:Author) RETURN a.Title,f,b.Author;
DETACH meta;

I run kuzu < remote-db.cypher, which should run the Cypher script above against an in-memory Kùzu database. It fails:

kuzu < remote-db.cypher 
Opened the database under in-memory mode.
Enter ":help" for usage hints.
┌───────────────────────────────────────┐
│ result                                │
│ STRING                                │
├───────────────────────────────────────┤
│ Extension: httpfs has been installed. │
└───────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.18ms (compiling), 1708.04ms (executing)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ result                                                                                                 │
│ STRING                                                                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Extension: /home/user/.kuzu/extension/0.5.1.3/linux_amd64/libhttpfs.kuzu_extension has been loaded. │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.02ms (compiling), 143.41ms (executing)
(0 tuples)
(0 columns)
Time: 12.97ms (compiling), 21.70ms (executing)
(0 tuples)
(0 columns)
Time: 0.02ms (compiling), 0.03ms (executing)
(0 tuples)
(0 columns)
Time: 8.17ms (compiling), 0.05ms (executing)
Error: Runtime exception: Cannot attach a remote kuzu database due to invalid path: s3://my-bucket/path/to/database/kuzu-test.
┌────────┬───────────────┐
│ name   │ database type │
│ STRING │ STRING        │
├────────┼───────────────┤
└────────┴───────────────┘
(0 tuples)
(2 columns)
Time: 0.46ms (compiling), 0.20ms (executing)
Error: Binder exception: Table Document does not exist.
Error: Runtime exception: Database: meta doesn't exist.

If I run the same steps directly in the CLI, it works. I'm using a general-purpose bucket rather than a directory bucket; I realise the latter are intended for higher-performance, low-latency use cases. When doing this via the CLI, I noticed the ATTACH takes about 30 seconds to complete, even though this is a tiny database. I'm wondering whether that delay causes a timeout when the statements are run from a piped Cypher file or, as for the original poster, through the Python SDK.
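To check the timeout hypothesis from Python rather than by piping a file into the shell, the script can be split into statements and each one timed. This is a sketch: split_statements and time_statements are hypothetical helpers, the splitter only understands single- and double-quoted strings (not comments or escaped quotes), and execute can be any callable that runs one statement, such as a kuzu Connection's execute method.

```python
from __future__ import annotations

import time
from typing import Callable


def split_statements(script: str) -> list[str]:
    """Split a Cypher script on ';' terminators, ignoring ';' inside quotes."""
    statements: list[str] = []
    current: list[str] = []
    quote = None  # the quote character we are currently inside, if any
    for ch in script:
        current.append(ch)
        if quote:
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt.rstrip(";").strip():  # skip empty statements such as ';;'
                statements.append(stmt)
            current = []
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)  # final statement without a terminator
    return statements


def time_statements(execute: Callable[[str], object],
                    statements: list[str]) -> list[tuple[str, float]]:
    """Run each statement through execute and record wall-clock seconds."""
    timings = []
    for stmt in statements:
        start = time.perf_counter()
        execute(stmt)
        timings.append((stmt, time.perf_counter() - start))
    return timings
```

For example, time_statements(conn.execute, split_statements(open("remote-db.cypher").read())) would report how long each statement, including the ATTACH, takes on a given connection.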

kuzu test-s3
Opened the database at path: test-s3 in read-write mode.
Enter ":help" for usage hints.
kuzu> INSTALL httpfs;
┌───────────────────────────────────────┐
│ result                                │
│ STRING                                │
├───────────────────────────────────────┤
│ Extension: httpfs has been installed. │
└───────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.07ms (compiling), 1130.87ms (executing)
kuzu> LOAD EXTENSION httpfs;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ result                                                                                                 │
│ STRING                                                                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Extension: /home/user/.kuzu/extension/0.5.1.3/linux_amd64/libhttpfs.kuzu_extension has been loaded. │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.02ms (compiling), 3.72ms (executing)
kuzu> CALL s3_access_key_id='xxx';
(0 tuples)
(0 columns)
Time: 0.06ms (compiling), 0.10ms (executing)
kuzu> CALL s3_secret_access_key='xxx';
(0 tuples)
(0 columns)
Time: 0.02ms (compiling), 0.35ms (executing)
kuzu> CALL s3_region='us-east-1';
(0 tuples)
(0 columns)
Time: 0.03ms (compiling), 0.56ms (executing)
kuzu> ATTACH "s3://my-bucket/path/to/database/kuzu-test" AS meta (dbtype kuzu);
┌─────────────────────────────────┐
│ result                          │
│ STRING                          │
├─────────────────────────────────┤
│ Attached database successfully. │
└─────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.04ms (compiling), 29931.47ms (executing)
kuzu> CALL SHOW_ATTACHED_DATABASES() RETURN *;
┌────────┬───────────────┐
│ name   │ database type │
│ STRING │ STRING        │
├────────┼───────────────┤
│ meta   │ KUZU          │
└────────┴───────────────┘
(1 tuple)
(2 columns)
Time: 0.19ms (compiling), 0.34ms (executing)
kuzu> MATCH (a:Document)-[f:authored_by]->(b:Author) RETURN a.Title,f,b.Author;
┌─────────────────────────────────────────┬─────────────────────────────────────────┬─────────────────────────────────────────┐
│ a.Title                                 │ f                                       │ b.Author                                │
│ STRING                                  │ REL                                     │ STRING                                  │
├─────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┤
│ Title                                   │ (0:0)-{_LABEL: authored_by, _ID: 5:0... │ Author                                  │
...
(11 tuples)
(3 columns)
Time: 107.06ms (compiling), 3929.30ms (executing)
kuzu> DETACH meta;
┌─────────────────────────────────┐
│ result                          │
│ STRING                          │
├─────────────────────────────────┤
│ Detached database successfully. │
└─────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.02ms (compiling), 3.12ms (executing)

acquamarin commented 2 weeks ago

Hi @Analect, for the first test:

kuzu < remote-db.cypher 
Opened the database under in-memory mode.

It looks like you are trying to attach a remote Kùzu database to an in-memory database, which is not supported yet.

For the second test:

Opened the database at path: test-s3 in read-write mode.

Here you are attaching the remote Kùzu database to an on-disk Kùzu database, which is currently supported.

I recommend using an on-disk Kùzu database when attaching a remote Kùzu database.

For the performance issue you reported, please try turning on the file cache by running CALL HTTP_CACHE_FILE=TRUE. It should improve attach performance significantly.
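The two recommendations here (attach from an on-disk database, and enable the file cache before attaching) can be captured in a small planning helper. This is a sketch: remote_attach_plan is hypothetical, treating an empty path or ":memory:" as in-memory mode is an assumption, the S3 credential CALLs are left out for brevity, and the in-memory check reflects the limitation at the time of this comment, which was lifted later in the thread.

```python
from __future__ import annotations

# Assumption: these path values request an in-memory database.
IN_MEMORY_PATHS = {"", ":memory:"}


def remote_attach_plan(db_path: str, s3_uri: str, alias: str,
                       cache: bool = True) -> list[str]:
    """Return statements that attach a remote Kùzu database from the local
    database at db_path, optionally enabling the httpfs file cache first.
    S3 credential CALLs are assumed to have been issued already."""
    if db_path.strip() in IN_MEMORY_PATHS:
        # At the time of this comment, remote attach required an on-disk database.
        raise ValueError("open an on-disk database before attaching a remote Kùzu database")
    statements = ["LOAD EXTENSION httpfs;"]
    if cache:
        # Enable the file cache before ATTACH (per the maintainer, this speeds up attaching).
        statements.append("CALL HTTP_CACHE_FILE=TRUE;")
    statements.append(f"ATTACH '{s3_uri}' AS {alias} (dbtype kuzu);")
    return statements
```

Ordering matters in this sketch: the cache setting is issued before ATTACH so that it is already in effect when the remote files are read.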

Let me know if it works.

Thanks, Ziyi

Analect commented 2 weeks ago

@acquamarin, my mistake regarding trying to get this working with an in-memory database. I can confirm it works fine with an on-disk database, and yes, that cache setting appears to speed things up.

DeepRecursion commented 2 weeks ago

@acquamarin I made the folder public just for troubleshooting. It shouldn't need to be public since I'm providing my AWS keys, right?

I tested with the CLI by following your example above and get the same error.

What other information can I provide?

acquamarin commented 2 weeks ago

Hi @DeepRecursion, yes, you don't have to make it public since you are providing the keys. This looks weird. Would you mind joining our Discord channel so I can help you better?

acquamarin commented 1 week ago

Hi @Analect, we have added support for attaching a remote Kùzu database in in-memory mode in this PR: https://github.com/kuzudb/kuzu/pull/4177. Let me know if this fixes your issue.

Thanks

DeepRecursion commented 1 week ago

I discovered this was an SSL certificate issue on my machine.
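As a rough first check for this kind of problem, the Python standard-library sketch below attempts a certificate-verified TLS handshake with a regional S3 endpoint and reports where Python's OpenSSL looks for CA certificates. It only exercises Python's own certificate setup, which is not necessarily the one the httpfs extension uses, and the helper names are hypothetical.

```python
from __future__ import annotations

import socket
import ssl


def s3_endpoint(region: str) -> str:
    """Regional S3 endpoint host, e.g. s3.us-west-1.amazonaws.com."""
    return f"s3.{region}.amazonaws.com"


def ca_locations() -> tuple[str, str]:
    """Default CA bundle file and directory of Python's OpenSSL build."""
    paths = ssl.get_default_verify_paths()
    return paths.openssl_cafile, paths.openssl_capath


def check_tls(host: str, port: int = 443, timeout: float = 10.0) -> str | None:
    """Attempt a certificate-verified TLS handshake with host.

    Returns None on success, or the error text on failure (for example a
    certificate verification error when no trusted CA bundle is found)."""
    context = ssl.create_default_context()  # verifies certificate and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None  # handshake completed during wrap_socket
    except OSError as exc:  # ssl.SSLError and socket errors both derive from OSError
        return str(exc)
```

For example, check_tls(s3_endpoint("us-west-1")) returns None when the handshake succeeds and an error message otherwise; comparing that with ca_locations() can show whether a CA bundle is missing.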