neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0
2.28k stars, 364 forks

Can't access my S3 #294

Open xpilasneo4j opened 5 months ago

xpilasneo4j commented 5 months ago

I created an S3 bucket on the Field-Engineering-Pro-Services AWS account, but even with the AWS access and secret keys, I can't connect the website to my bucket s3://nasa-lessons-learned-files

kartikpersistent commented 5 months ago

Hi @xpilasneo4j, can you elaborate on what error you got while trying the S3 bucket?

xpilasneo4j commented 5 months ago

I get this message even if I copy the keys from the Neo4j AWS Field Eng account:

image

kartikpersistent commented 5 months ago

we will check the logs and fix the issue ASAP.

kartikpersistent commented 5 months ago

Hi @xpilasneo4j, we debugged this and found that the bucket URL was being sent without the trailing /, and that the request failed if the user entered spaces. We fixed these issues in the dev branch. Please try again and let us know.
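For readers following along, the two input problems described above (surrounding spaces and a missing trailing /) can be handled with a small normalization step. This is only a minimal sketch, not the project's actual fix: the function name `normalize_s3_url` is hypothetical, and it assumes the backend wants a bucket name plus a key prefix.

```python
from urllib.parse import urlparse


def normalize_s3_url(raw_url: str) -> tuple[str, str]:
    """Split a user-entered s3:// URL into (bucket, prefix).

    Hypothetical helper: tolerates surrounding whitespace and a missing
    trailing slash, which are the two cases mentioned in this thread.
    """
    cleaned = raw_url.strip()  # drop spaces pasted around the URL
    parsed = urlparse(cleaned)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URL: {raw_url!r}")
    bucket = parsed.netloc
    prefix = parsed.path.lstrip("/")
    # A non-empty prefix is treated as a folder, so make it end with "/".
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return bucket, prefix
```

With this, `s3://nasa-lessons-learned-files` and ` s3://nasa-lessons-learned-files/ ` both resolve to the same bucket with an empty prefix.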

xpilasneo4j commented 5 months ago

Will try and let you know. Thanks

xpilasneo4j commented 5 months ago

Did you update this URL: https://dev-frontend-dcavk67s4a-uc.a.run.app/ ? I tried it and still get the same error.

kartikpersistent commented 5 months ago

It is working for us. We will debug it with some other S3 bucket credentials.

image

fridaystreet commented 3 weeks ago

I still seem to be getting this error. I'm running the latest dev branch and have tried with and without the trailing /.

Any ideas what I'm missing?

Cheers

kartikpersistent commented 3 weeks ago

If you don't mind sharing your credentials, we will try them and let you know what exactly is happening.

fridaystreet commented 3 weeks ago

Thanks. Unfortunately that's not something I can share here, but you've just made me realise I can check the logs by running it locally.

It's throwing this error

An error occurred (SignatureDoesNotMatch) when calling the ListObjectsV2 operation: The request signature we calculated does not match the signature you provided. Check your key and signing method.

Are there specific permissions the user of the key needs? I've just given it readS3 to try and test it
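On the permissions question: as far as I know, `SignatureDoesNotMatch` usually means the secret key does not match the access key ID (or the request was altered before signing), whereas missing permissions tend to surface as `AccessDenied`. For completeness, `ListObjectsV2` needs `s3:ListBucket` on the bucket and downloading files needs `s3:GetObject` on its objects. The sketch below builds such a read-only policy; the helper name and the bucket name `my-bucket` are placeholders, not anything taken from the project.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build a minimal IAM policy for listing and reading one bucket.

    Hypothetical helper for illustration; attach the resulting JSON to
    the IAM user whose keys are entered in the app.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListObjectsV2 is authorized by s3:ListBucket on the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Downloading objects is authorized on the object ARNs.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_bucket_policy("my-bucket"), indent=2))
```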

fridaystreet commented 3 weeks ago

Sorry, I probably should have added that I'm not running the absolute latest dev branch, as it doesn't build in Docker (I see you have already seen that issue, so you've hopefully connected the dots).

I'm running this commit https://github.com/neo4j-labs/llm-graph-builder/commit/501ece4b57ce50a958e44d799303125395d02735 and having the error above. If it's fixed in latest dev all good, I'll just have to wait until that build issue is fixed.

kartikpersistent commented 1 week ago

Hi @fridaystreet, can you try the latest dev branch and let us know?

fridaystreet commented 1 week ago

@kartikpersistent thanks I'll give it a try today and report back

fridaystreet commented 1 week ago

Looks like the actual connection is working now, thanks. But I think maybe I'm just expecting too much from it. I'm trying to test whether it can scan through some buckets we have of uploaded data of different types (images, PDFs, Word documents, etc.), but I'm getting the following error: Exception: No pdf files found.

The files don't have extensions in the name but do have correct content type metadata. I'll do some more playing around. But I'd say in regards to this particular issue re connecting to s3, it's resolved, thanks

backend | 2024-10-22 22:39:55,370 - Use pytorch device_name: cpu
backend | 2024-10-22 22:39:55,370 - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
frontend | 192.168.65.1 - - [22/Oct/2024:22:39:56 +0000] "GET /service-worker-dev.js HTTP/1.1" 304 0 "http://localhost:8080/service-worker-dev.js" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" "-"
backend | 2024-10-22 22:39:57,937 - Embedding: Using SentenceTransformer , Dimension:384
backend | 2024-10-22 22:39:57,937 - embedding model:client=SentenceTransformer(
backend | (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
backend | (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
backend | (2): Normalize()
backend | ) model_name='all-MiniLM-L6-v2' cache_folder=None model_kwargs={} encode_kwargs={} multi_process=False show_progress=False and dimesion:384
backend | 2024-10-22 22:39:58,122 - Enable Communities False
backend | 2024-10-22 22:39:58,122 - Communities are disabled or GDS is not available in the database.
backend | 2024-10-22 22:39:58,122 - Checking access for database: neo4j
backend | 2024-10-22 22:39:58,365 - Read access count: 0
backend | 2024-10-22 22:39:58,366 - The account has write access.
backend | 2024-10-22 22:39:58,379 - Get existing files list from graph
backend | 2024-10-22 22:40:01,691 - closing connection for sources_list api
backend | 2024-10-22 22:40:01,693 - Get existing files list from graph
backend | 2024-10-22 22:41:16,210 - file_name : 05931530-7579-11eb-9abb-611bca4c3fa7 and file key : Domain::5fb497f5f1fa7800076a548c/05931530-7579-11eb-9abb-611bca4c3fa7
backend | 2024-10-22 22:41:16,210 - file_name : 212f6960-7579-11eb-9abb-611bca4c3fa7 and file key : Domain::5fb497f5f1fa7800076a548c/212f6960-7579-11eb-9abb-611bca4c3fa7
backend | 2024-10-22 22:41:16,210 - file_name : f37eed80-7599-11eb-9a6c-63fe91573d2c and file key : Domain::5fb497f5f1fa7800076a548c/f37eed80-7599-11eb-9a6c-63fe91573d2c
backend | 2024-10-22 22:41:16,210 - file_name : 5a96eb60-66a1-11eb-9777-af61b5d771c0 and file key : Domain::5fb4c515f5a1c20007f94f73/5a96eb60-66a1-11eb-9777-af61b5d771c0
backend | 2024-10-22 22:41:16,211 - file_name : 5d82e7c0-66a1-11eb-9777-af61b5d771c0 and file key : Domain::5fb4c515f5a1c20007f94f73/5d82e7c0-66a1-11eb-9777-af61b5d771c0
backend | 2024-10-22 22:41:16,211 - file_name : 75956a40-2774-11ee-a086-d907b56d49e3 and file key : Domain::5fb4c515f5a1c20007f94f73/75956a40-2774-11ee-a086-d907b56d49e3
backend | 2024-10-22 22:41:16,211 - file_name : 7596f0e0-2774-11ee-a8dc-1f649a592958 and file key : Domain::5fb4c515f5a1c20007f94f73/7596f0e0-2774-11ee-a8dc-1f649a592958
backend | 2024-10-22 22:41:16,211 - file_name : undefined and file key : Domain::5fb4c515f5a1c20007f94f73/undefined
backend | 2024-10-22 22:41:16,212 - Exception Stack trace:
backend | Traceback (most recent call last):
backend | File "/code/score.py", line 108, in create_source_knowledge_graph_url
backend | lst_file_name,success_count,failed_count = await asyncio.to_thread(create_source_node_graph_url_s3,graph, model, source_url, aws_access_key_id, aws_secret_access_key, source_type
backend | File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
backend | return await loop.run_in_executor(None, func_call)
backend | File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
backend | result = self.fn(*self.args, **self.kwargs)
backend | File "/code/src/main.py", line 45, in create_source_node_graph_url_s3
backend | raise Exception('No pdf files found.')
backend | Exception: No pdf files found.

fridaystreet commented 1 week ago

That said, the above error still returns the original invalid credentials error to the front end, which is a bit confusing. If it could return the correct error instead, that might save some time for people who are having issues.
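One way to surface the real cause, sketched under assumptions: the function name `describe_s3_failure` and the message texts are hypothetical, and the only library detail relied on is that botocore's `ClientError` exposes the AWS error code at `exc.response["Error"]["Code"]`. Any other exception, such as the "No pdf files found." one above, is passed through with its own message rather than being reported as a credentials problem.

```python
def describe_s3_failure(exc: Exception) -> str:
    """Turn an exception from the S3 scan into a user-facing message.

    Hypothetical helper: reads the AWS error code if the exception looks
    like a botocore ClientError, otherwise falls back to the exception text.
    """
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code")

    if code in {"SignatureDoesNotMatch", "InvalidAccessKeyId"}:
        return "Invalid AWS credentials: check the access key and secret key."
    if code == "AccessDenied":
        return "Access denied: the key lacks permission for this bucket."
    if code == "NoSuchBucket":
        return "Bucket not found: check the bucket URL."
    if code:
        return f"S3 request failed ({code})."
    # Not an AWS error at all, e.g. Exception('No pdf files found.')
    return str(exc) or exc.__class__.__name__
```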

fridaystreet commented 1 week ago

Just for anyone else getting here, and apologies if this was obvious somewhere and I've missed it: it only appears to scan PDF files in the S3 bucket, and they must have the .pdf extension in the name. It's not picking them up from the content type.
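If you need to find PDFs whose keys have no extension, a possible workaround outside the app is to check each object's content type yourself. The sketch below is an assumption-laden illustration, not how the project works: `find_pdf_keys` is a hypothetical name, and `s3_client` is expected to be a boto3 S3 client (it uses the `list_objects_v2` paginator and `head_object`). Note that the fallback costs one extra HEAD request per object without a .pdf suffix.

```python
def find_pdf_keys(s3_client, bucket: str, prefix: str = "") -> list[str]:
    """Return keys that look like PDFs by extension or by content type.

    Hypothetical helper; s3_client is assumed to be boto3.client("s3").
    """
    pdf_keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.lower().endswith(".pdf"):
                pdf_keys.append(key)
                continue
            # No .pdf suffix: fall back to the stored Content-Type metadata.
            head = s3_client.head_object(Bucket=bucket, Key=key)
            content_type = head.get("ContentType", "").split(";")[0].strip().lower()
            if content_type == "application/pdf":
                pdf_keys.append(key)
    return pdf_keys
```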

kartikpersistent commented 1 week ago

@aashipandya

kartikpersistent commented 5 days ago

We only extract PDF files from the bucket.