langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Athena Loader errors when result s3 bucket uri has no prefix #17525

Closed shawnzam closed 4 months ago

shawnzam commented 7 months ago

Example Code

from langchain_community.document_loaders.athena import AthenaLoader

database_name = "database"
s3_output_path = "s3://bucket-no-prefix"
query = """SELECT
  CAST(extract(hour FROM current_timestamp) AS INTEGER) AS current_hour,
  CAST(extract(minute FROM current_timestamp) AS INTEGER) AS current_minute,
  CAST(extract(second FROM current_timestamp) AS INTEGER) AS current_second;
"""
profile_name = "AdministratorAccess"

loader = AthenaLoader(
    query=query,
    database=database_name,
    s3_output_uri=s3_output_path,
    profile_name=profile_name,
)

documents = loader.load()
print(documents)

Error Message and Stack Trace (if applicable)

NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist

Description

AthenaLoader fails when the result S3 bucket URI has no prefix, that is, when s3_output_uri names only the bucket. Loading documents with such a URI raises "NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist."

If s3_output_path includes a prefix, for example:

s3_output_path = "s3://bucket-with-prefix/prefix"

the same code runs without error.

Suggested solution

Modify:

key = "/".join(tokens[1:]) + "/" + query_execution_id + ".csv"

to

key = "/".join(tokens[1:]) + ("/" if tokens[1:] else "") + query_execution_id + ".csv"

https://github.com/langchain-ai/langchain/blob/9e8a3fc4fff8e20ab5d1f113515ded14906eb6f3/libs/community/langchain_community/document_loaders/athena.py#L128
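
To illustrate why the one-line change matters, here is a minimal standalone sketch that compares the two key expressions. The helper names (split_s3_uri, original_key, patched_key) are hypothetical and exist only for this illustration, and the way tokens is derived from the URI is an assumption about the loader, not a copy of its code. Only the two key expressions are taken from the issue text.

```python
from typing import List


def split_s3_uri(s3_output_uri: str) -> List[str]:
    """Split an S3 URI into [bucket, prefix segments...].

    Assumption: the loader strips the "s3://" scheme and any trailing
    slash before splitting on "/". This helper is hypothetical.
    """
    path = s3_output_uri
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    return path.rstrip("/").split("/")


def original_key(tokens: List[str], query_execution_id: str) -> str:
    # Expression quoted in the issue: always inserts a "/" separator,
    # even when there are no prefix segments after the bucket name.
    return "/".join(tokens[1:]) + "/" + query_execution_id + ".csv"


def patched_key(tokens: List[str], query_execution_id: str) -> str:
    # Suggested fix from the issue: add the "/" only if a prefix exists.
    return (
        "/".join(tokens[1:])
        + ("/" if tokens[1:] else "")
        + query_execution_id
        + ".csv"
    )


if __name__ == "__main__":
    execution_id = "example-id"  # placeholder, not a real query execution ID
    for uri in ("s3://bucket-no-prefix", "s3://bucket-with-prefix/prefix"):
        tokens = split_s3_uri(uri)
        print(uri)
        print("  original:", repr(original_key(tokens, execution_id)))
        print("  patched: ", repr(patched_key(tokens, execution_id)))
```

Under these assumptions, the original expression yields a key with a leading slash when the URI has no prefix, which would not match an object stored directly at the bucket root, while both expressions agree when a prefix is present.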

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 22.6.0: Fri Sep 15 13:41:30 PDT 2023; root:xnu-8796.141.3.700.8~1/RELEASE_ARM64_T8103
Python Version: 3.9.9 (main, Jan 9 2023, 11:42:03) [Clang 14.0.0 (clang-1400.0.29.102)]

Package Information

langchain_core: 0.1.23
langchain: 0.1.7
langchain_community: 0.0.20
langsmith: 0.0.87
langchain_openai: 0.0.6
langchainhub: 0.1.14

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

dosubot[bot] commented 7 months ago

🤖

I appreciate your thorough investigation into the issue with the Athena Loader in the LangChain Python framework. It's great to see that you've identified a potential solution and even provided the modified code. Your proposed modification seems reasonable, and I encourage you to go ahead and create a pull request with your suggested fix. Your contribution will be valuable in improving the LangChain framework for the community. Thank you for your dedication to resolving this issue!

shawnzam commented 7 months ago

@dosu-bot here is the PR: https://github.com/langchain-ai/langchain/pull/17526

dosubot[bot] commented 4 months ago

Hi, @shawnzam,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. It looks like you've provided a pull request with a suggested fix for the "Athena Loader encountering error with missing S3 bucket URI prefix" issue.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution!