ConfluenceLoader.load giving AttributeError: 'str' object has no attribute 'get' while reading all documents from space

sameermahajan commented 8 months ago

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.

Example Code

from langchain.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(url="", username="", api_key="")
documents = loader.load(space_key="", include_attachments=True, limit=1, max_pages=1)

Description

I am trying to load all Confluence pages using ConfluenceLoader. I expect to get all the pages, but instead I get AttributeError: 'str' object has no attribute 'get'.

System Info

Python version: 3.10.2
LangChain version: 0.0.345

Related Components

[ ] LLMs/Chat Models
[ ] Embedding Models
[ ] Prompts / Prompt Templates / Prompt Selectors
[ ] Output Parsers
[ ] Document Loaders
[ ] Vector Stores / Retrievers
[ ] Memory
[ ] Agents / Agent Executors
[ ] Tools / Toolkits
[ ] Chains
[ ] Callbacks/Tracing
[ ] Async

dosubot[bot] commented 8 months ago

🤖

Hello @sameermahajan! I'm Dosu, an AI bot here to assist you while we wait for a human maintainer to join us. I can help you crack some bugs, answer questions, or guide you on your journey as a contributor. Let's get to solving this!

Thank you for providing the detailed information. The error you're encountering, AttributeError: 'str' object has no attribute 'get', typically occurs when you're trying to use a method that doesn't exist for the data type you're working with. In this case, it seems like you're trying to use the get method on a string object, which is not valid because get is a method for dictionaries, not strings.

Given the context, it's likely that the error is occurring in the _search_content_by_cql method in the ConfluenceLoader class. This method attempts to use the get method on the response object:

response = self.confluence.get(url, params=params)
return response.get("results", [])

The get method is used to retrieve the value of the "results" key from the response dictionary. If response is a string instead of a dictionary, this would result in the AttributeError you're seeing.

This could happen if the get method of the self.confluence object (an instance of the Confluence class from the atlassian package) is returning a string instead of a dictionary. This might occur if there's an error in the request and the Confluence class is returning the error message as a string.

To fix this issue, you need to ensure that the get method of the Confluence class is returning a dictionary. If it's returning an error message as a string, you should modify the Confluence class to return error messages in a different way, such as by raising an exception.
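As a rough illustration of the failure mode described above, the sketch below checks the type of a decoded response before calling .get on it. The function name extract_results is hypothetical and is not part of LangChain or the atlassian package; it only assumes that a successful call yields a dict with a "results" key, as in the snippet quoted earlier.

```python
def extract_results(response):
    """Return the "results" list from a decoded response.

    Hypothetical helper for illustration only: it raises a descriptive
    TypeError when the response is not a dict (for example an HTML or
    plain-text error page returned as a str).
    """
    if not isinstance(response, dict):
        preview = str(response)[:200]  # keep the message short
        raise TypeError(
            f"expected a dict from Confluence, got {type(response).__name__}: {preview!r}"
        )
    return response.get("results", [])
```

Called with a string, this raises a TypeError that includes the start of the returned text, which is usually more informative than the bare AttributeError.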

You can find more details about the ConfluenceLoader class in the LangChain repository.

Please let me know if this helps or if you need further assistance.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

GauravWaghmare commented 8 months ago

@sameermahajan can you please give the entire stack trace here?

sameermahajan commented 8 months ago

@GauravWaghmare here you go:

AttributeError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 documents = loader.load(space_key="Platform Software", include_attachments=True, limit=1, max_pages=1)

File C:\GenerativeAI\venv\lib\site-packages\langchain\document_loaders\confluence.py:276, in ConfluenceLoader.load(self, space_key, page_ids, label, cql, include_restricted_content, include_archived_content, include_attachments, include_comments, content_format, limit, max_pages, ocr_languages, keep_markdown_format, keep_newlines)
    273 docs = []
    275 if space_key:
--> 276     pages = self.paginate_request(
    277         self.confluence.get_all_pages_from_space,
    278         space=space_key,
    279         limit=limit,
    280         max_pages=max_pages,
    281         status="any" if include_archived_content else "current",
    282         expand=content_format.value,
    283     )
    284     docs += self.process_pages(
    285         pages,
    286         include_restricted_content,
   (...)
    292         keep_newlines=keep_newlines,
    293     )
    295 if label:

File C:\GenerativeAI\venv\lib\site-packages\langchain\document_loaders\confluence.py:407, in ConfluenceLoader.paginate_request(self, retrieval_method, **kwargs)
    394 while len(docs) < max_pages:
    395     get_pages = retry(
    396         reraise=True,
    397         stop=stop_after_attempt(
   (...)
    405         before_sleep=before_sleep_log(logger, logging.WARNING),
    406     )(retrieval_method)
--> 407     batch = get_pages(**kwargs, start=len(docs))
    408     if not batch:
    409         break

File C:\GenerativeAI\venv\lib\site-packages\tenacity\__init__.py:289, in BaseRetrying.wraps.<locals>.wrapped_f(*args, **kw)
    287 @functools.wraps(f)
    288 def wrapped_f(*args: t.Any, **kw: t.Any) -> t.Any:
--> 289     return self(f, *args, **kw)

File C:\GenerativeAI\venv\lib\site-packages\tenacity\__init__.py:379, in Retrying.__call__(self, fn, *args, **kwargs)
    377 retry_state = RetryCallState(retry_object=self, fn=fn, args=args, kwargs=kwargs)
    378 while True:
--> 379     do = self.iter(retry_state=retry_state)
    380     if isinstance(do, DoAttempt):
    381         try:

File C:\GenerativeAI\venv\lib\site-packages\tenacity\__init__.py:325, in BaseRetrying.iter(self, retry_state)
    323     retry_exc = self.retry_error_cls(fut)
    324     if self.reraise:
--> 325         raise retry_exc.reraise()
    326     raise retry_exc from fut.exception()
    328 if self.wait:

File C:\GenerativeAI\venv\lib\site-packages\tenacity\__init__.py:158, in RetryError.reraise(self)
    156 def reraise(self) -> t.NoReturn:
    157     if self.last_attempt.failed:
--> 158         raise self.last_attempt.result()
    159     raise self

File ~\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py:439, in Future.result(self, timeout)
    437     raise CancelledError()
    438 elif self._state == FINISHED:
--> 439     return self.__get_result()
    441 self._condition.wait(timeout)
    443 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

File C:\GenerativeAI\venv\lib\site-packages\tenacity\__init__.py:382, in Retrying.__call__(self, fn, *args, **kwargs)
    380 if isinstance(do, DoAttempt):
    381     try:
--> 382         result = fn(*args, **kwargs)
    383     except BaseException:  # noqa: B902
    384         retry_state.set_exception(sys.exc_info())  # type: ignore[arg-type]

File C:\GenerativeAI\venv\lib\site-packages\atlassian\confluence.py:572, in Confluence.get_all_pages_from_space(self, space, start, limit, status, expand, content_type)
    545 def get_all_pages_from_space(
    546     self,
    547     space,
   (...)
    552     content_type="page",
    553 ):
    554     """
    555     Get all pages from space
    556
   (...)
    568     :return:
    569     """
    570     return self.get_all_pages_from_space_raw(
    571         space=space, start=start, limit=limit, status=status, expand=expand, content_type=content_type
--> 572     ).get("results")

AttributeError: 'str' object has no attribute 'get'

deepak-habilelabs commented 8 months ago

@sameermahajan give it a try, it's definitely gonna work

documents = []
loader = ConfluenceLoader(url=confluence_url, username=username, api_key=api_key)
for space_key in space_keys:  # space_keys: your own list of space keys
    documents.extend(loader.load(space_key=space_key, include_attachments=True, limit=100))

sameermahajan commented 8 months ago

@deepak-habilelabs I tried it and it gives the same error as I had expected since it also has loader.load which seems to be the culprit anyway.

deepak-habilelabs commented 8 months ago

Can you please send me the whole Python script so that I can check it on my system?

sameermahajan commented 8 months ago

@deepak-habilelabs attached: confluence.py.txt. Please remove the .txt suffix and fill in your own values for url, user, token, and space before executing.

simpleappdesigner commented 8 months ago

pls check my response on StackOverflow - https://stackoverflow.com/questions/77797689/langchain-document-loaders-confluenceloader-load-giving-attributeerror-str-ob/77809754#77809754

i don't think this is an issue.

sameermahajan commented 8 months ago

Here is the summary of the fixes required in the original code:

Do not suffix the URL with /wiki/home.
Suffix the user name with @ and your domain name.
Use the ID of the space as it appears in the URL, not its display name.

With these changes it works. The error handling is too poor to point to these issues otherwise. Feel free to use this issue as a tracking bug for improving the error handling in these situations.
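The three fixes above could be sketched roughly as follows. This is only an illustration, not the script attached earlier: check_confluence_settings and load_space are hypothetical helper names, the checks are heuristics (for example, treating any whitespace in the space key as a sign that a display name was passed), and the ConfluenceLoader calls mirror the arguments used in the original report for langchain 0.0.345. The exact base URL still depends on how your Confluence site is deployed.

```python
def check_confluence_settings(url, username, space_key):
    """Return a list of likely configuration problems (heuristic checks only)."""
    problems = []
    # Fix 1: the URL should be the site base, not a page URL.
    if url.rstrip("/").endswith(("/wiki/home", "/pages/viewpage.action")):
        problems.append("url looks like a page URL; use the site base URL")
    # Fix 2: the user name should carry the domain part.
    if "@" not in username:
        problems.append("username has no '@domain' suffix")
    # Fix 3: a display name such as "Platform Software" is not a space key.
    if any(ch.isspace() for ch in space_key):
        problems.append("space_key contains whitespace; use the key from the URL")
    return problems


def load_space(url, username, api_key, space_key):
    """Validate the settings, then load one space (hypothetical wrapper)."""
    problems = check_confluence_settings(url, username, space_key)
    if problems:
        raise ValueError("; ".join(problems))
    # Imported lazily so the checks above can run without LangChain installed.
    from langchain.document_loaders import ConfluenceLoader

    loader = ConfluenceLoader(url=url, username=username, api_key=api_key)
    return loader.load(space_key=space_key, include_attachments=True, limit=50)
```

Running check_confluence_settings on the values from the original report would flag the display name "Platform Software" before any request is made, which is the kind of early error the loader itself does not give.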

amit2k5 commented 6 months ago

Can you please paste the sample? Thanks

simpleappdesigner commented 6 months ago

Please refer to this post: https://stackoverflow.com/questions/77797689/langchain-document-loaders-confluenceloader-load-giving-attributeerror-str-ob/77809754#77809754

dishuwang commented 1 month ago

This works for me, thanks. Just use https://confluence.xxx/ in place of https://confluence.xxx/pages/viewpage.action
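One way to derive that base URL from a copied page link is sketched below. The helper name site_root is hypothetical, and the sketch assumes the Confluence instance is served from the root of the host, as in the example above; a site that lives under a context path (such as /wiki) would need that path kept.

```python
from urllib.parse import urlsplit, urlunsplit


def site_root(page_url):
    """Reduce a page URL to scheme and host with a trailing slash.

    Assumes Confluence is served from the host root; drops the path,
    query string, and fragment of the page URL.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
```

For instance, site_root("https://confluence.example.com/pages/viewpage.action?pageId=123") gives "https://confluence.example.com/", which can then be passed as url to ConfluenceLoader.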
