run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License

[Bug]: Confluence Loader is not working, I am getting requests.exceptions.HTTPError: 403 Client Error: Forbidden #795

Closed sayanb closed 7 months ago

sayanb commented 9 months ago

Bug Description

These lines of code:

reader = ConfluenceReader(base_url=BASE_URL)
documents = reader.load_data(space_key=SPACE,
                             include_attachments=True,
                             page_status="current")

throw this exception:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: REDACTED/wiki/rest/api/content?spaceKey=REDACTED&status=current&expand=body.storage.value&type=page

Version

0.9.13

Steps to Reproduce

  1. Create an API token on https://id.atlassian.com/manage-profile/security/api-tokens
  2. Add the API token in .env as CONFLUENCE_API_TOKEN=<my-api-token>
  3. Add the base URL in .env as BASE_URL=<my confluence base URL ending in /wiki>
  4. Navigate to a space on the same Confluence account and copy the space key from its URL: URL/wiki/spaces/SPACE KEY/overview
  5. In .env, set the space key as SPACE=<space key from above step>
  6. Execute the code in the bug description

Relevant Logs/Tracebacks

ERROR:atlassian.confluence:'message'
'message'
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/atlassian/confluence.py", line 3149, in raise_for_status
    error_msg = j["message"]
KeyError: 'message'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ingestion_confluence.py", line 36, in <module>
    documents = reader.load_data(space_key=SPACE,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_hub/confluence/base.py", line 163, in load_data
    self._get_data_with_paging(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_hub/confluence/base.py", line 254, in _get_data_with_paging
    results = self._get_data_with_retry(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/home/ubuntu/.local/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/ubuntu/.local/lib/python3.8/site-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_hub/confluence/base.py", line 316, in _get_data_with_retry
    return function(**kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/atlassian/confluence.py", line 570, in get_all_pages_from_space
    return self.get_all_pages_from_space_raw(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/atlassian/confluence.py", line 533, in get_all_pages_from_space_raw
    response = self.get(url, params=params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/atlassian/rest_client.py", line 288, in get
    response = self.request(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/atlassian/rest_client.py", line 260, in request
    self.raise_for_status(response)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/atlassian/confluence.py", line 3152, in raise_for_status
    response.raise_for_status()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://REDACTED/wiki/rest/api/content?spaceKey=REDACTED&status=current&expand=body.storage.value&type=page

anoopshrma commented 8 months ago

Hi, can you check whether your token has the required permissions to fetch the records?

sayanb commented 8 months ago

> Hi, Can you check if your token has the required permissions to fetch the records.

Hi @anoopshrma I am the admin of that Confluence account, and when I created the token by following this step:

> Create an API token on https://id.atlassian.com/manage-profile/security/api-tokens

I was logged into Confluence with that admin account.

I also manually verified that I am able to access the space (see step 4 of my steps to reproduce).

anoopshrma commented 8 months ago

One way to check whether the token works is to use the Confluence library directly and verify that the token can fetch the records.
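A minimal sketch of such a direct check, assuming the `atlassian` package that appears in the traceback above and a Confluence Cloud site that accepts the account email plus API token as basic-auth credentials. The helper names `make_client` and `can_read_space` are hypothetical, chosen only for illustration.

```python
def make_client(base_url, username, api_token):
    """Build a Confluence client that bypasses the loader entirely.

    Assumption: `base_url` ends in /wiki and the API token is accepted
    in place of a password.
    """
    from atlassian import Confluence  # imported lazily; needs atlassian-python-api

    return Confluence(url=base_url, username=username, password=api_token, cloud=True)


def can_read_space(client, space_key, limit=1):
    """Return True if the client can list pages in the space, False otherwise."""
    try:
        client.get_all_pages_from_space(space_key, start=0, limit=limit)
    except Exception as exc:  # a 403 surfaces here as requests.exceptions.HTTPError
        print(f"Request failed: {exc}")
        return False
    return True
```

If `can_read_space(make_client(BASE_URL, username, api_token), SPACE)` returns True while the loader still fails with a 403, the token itself is probably fine and the problem is more likely in how the credentials reach the loader.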

soras39 commented 8 months ago

Hi, I am facing the same issue. Like sayanb, I also verified access using curl with the access token, and it works fine.

I set the log level to DEBUG and checked the output. It looks like this:

DEBUG:atlassian.rest_client:curl --silent -X GET -H 'Content-Type: application/json' -H 'Accept: application/json' 'https://xxxx.atlassian.net/wiki/rest/api/content?spaceKey=TESTSPACE&status=current&expand=body.storage.value&type=page'

This curl command returns 403 because it is not correct: it sends no Authorization header. The correct command should include

-H 'Authorization: Basic xxxxxxxxxxxxxxxxxxxxx'

where the value after Basic is generated with

echo -n "{confluence_user_name}:{confluence_api_token}" | base64

If I add the header option above, curl works fine. I think the loader needs to build the auth string by base64-encoding the credentials and send it in an Authorization header.

Could you please check that?
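For reference, the same header value can be built in Python. This is a small sketch of standard HTTP Basic authentication that mirrors the `echo -n ... | base64` command above; the function name `basic_auth_header` is hypothetical and not part of the loader.

```python
import base64


def basic_auth_header(username: str, api_token: str) -> str:
    """Return the Authorization header value for HTTP Basic auth."""
    # Join the credentials with a colon, as in "{user}:{token}"
    raw = f"{username}:{api_token}".encode("utf-8")
    # base64-encode the bytes and prefix the scheme name
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Example with placeholder credentials (not real values)
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": basic_auth_header("user", "token"),
}
```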

soras39 commented 8 months ago

Hi @anoopshrma @sayanb

I changed the loader code as follows to check whether my solution works.

rest_client.py:

before>>>
default_headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}

after>>>
default_headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": "Basic xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
}

After I changed the code as shown above, the loader works fine. This confirms that an "Authorization" header has to be added when using an "api_token" (NOT API_KEY BUT API_TOKEN).

Could you please update the loader code?

skvrd commented 7 months ago

Came across the same issue.

If you put your username into the CONFLUENCE_USERNAME env variable and your API token from https://id.atlassian.com/manage-profile/security/api-tokens into the CONFLUENCE_PASSWORD env variable, the loader will work just fine.

Hope it helps.
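A rough sketch of that workaround, assuming the reader picks up CONFLUENCE_USERNAME and CONFLUENCE_PASSWORD from the environment when it is constructed, and that `ConfluenceReader` is importable from `llama_hub.confluence` (the traceback shows it living in llama_hub/confluence/base.py). The helper names are hypothetical.

```python
import os


def configure_confluence_basic_auth(username, api_token, env=os.environ):
    """Expose the credentials under the variable names mentioned above."""
    env["CONFLUENCE_USERNAME"] = username
    # The API token goes in the *password* variable, not CONFLUENCE_API_TOKEN
    env["CONFLUENCE_PASSWORD"] = api_token


def load_space(base_url, space_key):
    """Load a space with the same arguments as in the bug description."""
    # Imported lazily so that configuring the environment does not need llama_hub
    from llama_hub.confluence import ConfluenceReader  # assumed import path

    reader = ConfluenceReader(base_url=base_url)
    return reader.load_data(space_key=space_key,
                            include_attachments=True,
                            page_status="current")
```

Call `configure_confluence_basic_auth(...)` before `load_space(...)`, since the credentials need to be present in the environment by the time the reader is created.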