run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: ConfluenceReader base url #13783

Open ricardoxu6 opened 1 month ago

ricardoxu6 commented 1 month ago

Question

token = {"access_token": "", "token_type": ""}
oauth2_dict = {"client_id": "", "token": token}

base_url = "https://yoursite.atlassian.com/wiki"

ConfluenceReader requires a base URL, and the documented format is the typical Atlassian Cloud one, which ends with /wiki. My Confluence is hosted on a custom domain, for instance https://confluence.xxx.com/. Is there a way to use ConfluenceReader with it?

dosubot[bot] commented 1 month ago

Hello @ricardoxu6! I'm here to help you with any bugs, questions, or contributions you may have. Let's tackle this ConfluenceReader base url issue together.

Yes, you can use ConfluenceReader with a base URL that is hosted on a custom domain. The base_url parameter in the ConfluenceReader class is designed to accept any valid Confluence instance URL, including those hosted on custom domains.

Here is the relevant part of the code that shows how the base_url is used:

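import os
from typing import Dict, Optional

from llama_index.core.readers.base import BaseReader

# Names of the environment variables the reader falls back to
# when no oauth2 dict is passed.
CONFLUENCE_API_TOKEN = "CONFLUENCE_API_TOKEN"
CONFLUENCE_PASSWORD = "CONFLUENCE_PASSWORD"
CONFLUENCE_USERNAME = "CONFLUENCE_USERNAME"


class ConfluenceReader(BaseReader):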
    """Confluence reader.

    Reads a set of confluence pages given a space key and optionally a list of page ids

    Args:
        oauth2 (dict): Atlassian OAuth 2.0, minimum fields are `client_id` and `token`, where `token` is a dict and must at least contain "access_token" and "token_type".
        base_url (str): 'base_url' for confluence cloud instance, this is suffixed with '/wiki', eg 'https://yoursite.atlassian.com/wiki'
        cloud (bool): connecting to Confluence Cloud or self-hosted instance
    """

    def __init__(
        self, base_url: str = None, oauth2: Optional[Dict] = None, cloud: bool = True
    ) -> None:
        if base_url is None:
            raise ValueError("Must provide `base_url`")

        self.base_url = base_url

        try:
            from atlassian import Confluence
        except ImportError:
            raise ImportError(
                "`atlassian` package not found, please run `pip install"
                " atlassian-python-api`"
            )
        self.confluence: Confluence = None
        if oauth2:
            self.confluence = Confluence(url=base_url, oauth2=oauth2, cloud=cloud)
        else:
            api_token = os.getenv(CONFLUENCE_API_TOKEN)
            if api_token is not None:
                self.confluence = Confluence(url=base_url, token=api_token, cloud=cloud)
            else:
                user_name = os.getenv(CONFLUENCE_USERNAME)
                if user_name is None:
                    raise ValueError(
                        "Must set environment variable `CONFLUENCE_USERNAME` if oauth,"
                        " oauth2, or `CONFLUENCE_API_TOKEN` are not provided."
                    )
                password = os.getenv(CONFLUENCE_PASSWORD)
                if password is None:
                    raise ValueError(
                        "Must set environment variable `CONFLUENCE_PASSWORD` if oauth,"
                        " oauth2, or `CONFLUENCE_API_TOKEN` are not provided."
                    )
                self.confluence = Confluence(
                    url=base_url, username=user_name, password=password, cloud=cloud
                )

As long as you provide a valid base_url for your custom domain, ConfluenceReader should work correctly. For a self-hosted instance, also pass cloud=False, since the cloud argument defaults to True [1].
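As a rough illustration of the answer above, here is a minimal sketch of pointing the reader at a custom domain. It assumes the `llama-index-readers-confluence` package is installed and that credentials come from the environment variables shown in the constructor. `confluence_reader_kwargs` and `make_reader` are hypothetical helpers written for this example, not part of LlamaIndex, and the trailing-slash cleanup is only a precaution, not something the library is known to require.

```python
from typing import Any, Dict


def confluence_reader_kwargs(base_url: str, cloud: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for ConfluenceReader (hypothetical helper)."""
    if not base_url.startswith(("http://", "https://")):
        raise ValueError("base_url must include the scheme, e.g. https://")
    # Drop any trailing slash so the URL has one canonical form.
    return {"base_url": base_url.rstrip("/"), "cloud": cloud}


def make_reader(base_url: str, cloud: bool = False):
    """Create a ConfluenceReader for a custom domain.

    Credentials are read by the reader from CONFLUENCE_API_TOKEN, or from
    CONFLUENCE_USERNAME and CONFLUENCE_PASSWORD, as in the excerpt above.
    """
    # Imported lazily so this sketch can be loaded without the package.
    from llama_index.readers.confluence import ConfluenceReader

    return ConfluenceReader(**confluence_reader_kwargs(base_url, cloud))


# The custom domain from the question; cloud=False marks it as self-hosted.
kwargs = confluence_reader_kwargs("https://confluence.xxx.com/")
print(kwargs)
```

Calling `make_reader("https://confluence.xxx.com/")` would then return a reader configured for the self-hosted instance, provided the credentials are set.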

**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

ricardoxu6 commented 1 month ago

What if, on my custom domain, the Confluence page URLs do not include a page_id? Can I still use ConfluenceReader?

dosubot[bot] commented 1 month ago

Yes, you can use the ConfluenceReader class to read Confluence pages on a custom domain even if the URLs do not include a page_id. The load_data method allows you to specify Confluence pages using one of four mutually exclusive methods: space_key, page_ids, label, or cql (Confluence Query Language). Therefore, you can use space_key, label, or cql to read pages without needing to specify page_ids [1].
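The selection rule described above could be sketched as follows. This is an illustrative helper, not LlamaIndex code: `page_selector` and `load_pages` are hypothetical names, and the space key, label, and CQL strings are placeholders. It only enforces that exactly one of `space_key`, `page_ids`, `label`, or `cql` is supplied before the call is forwarded to `load_data`.

```python
from typing import Any, Dict, List, Optional


def page_selector(
    space_key: Optional[str] = None,
    page_ids: Optional[List[str]] = None,
    label: Optional[str] = None,
    cql: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the single selector to pass to load_data (hypothetical helper)."""
    candidates = {
        "space_key": space_key,
        "page_ids": page_ids,
        "label": label,
        "cql": cql,
    }
    chosen = {name: value for name, value in candidates.items() if value is not None}
    # The four selectors are mutually exclusive, so exactly one must be set.
    if len(chosen) != 1:
        raise ValueError(
            "Specify exactly one of space_key, page_ids, label, or cql; "
            f"got {sorted(chosen) or 'none'}"
        )
    return chosen


def load_pages(reader: Any, **selector: Any):
    """Forward a validated selector to an existing ConfluenceReader instance."""
    return reader.load_data(**page_selector(**selector))


# None of these selectors needs a page_id from the page URL.
by_space = page_selector(space_key="DOCS")  # placeholder space key
by_label = page_selector(label="howto")  # placeholder label
by_cql = page_selector(cql='type = page AND label = "howto"')  # placeholder query
print(by_space, by_label, by_cql)
```

With a reader already constructed, `load_pages(reader, space_key="DOCS")` would load every page in that space without referring to individual page IDs.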
