run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

Google reader and Notion reader are not working #3777

Closed Ashish5869 closed 4 months ago

Ashish5869 commented 1 year ago

The Google Docs reader is working, but the Google Drive reader is not.

from llama_index import download_loader

GoogleDriveReader = download_loader('GoogleDriveReader')

loader = GoogleDriveReader()
documents = loader.load_data(file_ids=['file_id'])

OUTPUT:
TypeError: GoogleDriveReader._load_from_file_ids() takes 2 positional arguments but 3 were given
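The TypeError above is raised in `_load_from_file_ids`, a private helper, rather than in the `load_data` call written by the user. That pattern usually points to a signature mismatch inside the loader itself. The sketch below reproduces the same kind of message with a hypothetical `DemoReader` class; it is not the real `GoogleDriveReader` code, whose internals are not shown in this thread.

```python
class DemoReader:
    """Hypothetical stand-in used for illustration; not the real GoogleDriveReader."""

    def _load_from_file_ids(self, file_ids):
        # Helper that accepts only `self` and `file_ids` (2 positional arguments).
        return [f"doc:{fid}" for fid in file_ids]

    def load_data(self, file_ids, mime_types=None):
        # The public method forwards one more argument than the helper accepts,
        # so the call fails even though the user's own call looks correct.
        return self._load_from_file_ids(file_ids, mime_types)


def demo_error_message():
    """Trigger the mismatch and return the resulting TypeError message."""
    try:
        DemoReader().load_data(file_ids=["file_id"])
    except TypeError as exc:
        return str(exc)
    return None


print(demo_error_message())
```

If the real cause is the same, the fix lies in the loader version rather than in the arguments passed to `load_data`.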

Notion reader is also giving an error.

from llama_index import GPTListIndex, NotionPageReader
from IPython.display import Markdown, display
import os
integration_token = 'notion_integration_token'
page_ids = ["page_id"]
notion_reader = NotionPageReader(integration_token=integration_token)
documents = notion_reader.read_page(page_id=page_ids)

OUTPUT:
~/.local/lib/python3.10/site-packages/llama_index/readers/notion.py in _read_block(self, block_id, num_tabs)
     58             data = res.json()
     59 
---> 60             for result in data["results"]:
     61                 result_type = result["type"]
     62                 result_obj = result[result_type]

KeyError: 'results'

How to resolve this error?

livelikeabel commented 1 year ago

You can check this PR https://github.com/jerryjliu/llama_index/pull/6721

livelikeabel commented 1 year ago

1. Did you get the token from here?

image

2. Then you should add a connection with your own token on your Notion page.

image

abhinav-adtechs commented 1 year ago

After adding the connection, it still doesn't work. I'm getting the same error.

KeyError                                  Traceback (most recent call last)
Cell In[5], line 3
      1 integration_token = os.getenv("NOTION_INTEGRATION_TOKEN")
      2 page_ids = ["All-In-Capital-512240051c104b048b3f768c71a2709b"]
----> 3 documents = NotionPageReader(integration_token=integration_token).load_data(
      4     page_ids=page_ids
      5 )

File /opt/homebrew/lib/python3.9/site-packages/llama_index/readers/notion.py:161, in NotionPageReader.load_data(self, page_ids, database_id)
    159 else:
    160     for page_id in page_ids:
--> 161         page_text = self.read_page(page_id)
    162         docs.append(Document(text=page_text, metadata={"page_id": page_id}))
    164 return docs

File /opt/homebrew/lib/python3.9/site-packages/llama_index/readers/notion.py:95, in NotionPageReader.read_page(self, page_id)
     93 def read_page(self, page_id: str) -> str:
     94     """Read a page."""
---> 95     return self._read_block(page_id)

File /opt/homebrew/lib/python3.9/site-packages/llama_index/readers/notion.py:60, in NotionPageReader._read_block(self, block_id, num_tabs)
     55 res = requests.request(
     56     "GET", block_url, headers=self.headers, json=query_dict
     57 )
...
---> 60 for result in data["results"]:
     61     result_type = result["type"]
     62     result_obj = result[result_type]

KeyError: 'results'

Ashish5869 commented 1 year ago

@livelikeabel Notion is working now but google drive is giving a token refresh error.

Google docs is working

from llama_index import GoogleDocsReader

document_ids = ["google_doc_id"]
documents = GoogleDocsReader().load_data(document_ids=document_ids)

Google drive

from llama_index import download_loader

GoogleDriveReader = download_loader('GoogleDriveReader')
loader = GoogleDriveReader()
documents = loader.load_data(
    folder_id=None,
    file_ids=['google_doc_id'],
    mime_types=['application/vnd.google-apps.document'],
)

OUTPUT ERROR:

. . .

~/anaconda3/lib/python3.10/site-packages/oauth2client/client.py in _do_refresh_request(self, http)
    820                 pass
--> 821             raise HttpAccessTokenRefreshError(error_msg, status=resp.status)
    822 

HttpAccessTokenRefreshError: invalid_grant

During handling of the above exception, another exception occurred:

. . .

~/anaconda3/lib/python3.10/site-packages/pydrive/auth.py in Refresh(self)
    475       self.credentials.refresh(self.http)
    476     except AccessTokenRefreshError as error:
--> 477       raise RefreshError('Access token refresh failed: %s' % error)
    478 
    479   def GetAuthUrl(self):

RefreshError: Access token refresh failed: invalid_grant

I am using the same credentials for Google Docs and Google Drive.

sidhartha-roy commented 1 year ago

Adding the connection worked for me

dosubot[bot] commented 1 year ago

Hi, @Ashish5869! I'm Dosu, and I'm helping the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, the issue is related to the Google Drive Reader and Notion Reader in the llama_index library not working properly. The Google Drive Reader is giving a TypeError and the Notion Reader is giving a KeyError. You mentioned that you received a suggestion to check a pull request for help, which indicates that there may be a solution available. Another user also suggested adding a connection with a token for the Notion Reader, which could potentially resolve the KeyError issue. However, it seems that the error still persists for the Google Drive Reader, specifically with a token refresh error.

If this issue is still relevant to the latest version of the LlamaIndex repository, please let the LlamaIndex team know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days. Thank you for your understanding and contribution to the LlamaIndex project!

amindadgar commented 1 year ago

I have the same problem, but only with the NotionPageReader. I followed the instructions from "livelikeabel" but still get the KeyError: 'results'. I also changed the capabilities of the integration in the Notion settings, but it still doesn't work. @sidhartha-roy Could you specify which capabilities your integration had when it worked for you?

dosubot[bot] commented 1 year ago

@Disiok Could you please help @amindadgar with the issue related to the NotionPageReader in the LlamaIndex library? They are experiencing a KeyError: 'results' even after following the instructions provided by another user. It seems that changing the capabilities of the integration in Notion settings did not resolve the issue. Thank you for your assistance!

Rahmat711 commented 11 months ago

@livelikeabel Can you please explain how you solved the Notion issue?

sebi5000 commented 10 months ago

Hi, it worked for me after following the steps above (adding the connection). One important hint: use only the "hash" as the page ID. In my case, the page link is exported as follows:

https://www.notion.so/SuccessFactors-To-P-I-Loga-Replication-74c9b1046e614e78870b23f8f6b4df6d?pvs=4

So the page ID to pass to the Notion loader is only "74c9b1046e614e78870b23f8f6b4df6d". I got the same error as you when I used it with the page title included.
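A minimal sketch of that hint, assuming the page ID is always the trailing 32 hexadecimal characters of the last URL path segment (true for the links in this thread, but not checked against every Notion link format). The helper name `extract_notion_page_id` is hypothetical and not part of llama_index.

```python
import re
from urllib.parse import urlparse

# Matches a 32-character hexadecimal ID at the very end of a page slug.
_NOTION_ID_RE = re.compile(r"([0-9a-f]{32})$", re.IGNORECASE)


def extract_notion_page_id(url_or_slug: str) -> str:
    """Return the trailing 32-hex-character page ID from a Notion URL or slug."""
    # urlparse separates out the query string (e.g. "?pvs=4");
    # for a bare slug, the whole string ends up in .path.
    path = urlparse(url_or_slug).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    match = _NOTION_ID_RE.search(last_segment)
    if match is None:
        raise ValueError(f"No 32-character page ID found in {url_or_slug!r}")
    return match.group(1).lower()


url = "https://www.notion.so/SuccessFactors-To-P-I-Loga-Replication-74c9b1046e614e78870b23f8f6b4df6d?pvs=4"
print(extract_notion_page_id(url))
```

The returned string can then be passed in `page_ids` in place of the full "Title-hash" slug. IDs written in dashed UUID form are not handled by this sketch.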

regards

drewzeee commented 8 months ago

Thank you @sebi5000. This got me functional.

dosubot[bot] commented 8 months ago

The KeyError: 'results' you're encountering in the NotionPageReader class from the LlamaIndex library suggests that the response from the Notion API does not contain a 'results' key. This error typically occurs when the API response structure is different from what is expected or when the request fails for some reason (e.g., due to incorrect permissions, invalid request parameters, or an outage on the Notion API side).

Given the context and the provided code, here are a few steps to resolve the issue:

  1. Check API Response Status Code: Before accessing the data['results'], ensure that the request was successful by checking the response's status code. A successful request to the Notion API should return a status code of 200. If the status code indicates an error (e.g., 400, 401, 403, 404, etc.), it can help identify the issue.

  2. Error Handling: Implement error handling to catch cases where the 'results' key might be missing in the response. This can be done by using the .get() method, which returns None if the specified key does not exist, instead of directly accessing the key with data['results'].

  3. Logging: Add logging to print out the API response. This can help you understand the structure of the response and why the 'results' key might be missing. It's possible that the API is returning an error message or a different structure than expected.

Here's an example of how you might modify the _read_block method to include these suggestions:

def _read_block(self, block_id: str, num_tabs: int = 0) -> str:
    """Read a block."""
    done = False
    result_lines_arr = []
    cur_block_id = block_id
    while not done:
        block_url = BLOCK_CHILD_URL_TMPL.format(block_id=cur_block_id)
        query_dict: Dict[str, Any] = {}

        res = requests.request(
            "GET", block_url, headers=self.headers, json=query_dict
        )
        if res.status_code != 200:
            print(f"Failed to read block with status code: {res.status_code}")
            print("Response:", res.text)
            break  # or handle the error as appropriate

        data = res.json()

        # Use .get() to avoid KeyError if 'results' key is missing
        for result in data.get("results", []):
            # Existing processing logic here...
            pass  # placeholder so the loop body is syntactically valid

        if data.get("next_cursor") is None:
            done = True
        else:
            cur_block_id = data["next_cursor"]

    return "\n".join(result_lines_arr)

By implementing these changes, you can avoid the KeyError and gain insight into why the error might be occurring, which can help in further troubleshooting and resolving the issue.
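As a complement, the check can be pulled into a small helper that raises a descriptive error instead of printing and breaking out of the loop. This is only a sketch: `NotionResponseError` and `results_or_raise` are hypothetical names, and the `object`, `code`, and `message` fields reflect my understanding of Notion's error payload, so treat them as assumptions and confirm against the actual response.

```python
from typing import Any, Dict, List


class NotionResponseError(RuntimeError):
    """Hypothetical exception type used only in this sketch."""


def results_or_raise(status_code: int, payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return payload["results"], or raise an error describing why it is missing."""
    failed = (
        status_code != 200
        or payload.get("object") == "error"  # assumed shape of a Notion error body
        or "results" not in payload
    )
    if failed:
        code = payload.get("code", "unknown")
        message = payload.get("message", "no message in response")
        raise NotionResponseError(
            f"Notion request failed (HTTP {status_code}, code={code}): {message}"
        )
    return payload["results"]
```

Inside `_read_block`, this would be called as `results_or_raise(res.status_code, res.json())`, so a wrong page ID or a missing connection surfaces as a readable message rather than a bare KeyError.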

For more information, you can refer to the NotionPageReader class in the LlamaIndex library.
