vgrem / Office365-REST-Python-Client

Microsoft 365 & Microsoft Graph Library for Python
MIT License

SPQueryThrottledException from file upload #726

Open JWBWork opened 1 year ago

JWBWork commented 1 year ago

We're working on an automation that uploads a large number of files to SharePoint through the office365 library in Python.

We've started getting this exception. From what I've been able to find, it's related to a query returning more than 5000 items. I'm confused about why I get this response when I'm only trying to upload a file; I assume some query happens in order to perform the upload.

[2023-08-15T14:29:48.840]: office365.runtime.client_request_exception.ClientRequestException: ('-2147024860, Microsoft.SharePoint.SPQueryThrottledException', 'The attempted operation is prohibited because it exceeds the list view threshold.', "500 Server Error: Internal Server Error for url: https://mysharepointsite.sharepoint.com/sites/Podio/_api/Web/RootFolder/Folders('Shared%20Documents')/Folders('Folder%20Name')/Folders('Another%20Folder')/Folders('SubFolder')/Folders('Longer%20Folder%20%20%201234568%20%20%20Name%20%20987654321')/Folders('Files')/Files/add(overwrite=true,url='image.jpg')")

Here is a minimal example of how we upload the files

from office365.sharepoint.folders.folder import Folder
from office365.runtime.http.request_options import RequestOptions
from office365.sharepoint.client_context import ClientContext

def upload_file(file_name: str, folder: Folder, client_context: ClientContext) -> str:
    def _set_header(request: RequestOptions) -> None:
        # Add the Prefer header to the outgoing request.
        request.set_header("Prefer", "bypass-shared-lock")

    # Read the whole file into memory before uploading.
    with open(file_name, "rb") as content_file:
        file_content = content_file.read()

    # Queue the upload on the target folder, then send the queued requests.
    uploaded_file = folder.upload_file(file_name, file_content)
    client_context.load(uploaded_file, before_loaded=_set_header)
    client_context.execute_query()
    return uploaded_file.serverRelativeUrl

Does anyone know how we can fix this? I see no way to raise the list view threshold in the settings. I tried creating an indexed column for the Title column (thinking that might be the value used in the query), but we're still hitting this exception.

JWBWork commented 1 year ago

For anyone who faces this issue: we mitigated it by uploading to a different directory.

  1. Catch ClientRequestException and confirm it's this exact issue
  2. Parse the attempted upload path to identify the parent directory of the directory we're trying to upload a file to
  3. Append an integer to that directory's name
  4. Check for that directory and create it if it doesn't exist
  5. Upload to that directory instead
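
For reference, here is a rough sketch of how the steps above might fit together. It is not the code we actually run, and it deliberately avoids calling the office365 library so it can be read on its own. The names `is_threshold_error`, `alternate_folder_url`, and `upload_with_fallback` are hypothetical, and so is the `upload` callable, which is assumed to create the target folder if it is missing and then upload the file. The sketch also assumes one reading of steps 2 and 3: the integer is appended to the name of the parent directory and the leaf folder name is kept.

```python
import posixpath
from typing import Callable, TypeVar

T = TypeVar("T")

# Text that appears in the exception shown in the original report.
THROTTLE_MARKER = "SPQueryThrottledException"


def is_threshold_error(exc: Exception) -> bool:
    """Step 1: confirm the failure is the list view threshold error."""
    return THROTTLE_MARKER in str(exc)


def alternate_folder_url(folder_url: str, suffix: int) -> str:
    """Steps 2 and 3: append an integer to the parent directory's name.

    Assumption: the leaf folder name is kept unchanged.
    """
    parent, leaf = posixpath.split(folder_url.rstrip("/"))
    grandparent, parent_name = posixpath.split(parent)
    if not parent_name:
        raise ValueError(f"{folder_url!r} has no parent directory to rename")
    return posixpath.join(grandparent, f"{parent_name}{suffix}", leaf)


def upload_with_fallback(
    folder_url: str,
    upload: Callable[[str], T],
    max_suffix: int = 5,
) -> T:
    """Steps 4 and 5: try the original folder, then numbered alternatives.

    `upload` is a hypothetical callable that ensures the folder exists and
    uploads into it. Any error other than the threshold error is re-raised.
    """
    for n in range(0, max_suffix + 1):
        candidate = folder_url if n == 0 else alternate_folder_url(folder_url, n)
        try:
            return upload(candidate)
        except Exception as exc:  # in practice, catch ClientRequestException
            if not is_threshold_error(exc):
                raise
    raise RuntimeError(f"All candidate folders for {folder_url!r} were throttled")
```

In real use the broad `except Exception` would be narrowed to `ClientRequestException` from `office365.runtime.client_request_exception`, which is the type shown in the traceback. The `max_suffix` cap is only there so the loop cannot run forever if every candidate folder is also over the threshold.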