vgrem / Office365-REST-Python-Client

Microsoft 365 & Microsoft Graph Library for Python
MIT License

Uploading large file from io.BytesIO instance results in `io.UnsupportedOperation: fileno` issue #793

Open ChaddRobertson opened 11 months ago

ChaddRobertson commented 11 months ago

I have some relatively large CSV files that I am uploading to SharePoint via an io.BytesIO instance using the following simplified method:

def write_file_bytes(self, relative_url: str, file_name: str, file_bytes: bytes) -> None:
    folder: Folder = self.client_context.web.get_folder_by_server_relative_url(relative_url)

    chunk_size: int = 1024 * 1024 * 15

    # Wrap the raw bytes in an in-memory stream
    stream: io.BytesIO = io.BytesIO(file_bytes)

    folder.files.create_upload_session(stream, chunk_size=chunk_size, file_name=file_name).execute_query()

Based on this StackOverflow question, writing the file from an io.BytesIO is indeed possible, but the file_name and file_size should be passed as keyword arguments to chunk_uploaded. However, even when specifying a callback that takes the file size as an argument, I still get an io.UnsupportedOperation: fileno exception.
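For context, this is roughly the call I ended up trying (the callback signature is my assumption based on the library's upload examples, so treat it as a sketch rather than the exact failing code):

import io

def on_chunk_uploaded(offset):
    # Progress callback invoked after each uploaded chunk
    print(f"Uploaded {offset} bytes so far")

stream = io.BytesIO(file_bytes)
folder.files.create_upload_session(
    stream,
    chunk_size=chunk_size,
    chunk_uploaded=on_chunk_uploaded,
    file_name=file_name,
).execute_query()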

Uploading the file from either a byte array or an io.BytesIO instance is necessary due to the nature of what I am doing.

Any help would be appreciated. Thanks.

shwetasah-cape commented 6 months ago

Hi @vgrem, bumping this issue; I am also facing this when I try to upload large in-memory file content. I think it is this line, which tries to get the file info, that causes the fileno error. As it stands, I am unable to upload large files using file.upload_large().
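For reference, the line in question is:

file_size = os.fstat(file.fileno()).st_size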

ChaddRobertson commented 6 months ago

@shwetasah-cape I was actually able to solve this issue. I'll need to check the project for my implementation, but I'll update this comment tomorrow morning when I have some time (approximately 12 hours from now). But yes, from what I can remember, os.fstat() was the origin of the exception.

Will let you know soon.

ChaddRobertson commented 6 months ago

Hi @shwetasah-cape, this is the solution I came up with a while ago; perhaps it helps. Unfortunately, I can't remember the exact nature of the issue I was having, or why this solves it, but it ended up working quite well in the end.

Note that this deletes any existing file with the same name as the one being uploaded. That might not be what you want, in which case you can simply remove that logic.

import os
import tempfile

from office365.sharepoint.files.file import File
from office365.sharepoint.folders.folder import Folder

def write_file_bytes(self, relative_url: str, file_name: str, file_bytes: bytes) -> None:
    folder: Folder = self.client_context.web.get_folder_by_server_relative_url(relative_url)

    chunk_size: int = 2048000

    # Check whether a file with this name already exists
    file: File = folder.files.get_by_url(file_name)

    if file.exists:
        # If it does, delete it before uploading
        file.delete_object().execute_query()

    # Spool the bytes to a real file on disk so the upload session can obtain
    # a file descriptor (delete=False so the file can be reopened below)
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        temp_file.write(file_bytes)

    try:
        with open(temp_file.name, "rb") as file_to_upload:
            folder.files.create_upload_session(
                file=file_to_upload,
                chunk_size=chunk_size,
                file_name=file_name,
            ).execute_query()
    finally:
        # Remove the temp file once the upload finishes
        os.remove(temp_file.name)

shwetasah-cape commented 6 months ago

@ChaddRobertson Thank you for responding! Yes, I ended up using this same workaround of writing to a temp file, and it works well.

b4rlw commented 3 months ago

Faced this as well; it looks like

file_size = os.fstat(file.fileno()).st_size

is always called in create_upload_session, which, if I'm not mistaken, requires a file descriptor (and that in turn requires an actual file open on the file system).
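A minimal repro of the underlying limitation, independent of the library:

import io

buf = io.BytesIO(b"some data")
buf.fileno()  # raises io.UnsupportedOperation: fileno (no OS-level descriptor)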

We would need some kind of override for in-memory files:

size = bytes_io.getbuffer().nbytes
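i.e. something along these lines (a sketch only; get_stream_size is a hypothetical helper, not an existing library function):

import io
import os

def get_stream_size(stream):
    # Prefer the OS-reported size when the stream is backed by a real file;
    # fall back to measuring the stream itself for in-memory objects.
    try:
        return os.fstat(stream.fileno()).st_size
    except (AttributeError, io.UnsupportedOperation):
        if isinstance(stream, io.BytesIO):
            return stream.getbuffer().nbytes
        # Generic fallback: seek to the end and restore the position
        position = stream.tell()
        size = stream.seek(0, os.SEEK_END)
        stream.seek(position)
        return size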