oittaa / gcp-storage-emulator

Local emulator for Google Cloud Storage

ValueError: Upload has finished. While uploading chunk by chunk #301

Open devjunhong opened 10 hours ago

devjunhong commented 10 hours ago

Hi oittaa,

Thanks for your continued effort on this project.

Describe the bug

I wanted to chunk a large file and upload it piece by piece to gcp-storage-emulator.

This fails with a ValueError, while the same code works fine against the real Google Cloud Storage.

import os
from pathlib import Path
from typing import Generator

from google.cloud import storage

CHUNK_SIZE = 1 * 1024 * 1024  # 1 MB

def chunk_file(file_full_path: str) -> Generator[bytes, None, None]:
    """Yield the file's contents in CHUNK_SIZE pieces."""
    file = Path(file_full_path)
    with file.open("rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield chunk

if __name__ == "__main__":
    test_bucket_name = "test-bucket"
    filepath = "train.jsonl"
    blob_name = "train.jsonl"

    # Point the client at the emulator before creating it.
    os.environ["STORAGE_EMULATOR_HOST"] = "http://gcs:9023"

    client = storage.Client()
    bucket = client.bucket(test_bucket_name)
    blob = bucket.blob(blob_name)

    # Stream the file into the blob one chunk at a time.
    with blob.open("wb", chunk_size=CHUNK_SIZE) as blob_writer:
        for piece in chunk_file(filepath):
            blob_writer.write(piece)

    for b in bucket.list_blobs():
        print(b.name)
Running this against the emulator produces:

Traceback (most recent call last):
  File "/app/main.py", line 33, in <module>
    blob_writer.write(piece)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py", line 357, in write
    self._upload_chunks_from_buffer(num_chunks)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py", line 417, in _upload_chunks_from_buffer
    upload.transmit_next_chunk(transport, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py", line 503, in transmit_next_chunk
    method, url, payload, headers = self._prepare_request()
                                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/resumable_media/_upload.py", line 611, in _prepare_request
    raise ValueError("Upload has finished.")
ValueError: Upload has finished.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/main.py", line 31, in <module>
    with blob.open("wb", chunk_size=CHUNK_SIZE) as blob_writer:
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py", line 437, in close
    self._upload_chunks_from_buffer(1)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py", line 417, in _upload_chunks_from_buffer
    upload.transmit_next_chunk(transport, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py", line 503, in transmit_next_chunk
    method, url, payload, headers = self._prepare_request()
                                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/pythonproject-9TtSrW0h-py3.11/lib/python3.11/site-packages/google/resumable_media/_upload.py", line 611, in _prepare_request
    raise ValueError("Upload has finished.")
ValueError: Upload has finished.
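
For what it's worth, reading the traceback: the ValueError comes from google-resumable-media's _prepare_request(), which refuses to send another chunk once the upload object is already marked as finished. In the resumable-upload protocol the server is supposed to answer 308 for every intermediate chunk and a final 200 only for the last one, so my guess is that the emulator returns a final response too early, and the client then still has data to write. I haven't verified this against the emulator's code.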

To Reproduce

Reproducing the error requires a sample file larger than one chunk, so I had to download one.

I therefore wrapped the script and the emulator in a docker-compose setup; hopefully this makes the issue easy to reproduce:

https://github.com/devjunhong/large-file-issue
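
If downloading a sample file is inconvenient, generating one locally should work just as well; given the workaround below, any file larger than CHUNK_SIZE ought to trigger the error. A minimal, hypothetical sketch (the ~2 MB target and the record shape are arbitrary; only the file name matches the script above):

import json

# Write roughly 2 MB of JSONL so the upload spans multiple 1 MB chunks.
with open("train.jsonl", "w") as f:
    for i in range(20_000):
        f.write(json.dumps({"id": i, "text": "x" * 80}) + "\n")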

Expected behavior

The upload should finish without an error, as it does against the real Google Cloud Storage.

System (please complete the following information)

devjunhong commented 10 hours ago

Fortunately, there is a workaround for this issue: if you set the chunk size to be slightly larger than the file size, the upload works without an error, because the whole file then fits in a single chunk and the client never has to transmit a second one.

I mean

CHUNK_SIZE = 1 * 1024 * 1024 # 1 MB <- if this is bigger than the file size, it is okay
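
A minimal sketch of that workaround, assuming it is acceptable to send the whole file as a single chunk (the client library requires the chunk size to be a multiple of 256 KB, so the file size is rounded up to the next multiple):

import os

from google.cloud import storage

CHUNK_UNIT = 256 * 1024  # resumable uploads require a multiple of 256 KB

filepath = "train.jsonl"

# Workaround: pick a chunk size just above the file size, rounded up to
# the next 256 KB multiple, so the whole file goes up in a single chunk
# and transmit_next_chunk() is never called a second time.
file_size = os.path.getsize(filepath)
chunk_size = ((file_size // CHUNK_UNIT) + 1) * CHUNK_UNIT

os.environ["STORAGE_EMULATOR_HOST"] = "http://gcs:9023"
client = storage.Client()
blob = client.bucket("test-bucket").blob("train.jsonl")

with blob.open("wb", chunk_size=chunk_size) as blob_writer:
    with open(filepath, "rb") as f:
        blob_writer.write(f.read())

This only sidesteps the problem, of course: files too large to buffer as one chunk still need the chunked path to work.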