googleapis / python-storage

Apache License 2.0

Blob client instance can not be reused to fetch new data on blob #1367

Closed saikonen closed 2 hours ago

saikonen commented 2 hours ago

Possible bug: reusing the same Blob instance to tail a blob's content does not return newly written data.

Environment details

Steps to reproduce

  1. Call download_as_bytes on a Blob instance.
  2. Wait for the blob to grow in size (more data is written to it).
  3. Call download_as_bytes again on the same Blob instance. Observe that no new bytes are received.

Code example

from google.cloud import storage

BUCKET = ""
BLOB = ""
PROJECT = ""

def main():
    client = storage.Client(project=PROJECT)
    bucket = client.bucket(BUCKET)
    blob = bucket.blob(BLOB)

    contents = b"starting logline\n"
    with blob.open("wb") as f:
        f.write(contents)

    # The first fetch through this Blob instance returns what was just written.
    print("first fetch")
    fetched = blob.download_as_bytes()
    assert contents == fetched, f"log content mismatch\nGot:\n{fetched}\nExpected:\n{contents}"

    contents += b"add line\n"
    with blob.open("wb") as f:
        f.write(contents)

    # Not okay anymore: the reused Blob instance does not return the appended line.
    print("failing second fetch")
    fetched = blob.download_as_bytes()
    assert contents != fetched, "Expected mismatch in fetched content."

    # Works with a fresh Blob client instance
    print("fixed second fetch")
    second_blob = bucket.blob(BLOB)
    fetched = second_blob.download_as_bytes()
    assert contents == fetched, f"log content mismatch\nGot:\n{fetched}\nExpected:\n{contents}"

if __name__ == "__main__":
    main()

andrewsg commented 2 hours ago

Hi, thanks for your feedback. This happens because the library automatically sets the "generation" on the Blob object after any successful operation for which the server returns a generation. Later operations through that same object are then pinned to that generation. The mechanism exists to protect against race conditions in which a blob is updated in the middle of a series of operations. It is inconvenient in cases like this one, but because such race conditions would otherwise be very common, we consider it a necessary feature.

To avoid this, use methods that take the blob name instead of reusing the Blob object, or create a new Blob object for each fetch.
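
For anyone landing here with the same tailing use case, the "create new Blob objects" suggestion could look roughly like the sketch below. The helper names fetch_latest and tail_new_bytes are made up for illustration and are not part of the library; the only library calls used are Bucket.blob() and Blob.download_as_bytes(), both of which already appear in the report above. It assumes the blob only ever grows by appending.

```python
def fetch_latest(bucket, blob_name):
    """Download the blob's current content through a fresh Blob object.

    Hypothetical helper, not a library function. bucket.blob() only builds
    a local handle, so the new object carries no generation recorded by an
    earlier download.
    """
    return bucket.blob(blob_name).download_as_bytes()


def tail_new_bytes(bucket, blob_name, seen):
    """Return (new_bytes, total_seen) for a blob that only grows.

    `seen` is the number of bytes the caller has already consumed.
    Hypothetical helper: it re-downloads the whole blob and slices off the
    part already seen, which is simple but wasteful for large blobs.
    """
    data = fetch_latest(bucket, blob_name)
    return data[seen:], len(data)
```

With a real bucket this would be called as new, seen = tail_new_bytes(client.bucket(BUCKET), BLOB, seen) inside a polling loop, starting with seen = 0. A ranged download would transfer less data, but re-downloading keeps the sketch short.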