googleapis / google-cloud-php

Google Cloud Client Library for PHP
https://cloud.google.com/php/docs/reference
Apache License 2.0
1.09k stars 436 forks source link

storage: sometimes ResumableUploader crashes on resume() #7088

Closed oprudkyi closed 8 months ago

oprudkyi commented 8 months ago

Environment details


#### Steps to reproduce

use resumable uploader (below sample code) and sometimes (not often) I got this

PHP Notice: TypeError: GuzzleHttp\Psr7\LimitStream::__construct(): Argument #3 ($offset) must be of type int, null given, called in /var/www/vendor/google/cloud-core/src/Upload/ResumableUploader.php on line 158 and defined in /var/www/vendor/guzzlehttp/psr7/src/LimitStream.php:32 Stack trace:

0 /var/www/vendor/google/cloud-core/src/Upload/ResumableUploader.php(158): GuzzleHttp\Psr7\LimitStream->__construct(Object(GuzzleHttp\Psr7\Stream), -1, NULL)

1 /var/www/vendor/google/cloud-core/src/Upload/ResumableUploader.php(138): Google\Cloud\Core\Upload\ResumableUploader->upload()

2 /var/www/app/Services/upload.php(1): Google\Cloud\Core\Upload\ResumableUploader->resume('https://storage.googleapis.com/upload/storage/v1/b/bucket/o?uploadType=resumable&...')


I suspect (not sure though) that 
- code in https://github.com/googleapis/google-cloud-php/blob/ee748ddff7a44e5c1050363096a6149e4cad7d73/Core/src/Upload/ResumableUploader.php#L136C9-L137C1

$this->rangeStart = $this->getRangeStart($response->getHeaderLine('Range'));

may return null, and then upload() called with null on https://github.com/googleapis/google-cloud-php/blob/ee748ddff7a44e5c1050363096a6149e4cad7d73/Core/src/Upload/ResumableUploader.php#L158 
        $data = new LimitStream(
            $this->data,
            $this->chunkSize ?: - 1,
            $rangeStart
        );

#### Code example

```php
            $gcsClient = new StorageClient([
                'projectId' => 'project',
                'requestTimeout' => 30,
            ]);
            $gcsBucket = $gcsClient->bucket('bucket');

            $uploader = $gcsBucket->getResumableUploader("data", [
                'name' => "some_path",
            ]);

            try {
                $objectData = $uploader->upload();
            } catch (GoogleException $ex) {
                if (stripos($ex->getMessage(), 'Upload failed. Please use this URI to resume your upload:') === false) {
                    throw $ex;
                }
                $this->info("Network error while uploading file to bucket, trying to reconnect and continue uploading");
                $stop = false;
                $count = 0;
                while (!$stop) {
                    ++$count;
                    if ($count >= 100) {
                        throw new Exception('Upload file to bucket failed after 100 tries');
                    }
                    try {
                        $resumeUri = $uploader->getResumeUri();
                        $objectData = $uploader->resume($resumeUri);
                        $stop = true;
                    } catch (GoogleException $ex) {
                        if (stripos($ex->getMessage(), 'Upload failed. Please use this URI to resume your upload:') === false) {
                            throw $ex;
                        }
                        // ignore error and continue
                        sleep(5);
                    }
                }
vishwarajanand commented 8 months ago

Hi @oprudkyi, I cannot reproduce this issue on my end. I can ensure that I am calling resume() but my uploads resume and go without any pain.

It seems rangeStart is getting its value from a PUT call to the resume URI so can't really do much on the library unless I validate that getStatusResponse can generate a response without (or a null) Range header. If this issue persists maybe you can instrument the library (maybe print the resumeUri, Content-Range passed in getStatusResponse and the value of Range header returned).

However, I do see a guidance where HTTP PUT's Content-Range header should have total bytes rather than a */*, so I am updating that.

oprudkyi commented 8 months ago

Hi @vishwarajanand
it is very transient issue. I have seen it two times only on project with thousands uploads per day it might be that it is only the case when something broken on api side , i.e. no header at all by example