googleapis / google-cloud-php

Google Cloud Client Library for PHP
https://cloud.google.com/php/docs/reference
Apache License 2.0

Uploading a stream resource's contents to a bucket results in the resource being closed #7343

Closed: leonboot closed this issue 4 months ago

leonboot commented 5 months ago

I'm trying to upload a stream's contents directly to a bucket using the StorageClient->bucket('bucket_name')->upload() method. The stream is a resource created by the ssh2_exec() function (I want to stream the output of an SSH command directly to a file in a GCS bucket).

I need to determine the exit code of the executed command to tell whether it ran successfully. This can be done by calling the stream_get_meta_data() function on the stream resource. However, doing so results in an error after the stream has been read by the StorageClient->bucket('bucket_name')->upload() method. Running is_resource() on the stream resource before and after passing it to upload() returns true and false, respectively. This leads me to believe that the upload() method, or one of the methods it calls, closes the stream once it has finished reading it. That leaves me unable to read the stream's metadata, which I believe is undesired behavior.

Here's a script to reproduce the issue:


<?php

require __DIR__.'/vendor/autoload.php'; # assuming a Composer autoloader

use Google\Cloud\Storage\StorageClient;

function debug($stream) {
    var_dump(is_resource($stream));
    var_dump(get_resource_type($stream));
    var_dump(stream_get_meta_data($stream));
}

$connection = ssh2_connect('my-ssh-host', 22);
# another ssh2_auth_* method may be used
if (!ssh2_auth_pubkey_file($connection, 'ssh-user', 'id_ed25519.pub', 'id_ed25519', 'secret-key-passphrase')) {
    die('Authentication failed');
}
$stream = ssh2_exec($connection, 'ls -al && exit 2'); # get a simple directory listing, then exit with status code 2
if ($stream === false) {
    die('Unable to execute command');
}
stream_set_blocking($stream, true);

$client = new StorageClient(['keyFilePath' => __DIR__.'/google-cloud-credentials.json']);

$bucket = $client->bucket('my-gcs-storage-bucket');

debug($stream); # dump details about the stream resource

echo "Uploading...".PHP_EOL;
$object = $bucket->upload($stream, [
    'name' => 'ssh-output.txt',
    'validate' => false,
]);

debug($stream); # dump details about the stream resource after it's been consumed

The above script will result in the following output:

bool(true)
string(6) "stream"
array(8) {
  ["exit_status"]=>
  int(0)
  ["timed_out"]=>
  bool(false)
  ["blocked"]=>
  bool(true)
  ["eof"]=>
  bool(false)
  ["stream_type"]=>
  string(12) "SSH2 Channel"
  ["mode"]=>
  string(2) "r+"
  ["unread_bytes"]=>
  int(0)
  ["seekable"]=>
  bool(false)
}
Uploading...
bool(false)
string(7) "Unknown"

Fatal error: Uncaught TypeError: stream_get_meta_data(): supplied resource is not a valid stream resource in /app/test.php:10
Stack trace:
#0 /app/test.php(10): stream_get_meta_data(Resource id #17)
#1 /app/test.php(37): debug(Resource id #17)
#2 {main}
  thrown in /app/test.php on line 10

If the above test script is changed so that the upload() call is replaced by the following code:

$count = 0;
while ($line = fgets($stream)) {
    $count += strlen($line);
}
echo "Read $count bytes".PHP_EOL;

Then the output is as follows:

bool(true)
string(6) "stream"
array(8) {
  ["exit_status"]=>
  int(0)
  ["timed_out"]=>
  bool(false)
  ["blocked"]=>
  bool(true)
  ["eof"]=>
  bool(false)
  ["stream_type"]=>
  string(12) "SSH2 Channel"
  ["mode"]=>
  string(2) "r+"
  ["unread_bytes"]=>
  int(0)
  ["seekable"]=>
  bool(false)
}
Read 1561 bytes
bool(true)
string(6) "stream"
array(8) {
  ["exit_status"]=>
  int(2)
  ["timed_out"]=>
  bool(false)
  ["blocked"]=>
  bool(true)
  ["eof"]=>
  bool(true)
  ["stream_type"]=>
  string(12) "SSH2 Channel"
  ["mode"]=>
  string(2) "r+"
  ["unread_bytes"]=>
  int(0)
  ["seekable"]=>
  bool(false)
}

In other words, simply consuming the stream by calling fgets() on it until EOF keeps the resource intact, and its metadata can still be read. Calling $bucket->upload() on it, however, seems to close the resource before the metadata can be read.

vishwarajanand commented 4 months ago

Hi @leonboot, I am unable to get the setup working on my end. I used a test SSH server from https://sdf.org (repro script), but my ssh2_exec command hangs indefinitely. I wonder whether you run into this issue as well!

On the GCS library's end, we just do the following operations on the stream:

  1. GuzzleHttp\Psr7\Utils::streamFor($stream)
  2. Wrap the above in a GuzzleHttp\Psr7\MultipartStream and send it over the wire via a GuzzleHttp\Psr7\Request (see the sketch below).
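
Incidentally, that wrapping is most likely why your resource ends up closed: Guzzle's PSR-7 Stream closes the underlying resource when the PSR-7 object is destroyed. A minimal demonstration, independent of SSH (the php://temp resource stands in for your SSH stream):

$resource = fopen('php://temp', 'r+');
$psr7 = \GuzzleHttp\Psr7\Utils::streamFor($resource); # wrap the raw resource

unset($psr7); # Stream::__destruct() calls close(), which fclose()es the resource

var_dump(is_resource($resource)); # bool(false)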

Note that SSH streams are non-seekable (stream_get_meta_data() reports ["seekable"]=> bool(false)). Because the GCS library has already read the stream and saved its contents, you can either fetch the object back, or buffer the stream into a seekable copy and upload that instead (untested):

$seekable_stream = fopen('php://memory', 'r+');
stream_copy_to_stream($stream, $seekable_stream); # fwrite() expects a string, so copy the stream's contents instead
rewind($seekable_stream); # reset the pointer so upload() reads from the start
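
If that works, you can hand the seekable copy to upload() and still read the exit status from the original SSH stream afterwards, since only the copy gets closed (equally untested):

$object = $bucket->upload($seekable_stream, [
    'name' => 'ssh-output.txt',
    'validate' => false,
]);

$meta = stream_get_meta_data($stream); # the original resource is still open
echo 'Exit status: '.$meta['exit_status'].PHP_EOL;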

Also, since you've enabled stream_set_blocking, you might be better off uploading the string output of the SSH command to GCS instead (consider using phpseclib3\Net\SSH2->exec(...); see the sketch below). I would consider this a more reliable workaround unless your use case really needs to stream the SSH output.
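
A rough sketch of that approach (untested; assumes phpseclib v3 is installed and reuses the host, key, and $bucket from your script):

use phpseclib3\Crypt\PublicKeyLoader;
use phpseclib3\Net\SSH2;

$ssh = new SSH2('my-ssh-host', 22);
$key = PublicKeyLoader::load(file_get_contents('id_ed25519'), 'secret-key-passphrase');
if (!$ssh->login('ssh-user', $key)) {
    die('Authentication failed');
}

$output = $ssh->exec('ls -al && exit 2'); # full command output as a string
$status = $ssh->getExitStatus();          # exit code, available separately

$bucket->upload($output, ['name' => 'ssh-output.txt']);
echo "Exit status: $status".PHP_EOL;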

vishwarajanand commented 4 months ago

Closing due to no response.