Closed hohaichi closed 7 years ago
Hello @hohaichi!
gcloud-python
handles the base64 encoding before the request as a convenience.
Which errors are you referring to? It would be good to make those as clear as possible.
@tseaver & @dhermes do you guys have anything to add?
Hi @daspecster,
This is the error:
>>> with open('/usr/local/google/home/hohaichi/helloworld.txt', 'rb') as f:
...     blob.upload_from_file(f, encryption_key='0QZa2d3ZYTghqnSn1Ca1OdSQXaN/tujOus7nVbxo6nE=')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/blob.py", line 539, in upload_from_file
    self._check_response_error(request, http_response)
  File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/blob.py", line 401, in _check_response_error
    error_info=request.url)
gcloud.exceptions.BadRequest: 400 Missing an encryption key, or it is not base64 encoded, or it does not meet the required length of the encryption algorithm. (https://www.googleapis.com/upload/storage/v1/b/chi-gcloud-trial/o?uploadType=media&name=foo)
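If the library really does base64-encode the key itself (as described earlier in the thread), then the failure above would be because a base64 string was passed where raw bytes were expected. A minimal sketch of the key handling that would avoid the error, using the key value from the traceback (the gcloud call itself is per the thread, not re-verified here):

```python
import base64
import os

# The key from the traceback is already base64-encoded (44 chars).
# Decode it back to the raw 32 bytes the library apparently expects.
b64_key = '0QZa2d3ZYTghqnSn1Ca1OdSQXaN/tujOus7nVbxo6nE='
raw_key = base64.b64decode(b64_key)
assert len(raw_key) == 32  # AES-256 keys are exactly 32 bytes

# Alternatively, generate a fresh random key directly as raw bytes:
fresh_key = os.urandom(32)

# Either value would then be passed as, e.g.:
#     blob.upload_from_file(f, encryption_key=raw_key)
```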
Per the link you mentioned before, that is the expected error.
You will receive an HTTP 400 error in the following cases:
- You upload an object using a customer-supplied encryption key, and you attempt to perform another operation on the object (other than requesting or updating metadata) without providing the key.
- You upload an object using a customer-supplied encryption key, and you attempt to perform another operation on the object with an incorrect key.
- You upload an object without providing a customer-supplied encryption key, and you attempt to perform another operation on the object with a customer-supplied encryption key.
- You specify an encryption algorithm, key, or SHA256 hash that is not valid.
https://cloud.google.com/storage/docs/encryption#response
Unless I'm missing something?
As the link you pointed to shows, the error comes directly from the underlying API, which expects a base64 string encoding 32 bytes. It is confusing when gcloud doesn't expect a base64 string but complains that the input is not base64.
That said, IMO consistency between gcloud and the API would make it easier to use and to move between different libraries. It is not confusing that GCloud accepts raw bytes for convenience. But it is confusing that GCloud accepts strings as well, yet expects a different format.
Ok, I think I see now. So this error is actually a bug as well.
Two things actually.
Yes, but for (1), I guess it still accepted the base64 string, encoded it again, and uploaded. The problem in this case is the length: GCloud expects a string of 32 characters.
In general, strings come with encodings. When reading the GCloud doc (http://googlecloudplatform.github.io/gcloud-python/stable/storage-blobs.html), I was wondering whether the encryption_key string should be ASCII, hex, or unicode (and which one?). If my guess is right, GCloud expects ASCII (which may have problems with non-printable characters), although the examples look like hex ('aa426195405adee2c8081bb9e7e74b19').
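The ambiguity above can be made concrete: the example string from the docs is a valid key only under the ASCII interpretation, because as hex it decodes to half the required length. A small sketch (the 32-byte/44-character requirements are from the Cloud Storage docs linked in this thread):

```python
import base64
import binascii

key_str = 'aa426195405adee2c8081bb9e7e74b19'

# Interpreted as ASCII, the string is 32 bytes -- a valid AES-256 key length.
ascii_bytes = key_str.encode('ascii')
assert len(ascii_bytes) == 32

# Interpreted as hex, the same string is only 16 bytes -- too short.
hex_bytes = binascii.unhexlify(key_str)
assert len(hex_bytes) == 16

# Only the ASCII interpretation base64-encodes to the 44-character
# value the JSON API expects in the x-goog-encryption-key header.
assert len(base64.b64encode(ascii_bytes)) == 44
```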
In this example, it shows using just a string of 32 characters as the encryption key and not a hex value.
https://googlecloudplatform.github.io/gcloud-python/stable/storage-blobs.html#gcloud.storage.blob.Blob.upload_from_file
We could change it somewhat to make it more clear.
>>> client = storage.Client(project='my-project')
>>> bucket = client.get_bucket('my-bucket')
>>> encryption_key = '####my 32 byte encryption key###'
>>> blob = Blob('secure-data', bucket)
Sorry, I meant 'aa426195405adee2c8081bb9e7e74b19' in the examples look like a hex number. :)
Do we have precedent here for whether we should do the base64 encoding or not? I can see the argument that we shouldn't, if it means that people might upload using the CLI or the API directly and then want to access via gcloud-python, only to find that the key doesn't seem to be valid...
Is there a way that we can make it explicit that when you're passing in base64 vs non-base64?
ie:
blob = Blob(... encryption_key='raw encryption key base64encoded')
# or
blob = Blob(... encryption_key=client.EncryptionKey('non-base64 encoded'))
?
I guess this comes down to the big question: are we supporting a "password", or an opaque key? If a password, then we should accept regular bytes (str) and base64-encode it for convenience. If we're accepting an opaque key (akin to, say, a Bitcoin private key), then base64 seems like the right way to accept data, and we should pass it along as-is.
Thoughts?
ISTM that library users should not have to base64-encode the encryption key: that is a requirement of the transport (raw-bytes-over-JSON), not the API surface. We should just expect users to give us exactly 32 bytes of whatever, and we handle passing it to the server correctly. E.g., if in the future we had a gRPC variant for CloudStorage, we would not be base64-encoding the key before assigning it to the protobuf field for the key.
I don't know whether "password" keys or opaque keys will be more popular. And honestly, I did find that the gcloud-python library is very easy to use as it is. My concerns are about consistency, though:
And a minor point: the example key could be made clearer as bytes than a string of hex digits.
if in the future we had a gRPC variant for CloudStorage, we would not be base64-encoding the key before assigning it to the protobuf field for the key.
I'd argue that the API surface would still want to accept the base64 encoded version, which we then decode and send as raw bytes in the RPC call...
I suppose we could sniff at the supplied key: if it is of type bytes and its length is exactly 32, then we base64-encode it before jamming it into the headers (ugh!); otherwise, we expect that it is already base64-encoded and just jam it in directly.
Detecting whether it is, in fact, base64 could be tricky and not super consistent, and I think it would be frustrating to have false positives, especially when it comes to security.
What if we had a flag to denote whether the key is already base64 encoded? The downside is that having a flag might force users to understand more layers of the system.
blob = Blob(... encryption_key='non-base64 encoded')
And then either
blob = Blob(... encryption_key_b64='bm9uLWJhc2U2NCBlbmNvZGVk')
or
blob = Blob(... encryption_key='bm9uLWJhc2U2NCBlbmNvZGVk', b64encoded=True)
I don't think we have to detect that the user has passed us base64-encoded bytes. If the passed value is 32 bytes exactly, it can't be a valid base64-encoded key: those have to be 43-44 characters long. So in that case, we can just base64-encode it and go on. For any other case, we just assume the user did it already, and pass it through untouched: if it happens that they somehow passed a 33-byte random byte array, the back-end will blow up. E.g.:
if isinstance(encryption_key, bytes) and len(encryption_key) == 32:
    encryption_key = base64.b64encode(encryption_key)
Hi,
Google Cloud Storage accepts a customer-supplied encryption key as a base64-encoded string in JSON and XML APIs (https://cloud.google.com/storage/docs/encryption#customer-supplied). Why does GCloud work differently, taking either raw bytes or raw strings? I found it confusing. Also, the error messages do not match the error in this case.
Thanks for taking a look.