Open losil opened 6 months ago
I didn't find useful information in the bundle other than "V4 authentication signed header not found: accept-encoding".
This should be related to the AWS plugin's SDK version being bumped to v2 in v1.9.x.
Per my understanding, the AWS SDK v2 uses v4 signing by default, so the signatureVersion parameter was deleted from the AWS plugin configuration.
After the SDK version bump, we already saw some errors caused by S3-compatible backends not being compatible with the S3 spec.
Not sure whether this is caused by an inconsistency between NetApp Storagegrid and S3.
After checking the official documentation it seems that they did not implement/support the Accept-Encoding header.
@blackpiglet do you see any problems when using velero 1.13.x in combination with velero-plugin-for-aws 1.8.x?
I haven't tried that, but if your scenario doesn't require the new parameters (tagging and checksumAlgorithm) added in release 1.9, then it should work.
After checking the official documentation it seems that they did not implement/support the Accept-Encoding header.
I think that doesn't mean it must. From [3]:
As long as the identity;q=0 or *;q=0 directives do not explicitly forbid the identity value that means no encoding, the server must never return a 406 Not Acceptable error.
Also from [3], on why the preferred Accept-Encoding may not be acceptable to the server:
Two common cases lead to this: The data to be sent is already compressed. The server is overloaded and cannot allocate computing resources.
It seems the server should respond rather than silently handle it. Does it return 406 Not Acceptable [1], 415 [2], or the content itself in response? If not, then I'd create an issue with the S3 vendor. If yes, then the Velero client shouldn't fail.
IMO the Velero client may prefer whatever it does, but should accept any (*) with a > 0 preference value (*;q=0.001) to cater to those common cases above. Is "any" (original representation) accepted by Velero now, i.e. is the qvalue weighting for * > 0?
I don't know if that would help when/if the server doesn't handle that header (in which case it may be responding with the original content, but Velero maybe does not accept it). In that case Velero could still work around that by trying whichever way works (with or without a specific encoding), but arguably the S3 vendor should fix their code to handle the header better, and Velero should accept the original representation (if it currently does not) as mentioned in [3].
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406
[2] https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/415
[3] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding
[4] https://datatracker.ietf.org/doc/html/rfc7231#section-5.3.4
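To make the qvalue rule quoted from [3] concrete, here is a minimal Go sketch that decides whether a given Accept-Encoding value still permits the unencoded (identity) representation. The function name identityAcceptable is hypothetical and this is not code from Velero, the AWS SDK, or StorageGrid; it only illustrates the rule as described in [3] and [4], and it ignores malformed q parameters rather than rejecting them.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// identityAcceptable (hypothetical helper) reports whether an Accept-Encoding
// header value still allows the unencoded "identity" representation.
// Rule sketched here: identity is acceptable unless it is excluded by
// "identity;q=0", or by "*;q=0" when identity is not listed explicitly.
func identityAcceptable(acceptEncoding string) bool {
	identityQ, starQ := -1.0, -1.0 // -1 means "not listed in the header"
	for _, part := range strings.Split(acceptEncoding, ",") {
		fields := strings.Split(part, ";")
		coding := strings.ToLower(strings.TrimSpace(fields[0]))
		q := 1.0 // default weight when no q parameter is present
		for _, p := range fields[1:] {
			p = strings.ToLower(strings.TrimSpace(p))
			if strings.HasPrefix(p, "q=") {
				// Unparseable weights are ignored and the default is kept.
				if f, err := strconv.ParseFloat(p[2:], 64); err == nil {
					q = f
				}
			}
		}
		switch coding {
		case "identity":
			identityQ = q
		case "*":
			starQ = q
		}
	}
	if identityQ >= 0 { // an explicit identity entry takes precedence over "*"
		return identityQ > 0
	}
	if starQ >= 0 {
		return starQ > 0
	}
	return true // identity is implicitly acceptable when nothing forbids it
}

func main() {
	for _, v := range []string{"gzip", "gzip, *;q=0.001", "gzip, *;q=0", "identity;q=0"} {
		fmt.Printf("%q allows identity: %v\n", v, identityAcceptable(v))
	}
}
```

Under this reading, a client that sends something like "gzip, *;q=0.001" keeps the original representation acceptable, which is the behavior suggested above.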
After reading the comments, it seems to me the gap is that the NetApp S3 service does not work with aws-sdk-v2.
@losil You may tweak the code and see whether it works with some parameter changes when calling the SDK, but I don't think we can make sure the plugin works with EVERY storage that declares itself S3-compatible but may in fact differ from AWS S3 in the details.
duped by https://github.com/vmware-tanzu/velero/issues/8152
Long term, we should add another plugin that uses an SDK that has the ability to ignore that accept-encoding header, like https://github.com/minio/minio-go/blob/99336902dd57f3760e272caf6550e6791eabe0af/pkg/signer/request-signature-v4.go#L60
What steps did you take and what happened:
We have updated our velero deployment with the latest Helm chart 6.4.0, which installs velero 1.13.2. With this upgrade the version of the velero-plugin-for-aws plugin has also been updated to v1.9.0, and then to v1.9.2 during troubleshooting. The upgrade itself went through smoothly. The BackupStorageLocation, which is an S3-compatible NetApp StorageGrid backend, was also in Available state after velero was initialized. After that we tested some backups, which were all unsuccessful and ended in the state Failed. We noticed that during the backup run the BackupStorageLocation went to Unavailable with the corresponding log message:
The configuration of the BackupStorageLocation looks like this and, as said, is an S3-compatible NetApp Storagegrid system:
After the backup run has ended, velero marked the BackupStorageLocation as Available again in its regular validation schedule.
Downgrading the velero-plugin-for-aws to v1.8.2 solves the issue and the backups are successful again.
What did you expect to happen:
We expect the same behavior when using the current version of the velero-plugin-for-aws initContainer. Velero should be able to use the S3-compatible backend provided by NetApp Storagegrid.
The following information will help us better understand what's going on:
bundle-2024-05-27-09-20-52.tar.gz
Environment:
velero version: v1.13.2
velero client config get features:
kubectl version: v1.26.15+rke2r1
/etc/os-release: Ubuntu 22.04.4 LTS
Vote on this issue!
This is an invitation to the Velero community to vote on issues. You can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.