Closed nicolas2bert closed 1 year ago
My role is to assist you with the merge of this
pull request. Please type @bert-e help
to get information
on this process, or consult the user documentation.
Status report is not available.
The Fix Version/s
in issue BB-425 contains:
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
7.10.11
7.70.2
8.5.5
8.6.25
8.7.0
Please check the Fix Version/s
of BB-425, or the target
branch of this pull request.
The "base62Decode" function (Arsenal/lib/versioning/VersionID.ts at development/7.10 · scality/Arsenal ) could be quite expensive in terms of CPU time and memory usage. There are a few reasons for this:
The function makes three calls to the base62Integer.decode() and base62String.decode() functions. These functions are themselves relatively expensive, so making three calls to them adds up.
The function has to iterate through the string to find the start of each encoded value. This is also an expensive operation.
The function has to slice the string three times. This is another relatively expensive operation.
All of these factors add up to make this function relatively expensive.
@rahulreddy Is there any advantage to using version ID over creation date?
The only disadvantage, I see, of using the creation date is that two versions could be created at the same time. In that case, we could decode the version ID to decide.
@rahulreddy Is there any advantage to using version ID over creation date?
The only disadvantage, I see, of using the creation date is that two versions could be created at the same time. In that case, we could decode the version ID to decide.
Yup, my main concern is the collision. You will have two versions created at the same ms, we already see at at a lot of deployments today. I would add another layer of entropy to be unique, may be you decode version ids and compare only for this case.
Yes, I agree. This condition is handled in the PR.
AFAICT, we also rely on ordering of keys in MongoClientInterface (and possibly metadata ?) to compute the last "version" to use as master... Does it mean we will need to do the same kind of fix in Arsenal : https://github.com/scality/Arsenal/blob/bdb59a0e63c20578078bd169e70c78a0c49f6e60/lib/storage/metadata/mongoclient/MongoClientInterface.js#L1055 ?
do correct me if I am wrong, short version ID is only used at the encoding/decoding step in cloudserver, and the generateVersionID
(https://github.com/scality/Arsenal/blob/bdb59a0e63c20578078bd169e70c78a0c49f6e60/lib/versioning/VersionID.ts#L100) is still timestamp based, so list order of keys should still be in some chronological order; the description may be a bit misleading attributing the issue to shortVersionID.
this issue would then look like an edge-case (race condition?) scenario that will also affect the long version ID and would likely cause issue to the getLatestVersion
logic pointed out by Francois
Internal medatada methods (server side) use decoded version id. For decoded version ids, the order is preserved.
However, the lifecycle "merging and sorting" (client side) method used base62 encoded version id to sort the listing (with merging versions and delete markers) that does not seem to preserve the order (based on creation date). Cf PR description.
I am not sure to understand your concerns @francoisferrand @alexanderchan-scality .
@nicolas2bert i see. I had confused the versionID comparison to be with the internal version id; then I do agree with you that the base62 encode does lose the lexigraphic order property needed for the comparison.
In that case, the find latest version would not be affected since that uses the internal VID for comparison.
Internal medatada methods (server side) use decoded version id. For decoded version ids, the order is preserved.
thanks @nicolas2bert , that is the part I missed, and got me worried of similar issue with arsenal.
s3 listObjectVersions is supposed to "request returns objects in the order that they were stored, returning the most recently stored object first". I do not see any mention (in the doc) that this order covers both versions & delete markers, but it seems to be consistent with the above statement and the exemples there...
So I wonder why we do this sorting in lifecycle, and if we should not implement this in cloudserver/arsenal instead? (This may avoid issues in the UI as well)
@francoisferrand The listObjectVersions API returns versions and delete markers in separate arrays. To determine the object's "stale date," the arrays must be merged and sorted. With the introduction of lifecycle optimization listings, this logic will be deprecated.
@bert-e approve
The Fix Version/s
in issue BB-425 contains:
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
7.10.11
7.70.2
8.5.5
8.6.26
8.7.0
Please check the Fix Version/s
of BB-425, or the target
branch of this pull request.
The following options are set: approve
ping
A conflict has been raised during the creation of
integration branch w/8.5/bugfix/BB-425/order
with contents from w/7.70/bugfix/BB-425/order
and development/8.5
.
I have not created the integration branch.
Here are the steps to resolve this conflict:
$ git fetch
$ git checkout -B w/8.5/bugfix/BB-425/order origin/development/8.5
$ git merge origin/w/7.70/bugfix/BB-425/order
$ # <intense conflict resolution>
$ git commit
$ git push -u origin w/8.5/bugfix/BB-425/order
The following options are set: approve
I have created the integration data for the additional destination branches.
bugfix/BB-425/order
into
development/7.10
development/7.70
development/8.5
development/8.6
development/8.7
The following branches will NOT be impacted:
development/7.4
You can set option create_pull_requests
if you need me to create
integration pull requests in addition to integration branches, with:
@bert-e create_pull_requests
The following options are set: approve
The changeset has received all authorizations and has been added to the relevant queue(s). The queue(s) will be merged in the target development branch(es) as soon as builds have passed.
The changeset will be merged in:
:heavy_check_mark: development/7.10
:heavy_check_mark: development/7.70
:heavy_check_mark: development/8.5
:heavy_check_mark: development/8.6
:heavy_check_mark: development/8.7
The following branches will NOT be impacted:
development/7.4
There is no action required on your side. You will be notified here once the changeset has been merged. In the unlikely event that the changeset fails permanently on the queue, a member of the admin team will contact you to help resolve the matter.
IMPORTANT
Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a member of the admin team now.
The following options are set: approve
I have successfully merged the changeset of this pull request into targetted development branches:
:heavy_check_mark: development/7.10
:heavy_check_mark: development/7.70
:heavy_check_mark: development/8.5
:heavy_check_mark: development/8.6
:heavy_check_mark: development/8.7
The following branches have NOT changed:
development/7.4
Please check the status of the associated issue BB-425.
Goodbye nicolas2bert.
Summary:
Description:
Encoding algorithms history:
The "short version id" encoding uses lossy base62 encoding algorithms (Base62Integer, Base62String) that do not preserve the order of the timestamps in the version ID. Previously, encoding was done using the hex encoding algorithm, which is a lossless encoding algorithm because it does not lose any information during the encoding process.