ovh / public-cloud-roadmap

Agile roadmap for OVHcloud Public Cloud services. Discover the features our product teams are working on, comment and influence our backlog.
https://www.ovhcloud.com/en/public-cloud/
184 stars 5 forks

Object Storage - lifecycle policies #210

Open tanandy opened 2 years ago

tanandy commented 2 years ago

As a user, I want to be able to manage lifecycle policies:

  • I want to be able to define rules (expiration/transition), e.g. move objects to another storage class like Archive, delete objects...
  • I want to be able to consume these features through standard APIs / the OVHcloud Customer panel
  • Then I will be able to define what happens when my objects expire
  • Then I will be able to mirror an existing bucket into an archive one using cross-region replication

lason-ovh commented 1 year ago

We will support the following lifecycle configuration elements:

  • object expiration on versioned buckets and non-versioned buckets
  • failed/incomplete multipart upload parts

The following lifecycle configuration elements will not be supported for now:

  • transition to lower tier storage class from HIGH PERF to STANDARD
  • integrity check of configuration file (checksum)
  • transition to COLD STORAGE (deep archive)

lason-ovh commented 1 year ago

Hi all, we announced a few months ago that this feature would be available in Q2 2023. As you know, we just released our new long-term storage Cold Archive, and we worked really hard to make it available for all of our clients. Unfortunately, we had to put a lot of effort into Cold Archive and had to delay the release of the lifecycle feature. However, we expect to release this feature somewhere in Q3 2023.

delahondes commented 1 year ago

A way to delete incomplete multipart uploads would be highly appreciated, even outside policy management...

lason-ovh commented 1 year ago

A way to delete incomplete multipart uploads would be highly appreciated, even outside policy management...

Hi, the v1 of this feature will support automatic deletion of failed multipart uploads.

Caffe1neAdd1ct commented 1 year ago

A way to delete incomplete multipart uploads would be highly appreciated, even outside policy management...

@delahondes i've put a couple of manual scripts in place, but the basics look like this:

  1. Find multipart uploads
    • I'd recommend filtering the returned JSON to uploads older than x hours (look at your situation and adjust accordingly, or you'll end up removing parts of uploads that are still in progress):

aws --endpoint-url https://s3.gra.io.cloud.ovh.net/ --profile default s3api list-multipart-uploads --bucket BUCKET_NAME_HERE > multi-parts.json

  2. Do some filtering on multi-parts.json
  3. Submit deletion requests:

aws --endpoint-url https://s3.gra.io.cloud.ovh.net/ --profile default s3api abort-multipart-upload --bucket BUCKET_NAME_HERE --key "urn:oid:NUMBER_HERE" --upload-id "UPLOAD_ID_HERE"

or use a bash script to automate:

#!/bin/bash

echo "Removing multipart uploads"

jq -c '.Uploads[]' multi-parts.json | while read -r upload; do
    key=$(echo "$upload" | jq -r '.Key')
    uploadId=$(echo "$upload" | jq -r '.UploadId')
    echo "$key"
    echo "$uploadId"

    aws --endpoint-url https://s3.gra.io.cloud.ovh.net/ --profile default s3api abort-multipart-upload --bucket BUCKET_NAME_HERE --key "$key" --upload-id "$uploadId"
done
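
For the filtering step on multi-parts.json, something like this jq sketch works (the 6-hour cutoff and the sample input here are assumptions; tune the cutoff so uploads still in progress are never aborted):

```shell
# Sample input shaped like list-multipart-uploads output; in practice,
# use the multi-parts.json produced by the command above.
cat > multi-parts.json <<'EOF'
{"Uploads":[
  {"Key":"old.bin","UploadId":"id-1","Initiated":"2020-01-01T00:00:00.000Z"},
  {"Key":"new.bin","UploadId":"id-2","Initiated":"2999-01-01T00:00:00.000Z"}
]}
EOF

# Keep only uploads initiated before the cutoff (GNU date syntax);
# ISO 8601 timestamps compare correctly as plain strings.
cutoff=$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%SZ)
jq --arg cutoff "$cutoff" \
   '{Uploads: [.Uploads[] | select(.Initiated < $cutoff)]}' \
   multi-parts.json > multi-parts-filtered.json
```

The abort loop can then read multi-parts-filtered.json instead of multi-parts.json.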

Hope this helps

delahondes commented 1 year ago

@Caffe1neAdd1ct Many thanks, I've just tried. I needed to adapt it (I had an error message the first time):

An error occurred (BadEndpoint) when calling the ListMultipartUploads operation: This bucket is not accessible through this endpoint.

But then I realized that with the High Perf S3 that I use, the endpoint is different (https://s3.gra.io.cloud.ovh.net/ must be replaced by https://s3.gra.perf.cloud.ovh.net/).

However, this does not work in my case; I never get the multi-parts.json:

Connection was closed before we received a valid response from endpoint URL: "https://s3.gra.perf.cloud.ovh.net/rnd?uploads".

So maybe I have an extreme case: we have several TB of invisible files in that bucket (which I suspect to be failed multipart uploads, but I am not sure).

Caffe1neAdd1ct commented 1 year ago

@delahondes Good spot on the endpoint URI

Might be worth looking into the AWS CLI docs and seeing if any of the pagination options would help return something in a more timely manner:

https://docs.aws.amazon.com/cli/latest/reference/s3api/list-multipart-uploads.html

--max-items 1

on the end of the command might help/work as a test. Note that --max-items only limits the output client-side, while --page-size controls how many items each underlying request asks the endpoint for. Otherwise the OVH API might not be able to handle the volume in the bucket, and it might be worth migrating the known good items to a new bucket.

delahondes commented 1 year ago

@Caffe1neAdd1ct Thanks again, I've tried, but it is still the same thing:

aws --endpoint-url https://s3.gra.perf.cloud.ovh.net/ --profile default s3api list-multipart-uploads --bucket mybucket --max-items 1 > multi-parts.json

Connection was closed before we received a valid response from endpoint URL: "https://s3.gra.perf.cloud.ovh.net/rnd?uploads".

I've also tried increasing the timeouts (the options are in the doc you linked, thanks):

aws --endpoint-url https://s3.gra.perf.cloud.ovh.net/ --profile default s3api list-multipart-uploads --bucket mybucket --max-items 1 --cli-connect-timeout 600 --cli-read-timeout 600 > multi-parts.json

But it fails all the same with the same error. I'll try setting the timeouts to their maximum.

Caffe1neAdd1ct commented 1 year ago

@delahondes have you tried on a fresh / empty bucket to see if you can get an empty successful response back?

delahondes commented 1 year ago

@Caffe1neAdd1ct It works on a fresh empty bucket:

aws --endpoint-url https://s3.gra.perf.cloud.ovh.net/ --profile default s3api list-multipart-uploads --bucket test42 > multi-parts.json

The return code is 0 and multi-parts.json is empty, but it works.

delahondes commented 1 year ago

I've also tested an intermediate bucket, about the same age and with the same usage as the faulty one above, but not empty: it contains around 1 TB of data (it has not been used much), and the first command also works there, although the output is empty. So it seems the first bucket (the one I called "mybucket" above) is somehow corrupted.

Caffe1neAdd1ct commented 1 year ago

@delahondes I'd suggest opening a ticket with OVH support with as much detail as possible so they can investigate and resolve it

delahondes commented 1 year ago

Yes, thanks, I did that already and pointed them to this discussion. Now the bucket is empty (0 objects) but still shows a volume of over 10 TB... I hope they find something quickly, as it is quite expensive.

JustDoItSascha commented 5 months ago

Any updates on this?

MattMills commented 5 months ago

Just want to add I am another customer that is desperate for this to be implemented.

lason-ovh commented 5 months ago

Hi all, first of all, thank you for your commitment to OVHcloud and for your interest in this feature. We are currently working hard to make sure this feature meets the highest quality standards and is delivered very soon.

We are planning to do a two-step release:

  1. automatic object and incomplete MPU deletion in non-versioned buckets
  2. automatic object expiration in versioned buckets, as lifecycle rules are more complex with versioned objects
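
For step 2, the standard S3 lifecycle element for expiring old versions is NoncurrentVersionExpiration. As a sketch of what such a rule looks like (the exact set of supported elements will be confirmed at release), this would delete noncurrent versions 7 days after they are superseded:

```json
{
  "Rules": [
    {
      "ID": "expire-noncurrent",
      "Filter": {"Prefix": ""},
      "Status": "Enabled",
      "NoncurrentVersionExpiration": {"NoncurrentDays": 7}
    }
  ]
}
```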

For now, once versioning is enabled on a bucket, we will not allow users to suspend it while having lifecycle activated because it adds a new layer of complexity to manage.

Tell us what you think in the comments below :)

JustDoItSascha commented 5 months ago

For now, once versioning is enabled on a bucket, we will not allow users to suspend it while having lifecycle activated because it adds a new layer of complexity to manage.

Unfortunately I don't understand that point. What do you mean by "suspend"? And what do you mean by "For now"? Because right now I can't add lifecycle rules at all.

lason-ovh commented 5 months ago

For now, once versioning is enabled on a bucket, we will not allow users to suspend it while having lifecycle activated because it adds a new layer of complexity to manage.

Unfortunately I don't understand that point. What do you mean by "suspend"? And what do you mean by "For now"? Because right now I can't add lifecycle rules at all.

Yes, this feature is still in progress, that's why you can't add lifecycle rules yet, but when it is released, you won't be able to have lifecycle rules AND versioning suspended at the same time. FYI, versioning on a bucket can have 3 states:

  • unversioned (the default: versioning has never been enabled)
  • enabled
  • suspended

JustDoItSascha commented 5 months ago

Ah ok, yes, this is no problem (for me). For us the most important feature would be that objects which are marked for deletion get deleted, for example after 30 days.

MattMills commented 5 months ago

I only need lifecycle rules for mass deletion, trying to delete 1B+ objects in the current API is not ideal.

lason-ovh commented 5 months ago

I only need lifecycle rules for mass deletion, trying to delete 1B+ objects in the current API is not ideal.

Yes, of course, and the API is not intended for such usage. By using a simple lifecycle config such as the following, you can easily automate deletion of your objects:

{
  "Rules": [
    {
      "Expiration": {
        "Days": 30
      },
      "Filter": {
        "Prefix": "to-be-deleted"
      },
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 10
      }
    }
  ]
}
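
Such a configuration file can be sanity-checked locally before it is applied; here is a minimal sketch with jq (the file name lifecycle.json is just an example):

```shell
# Write the rule set shown above to a file (sample; adjust to your needs).
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "Expiration": {"Days": 30},
      "Filter": {"Prefix": "to-be-deleted"},
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 10}
    }
  ]
}
EOF

# Sanity-check: valid JSON and every rule carries an Enabled/Disabled Status
# (jq -e makes the exit code reflect the result).
jq -e '.Rules | length > 0 and all(.[]; .Status == "Enabled" or .Status == "Disabled")' lifecycle.json
```

Once the feature is released, such a file would be applied through the standard S3 API call: aws s3api put-bucket-lifecycle-configuration --bucket BUCKET_NAME_HERE --lifecycle-configuration file://lifecycle.json (plus your usual --endpoint-url and --profile options).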

JustDoItSascha commented 5 months ago

I only need lifecycle rules for mass deletion, trying to delete 1B+ objects in the current API is not ideal.

Yes, of course, and the API is not intended for such usage. By using a simple lifecycle config such as the following, you can easily automate deletion of your objects:

{
  "Rules": [
    {
      "Expiration": {
        "Days": 30
      },
      "Filter": {
        "Prefix": "to-be-deleted"
      },
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 10
      }
    }
  ]
}

I thought lifecycle rules were not implemented yet? Now I'm confused...

lason-ovh commented 5 months ago

I only need lifecycle rules for mass deletion, trying to delete 1B+ objects in the current API is not ideal.

Yes, of course, and the API is not intended for such usage. By using a simple lifecycle config such as the following, you can easily automate deletion of your objects:

{
  "Rules": [
    {
      "Expiration": {
        "Days": 30
      },
      "Filter": {
        "Prefix": "to-be-deleted"
      },
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 10
      }
    }
  ]
}

I thought lifecycle rules were not implemented yet? Now I'm confused...

Lifecycle is not implemented yet, but the JSON config I posted is just a teaser of what's to come ;)

JustDoItSascha commented 5 months ago

Ok cool, and the estimated release date is?

scndel commented 3 months ago

Any updates on this? It remains an important feature for both GDPR compliance & FinOps. @lason-ovh

However, we expect to release this feature somewhere in Q3 2023.

lason-ovh commented 2 months ago

Work is in progress! You can expect a first release next quarter ;)

alegendre commented 2 months ago

We will support the following lifecycle configuration elements :

  • object expiration on versioned buckets and non-versioned buckets
  • failed/incomplete multipart upload parts

The following lifecycle configuration elements will not be supported for now:

  • transition to lower tier storage class from HIGH PERF to STANDARD
  • integrity check of configuration file (checksum)
  • transition to COLD STORAGE (deep archive)

Are these currently unsupported configuration elements planned? They are critical functionalities.

drakkan commented 5 days ago

Hello,

I am evaluating OVH to provide a managed service. I would like to enable versioning and automatically delete old versions of an object after, say, 7 days; the current version of the object should not be deleted. It is not clear to me whether this feature will be included in your next update, which, as I understand, is planned soon. Can you please clarify? Thanks!