kube-reporting / metering-operator

The Metering Operator is responsible for collecting metrics and other information about what's happening in a Kubernetes cluster, and providing a way to create reports on the collected data.
Apache License 2.0

S3 data remains when deleting datasource. #982

Open JooyoungJeong opened 4 years ago

JooyoungJeong commented 4 years ago

Hi. I installed metering using release-4.2. Hive storage is configured as s3Compatible.

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  disableOCPFeatures: true
  reporting-operator:
    spec:
      config:
        prometheus:
          # update this field
          url: "<IP>"
  hive:

  storage:
    type: "hive"
    hive:
      type: "s3Compatible"
      s3Compatible:
        bucket: "metering"
        secretName: "my-aws-secret"
        createBucket: false
        endpoint: "<IP>"
---
apiVersion: metering.openshift.io/v1
kind: ReportDataSource
metadata:
  name: mlp-test-gpu-datasource
  namespace: metering
spec:
  prometheusMetricsImporter:
    query: |
      metering:mlp_gpu_requests_slots:sum

I created a datasource and confirmed that its data was stored in the S3 bucket. I then deleted the datasource. The Hive table was deleted, but the data in S3 was not.

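import boto3  # client setup was not shown originally; the endpoint below is a placeholder
client = boto3.client("s3", endpoint_url="http://<IP>")
for obj in client.list_objects_v2(Bucket="metering", Prefix="metering.db/").get("Contents", []):
    print(obj["Key"])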

metering.db/datasource_metering_mlp_test_gpu_datasource/dt=2019-10-14/20191014_120145_00422_hpwrj_fc1d84f3-536e-4a86-9097-2c41b4935e49.snappy
metering.db/datasource_metering_mlp_test_gpu_datasource/dt=2019-10-14/20191014_120157_00424_hpwrj_18644871-9f4d-4781-93a8-374aef4a67a7.snappy

Is it safe to delete this remaining data from S3?

Thank you

chancez commented 4 years ago

We don't use finalizers yet, so if the pods are deleted while the datasource is being deleted, the data may not be cleaned up. That said, deleting a datasource you created should generally delete its data as well: deleting the datasource drops the table, and dropping the table removes the data.

chancez commented 4 years ago

If the datasource has already been deleted, you can clean up the data manually; that should be fine. You can also drop the table from within Presto or Hive, which will do the same.

JooyoungJeong commented 4 years ago

@chancez
Thank you for your feedback. When I delete the datasource, the Hive table is deleted. However, the data in the S3 bucket remained, so I deleted it manually. Thank you.
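
For anyone hitting the same leftover data, the manual cleanup discussed in this thread could be sketched roughly as follows. This is only a sketch, not something the operator ships: `datasource_prefix` and `delete_prefix` are hypothetical helper names, and the key layout (`metering.db/datasource_<namespace>_<name>/`, with `-` replaced by `_`) is an assumption inferred from the listing earlier in this issue. The `client` argument is expected to be a boto3 S3 client.

```python
def datasource_prefix(namespace, name, database="metering.db"):
    """Build the S3 key prefix for a ReportDataSource's table.

    Assumption: the layout is inferred from the listing in this issue
    (datasource_<namespace>_<name>, with '-' replaced by '_').
    """
    table = "datasource_{}_{}".format(namespace, name).replace("-", "_")
    return "{}/{}/".format(database, table)


def delete_prefix(client, bucket, prefix):
    """Delete every object under prefix; return how many keys were sent for deletion."""
    deleted = 0
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when a page has no objects.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        # A listing page holds at most 1000 keys, which is also the
        # per-request limit of delete_objects.
        client.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
        deleted += len(keys)
    return deleted
```

With a client created as `boto3.client("s3", endpoint_url=...)` pointing at the s3Compatible endpoint, calling `delete_prefix(client, "metering", datasource_prefix("metering", "mlp-test-gpu-datasource"))` would target only that datasource's objects. It is worth listing the prefix first and checking that the datasource is really gone before deleting anything.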