weaveworks / service

☁️ Images for Weave Cloud (R) (TM) (C) ☁️
https://cloud.weave.works

Update scope deletion #2715

Closed by bboreham 3 years ago

bboreham commented 3 years ago

Refresh the list of hourly buckets. Update the list of instances to trim at 3m. Also update the code to ignore blank lines and comments in the records file.
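For illustration, here is a minimal sketch of what ignoring blank lines and comments in a records file could look like. It is not the code from this PR: the `#` comment prefix, the one-record-per-line layout, and the names `iter_records` and `sample` are all assumptions made for the example.

```python
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str], comment_prefix: str = "#") -> Iterator[str]:
    """Yield records, skipping blank lines and comment lines.

    Assumption: one record per line, comments start with '#'.
    The real records file format is not shown in this PR.
    """
    for raw in lines:
        line = raw.strip()
        # Skip lines that are empty (or whitespace only) and comment lines.
        if not line or line.startswith(comment_prefix):
            continue
        yield line


# Hypothetical file contents, used only to exercise the function.
sample = [
    "# instances trimmed in the previous run\n",
    "\n",
    "record-a\n",
    "   \n",
    "  # indented comment\n",
    "record-b\n",
]
records = list(iter_records(sample))
```

Stripping before the prefix check means indented comments and whitespace-only lines are skipped too, which is usually what a hand-edited file needs.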

Queries used to get the instance IDs:

Deleted:

select string_agg(id, ' ' order by id::integer) from organizations where deleted_at >= '2021-03-15' and deleted_at < '2021-05-01' and (first_seen_scope_connected_at is not null);

Expired:

select string_agg(id, ' ' order by id::integer) from organizations where created_at >= '2021-03-15' and deleted_at is null and (trial_expires_at < '2021-05-01' and refuse_data_upload and refuse_data_access) and not ('no-billing' = any(feature_flags)) and (first_seen_scope_connected_at is not null);

Ordinary (not enterprise deal):

select string_agg(id, ' ' order by id::integer) from organizations where created_at >= '2021-03-15' and created_at < '2021-05-01' and deleted_at is null and not (refuse_data_upload and refuse_data_access) and not ('no-billing' = any(feature_flags)) and (first_seen_scope_connected_at is not null);
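The three queries partition organizations by a handful of predicates. The sketch below restates those predicates in Python to make the overlap and the differences easier to see. It is an illustration only: the `Org` dataclass and `classify` function are hypothetical, timestamps are simplified to dates, and boolean and array columns are assumed to be non-null (SQL `NULL` handling is not modelled).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

WINDOW_START = date(2021, 3, 15)
WINDOW_END = date(2021, 5, 1)


@dataclass
class Org:
    # Hypothetical stand-in for a row of the organizations table.
    id: str
    created_at: date
    deleted_at: Optional[date] = None
    trial_expires_at: Optional[date] = None
    refuse_data_upload: bool = False
    refuse_data_access: bool = False
    feature_flags: List[str] = field(default_factory=list)
    first_seen_scope_connected_at: Optional[date] = None


def classify(org: Org) -> Optional[str]:
    """Return 'deleted', 'expired', 'ordinary', or None if no query matches."""
    # Every query requires that Scope was connected at least once.
    if org.first_seen_scope_connected_at is None:
        return None
    # Deleted: selected purely by the deletion date window.
    if org.deleted_at is not None:
        in_window = WINDOW_START <= org.deleted_at < WINDOW_END
        return "deleted" if in_window else None
    # Expired and Ordinary both exclude 'no-billing' organizations
    # and both require created_at >= WINDOW_START.
    if "no-billing" in org.feature_flags or org.created_at < WINDOW_START:
        return None
    locked = org.refuse_data_upload and org.refuse_data_access
    if locked:
        # Expired: trial ended before the window end; created_at has no upper bound.
        expired = (org.trial_expires_at is not None
                   and org.trial_expires_at < WINDOW_END)
        return "expired" if expired else None
    # Ordinary: not locked out, and created inside the window.
    return "ordinary" if org.created_at < WINDOW_END else None


seen = date(2021, 3, 21)
examples = {
    "deleted": Org("1", date(2021, 1, 1), deleted_at=date(2021, 4, 1),
                   first_seen_scope_connected_at=seen),
    "expired": Org("2", date(2021, 3, 20), trial_expires_at=date(2021, 4, 19),
                   refuse_data_upload=True, refuse_data_access=True,
                   first_seen_scope_connected_at=seen),
    "ordinary": Org("3", date(2021, 4, 1), first_seen_scope_connected_at=seen),
}
```

Note that the Deleted query, unlike the other two, does not filter on the `no-billing` feature flag; the sketch preserves that by checking `deleted_at` first.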

Here is the Job used to create the list of records; note the need to increase DynamoDB read capacity while running:

apiVersion: batch/v1
kind: Job
metadata:
  name: data-cleanup-scan
  namespace: scope
spec:
  backoffLimit: 0
  completions: 1
  parallelism: 1
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: scope-report-deleter
      labels:
        name: data-cleanup-scan
    spec:
      containers:
      # With these settings and 4000 read capacity it runs for about 1 hour per 100GB in the table
      - args:
        - --app.collector=dynamodb://us-east-1/prod_reports
        - --app.collector.s3=s3://us-east-1/weaveworks_prod_reports
        - -segments=8
        - -big-scan
        env:
        - name: GOMAXPROCS
          value: "2"
        image: 664268111851.dkr.ecr.us-east-1.amazonaws.com/scope-data-cleaning:master-30d42d80
        name: scanner
        ports:
        - containerPort: 6060
          protocol: TCP
        resources:
          requests:
            cpu: 2
            memory: 600Mi
      restartPolicy: Never
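
The comment in the Job manifest gives a rough throughput figure: about 1 hour per 100GB at 4000 read capacity. A back-of-the-envelope estimate of how long to keep the raised capacity in place might look like the sketch below. It assumes runtime scales linearly with table size at those same settings, which the PR does not state; the function name and the 250GB example size are made up for illustration.

```python
# Figure quoted in the Job manifest comment: ~1 hour per 100GB
# with -segments=8, GOMAXPROCS=2 and 4000 read capacity.
HOURS_PER_100GB = 1.0


def estimated_scan_hours(table_gb: float) -> float:
    """Rough scan duration in hours, assuming linear scaling with table size."""
    if table_gb < 0:
        raise ValueError("table size must be non-negative")
    return table_gb / 100.0 * HOURS_PER_100GB


# Hypothetical table size, for illustration only.
hours = estimated_scan_hours(250)
print(f"plan for roughly {hours:.1f} hours of raised read capacity")
```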