Refresh the list of hourly buckets
Update the list of instances to trim at 3m
Also update the code to ignore blank lines and comments in the records file.
Queries used to get the instance IDs:
Deleted:
select string_agg(id, ' ' order by id::integer) from organizations where deleted_at >= '2021-03-15' and deleted_at < '2021-05-01' and (first_seen_scope_connected_at is not null);
Expired:
select string_agg(id, ' ' order by id::integer) from organizations where created_at >= '2021-03-15' and deleted_at is null and (trial_expires_at < '2021-05-01' and refuse_data_upload and refuse_data_access) and not ('no-billing' = any(feature_flags)) and (first_seen_scope_connected_at is not null);
Ordinary (not enterprise deal):
select string_agg(id, ' ' order by id::integer) from organizations where created_at >= '2021-03-15' and created_at < '2021-05-01' and deleted_at is null and not (refuse_data_upload and refuse_data_access) and not ('no-billing' = any(feature_flags)) and (first_seen_scope_connected_at is not null);
Here is the Job used to create the list of records; note the need to increase DynamoDB read capacity while running:
apiVersion: batch/v1
kind: Job
metadata:
  name: data-cleanup-scan
  namespace: scope
spec:
  backoffLimit: 0
  completions: 1
  parallelism: 1
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: scope-report-deleter
      labels:
        name: data-cleanup-scan
    spec:
      containers:
      # With these settings and 4000 read capacity it runs for about 1 hour per 100GB in the table
      - args:
        - --app.collector=dynamodb://us-east-1/prod_reports
        - --app.collector.s3=s3://us-east-1/weaveworks_prod_reports
        - -segments=8
        - -big-scan
        env:
        - name: GOMAXPROCS
          value: "2"
        image: 664268111851.dkr.ecr.us-east-1.amazonaws.com/scope-data-cleaning:master-30d42d80
        name: scanner
        ports:
        - containerPort: 6060
          protocol: TCP
        resources:
          requests:
            cpu: 2
            memory: 600Mi
      restartPolicy: Never