Currently, Mork handles the identification and deletion of inactive users in edX databases.
However, other FUN applications (Ashley, Joanie, etc.) also contain user data that needs to be deleted or anonymized. These applications can run in different namespaces.
Each application should be responsible for deleting its own data, but Mork should be the single source that tells each application which users need to be deleted.
Three architectures were considered to achieve this.
1. Message Queue architecture
Mork publishes the list of users to be deleted to a message broker (Kafka, RabbitMQ).
Not retained because:
no need for real-time processing
added complexity for deployment and maintenance
2. Central datastore architecture
Mork stores a list of users to be deleted in Redis; other applications read and pop those entries from it.
Not retained because:
applications can run in a different namespace than Redis
hard to keep logs of what each application has done
3. Central API service (in my opinion, what we should do)
Mork exposes endpoints to query users that need to be deleted.
Each application periodically checks and processes its own data deletion.
I think this is the best solution because:
there is a clear separation of responsibilities
works across namespaces
easy to keep logs
no additional infra (message broker or Redis) needed
Proposed implementation
Each application runs a daily cronjob that pulls the list of users to be deleted from Mork, then confirms which users it has deleted by updating their status.
Mork exposes the following endpoints:
List users to be deleted
Endpoint: GET /api/v1/users
Description: Retrieves a list of users to be deleted
Query Parameters:
status (required): one of to_delete, deleted
limit (optional): Number of results per page (default: 100)
offset (optional): Offset for pagination (default: 0)
Response: 200 OK
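The limit and offset parameters above imply a simple paging loop on the client side. Here is a minimal sketch, assuming the endpoint returns a JSON array of user objects (the response shape is not specified in this proposal) and that no authentication is required. The names `iter_users` and `http_fetch_page` are hypothetical.

```python
import json
from typing import Callable, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

# A page fetcher takes (status, limit, offset) and returns one page of users.
FetchPage = Callable[[str, int, int], List[dict]]


def http_fetch_page(base_url: str) -> FetchPage:
    """Build a fetcher that calls GET /api/v1/users on a Mork instance."""

    def fetch(status: str, limit: int, offset: int) -> List[dict]:
        query = urlencode({"status": status, "limit": limit, "offset": offset})
        with urlopen(f"{base_url}/api/v1/users?{query}") as response:
            # Assumption: the body is a JSON array of user objects.
            return json.load(response)

    return fetch


def iter_users(
    fetch_page: FetchPage, status: str = "to_delete", limit: int = 100
) -> Iterator[dict]:
    """Yield every user with the given status, one page at a time."""
    offset = 0
    while True:
        page = fetch_page(status, limit, offset)
        yield from page
        if len(page) < limit:  # a short or empty page is the last one
            return
        offset += limit
```

A consumer would call something like `list(iter_users(http_fetch_page(mork_url)))`, where `mork_url` is a placeholder for the deployed Mork address. Materializing the full list before confirming any deletion is probably safer, since entries could be skipped if the to_delete set shrinks while paging by offset.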
Update user deletion status
Endpoint: PATCH /api/v1/users/{user_id}/status
Response: 200 OK
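The request body for this endpoint is not described here. As a sketch only, the call could be prepared as follows, assuming the new status is sent as a JSON object with a single `status` key. That payload and the helper name `build_status_update` are assumptions, not part of the proposal.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# The two status values listed for the GET endpoint.
ALLOWED_STATUSES = {"to_delete", "deleted"}


def build_status_update(base_url: str, user_id: str, status: str = "deleted") -> Request:
    """Prepare (without sending) the PATCH request for a single user."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    # Assumption: the new status travels as a JSON body {"status": ...}.
    body = json.dumps({"status": status}).encode("utf-8")
    # Percent-encode the identifier so it stays a single path segment.
    url = f"{base_url}/api/v1/users/{quote(str(user_id), safe='')}/status"
    return Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

The prepared request can then be sent with `urllib.request.urlopen(request)`, which raises `HTTPError` on 4xx/5xx responses. An application would only mark a user as deleted after its local deletion has succeeded.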
Batch update user deletion status
Endpoint: PATCH /api/v1/users/status
Response: 200 OK
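Put together, the daily job in each application could look roughly like the sketch below. The Mork calls are injected as callables, so the shape of the batch payload (which this proposal does not specify) is left to the caller. The `id` field on user records, the function names and the default batch size are all assumptions.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def chunked(items: List, size: int) -> Iterator[List]:
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_deletion_job(
    users_to_delete: Iterable[dict],
    delete_locally: Callable[[dict], None],
    confirm_batch: Callable[[List, str], None],
    batch_size: int = 100,
) -> Tuple[List, List]:
    """Delete each user in the application, then confirm successes to Mork.

    Returns (deleted_ids, failed_ids). Failed users are not confirmed, so
    they keep the to_delete status and are retried on the next run.
    """
    deleted_ids, failed_ids = [], []
    for user in users_to_delete:
        user_id = user["id"]  # assumption: each record carries an "id" field
        try:
            delete_locally(user)
            deleted_ids.append(user_id)
        except Exception:
            # Broad catch on purpose: one failure should not stop the job.
            failed_ids.append(user_id)
    # Confirm in batches via PATCH /api/v1/users/status (wrapped by confirm_batch).
    for batch in chunked(deleted_ids, batch_size):
        confirm_batch(batch, "deleted")
    return deleted_ids, failed_ids
```

Confirming only after all local deletions have run keeps the to_delete list stable while it is being processed, and the returned lists give each application something concrete to log.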