ucphhpc / migrid-sync

MiGrid workspace where master branch is kept strictly in sync with SF upstream svn repo. Any development or experiments should use a branch. You probably want to fork your own clone or work e.g. on the edge branch if you wish to contribute.
GNU General Public License v2.0

Configurable scrambling of ID fields in gdp.log #28

Closed by jonasbardino 8 months ago

jonasbardino commented 10 months ago

On sensitive data sites (enable_gdp) we keep an extra gdp.log, which tracks all user operations. These logs are intended for live streaming to one or more remote log servers with potentially less paranoid access control, so we have so far chosen to scramble the ID fields and encrypt the path metadata in order to mask them from other admins on those remote log servers. With that setup one can decrypt the Fernet-encrypted path fields using the secret crypto salt from the actual site server, and match existing user IDs to the scrambled values with just a sha256 hash of the ID. Thus, the one-way hashing operation works in practice as pseudonymization, with the added value that one can delete the user ID to remove the link and make the entry anonymous if need be, e.g. if the EU GDPR "right to be forgotten" comes into play.
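To illustrate the pseudonymization idea, here is a minimal sketch and not the actual MiG code. It assumes a plain, unsalted SHA-256 hex digest of the ID string; the function names and the example IDs are hypothetical. The Fernet encryption of the path fields is left out.

```python
import hashlib


def scramble_id(user_id: str) -> str:
    """Return the hex SHA-256 digest of a user ID (a one-way pseudonym)."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def match_scrambled(scrambled: str, known_ids):
    """Return the known ID whose hash equals the scrambled value, else None."""
    for user_id in known_ids:
        if scramble_id(user_id) == scrambled:
            return user_id
    return None


# Hypothetical user IDs, for illustration only
known_ids = ["alice@example.org", "bob@example.org"]

# The value that would end up in the log instead of the real ID
logged = scramble_id("alice@example.org")

# While the ID is still known on the site server, the entry can be linked
found_before = match_scrambled(logged, known_ids)

# After deleting the ID the link is gone and the entry is effectively anonymous
known_ids.remove("alice@example.org")
found_after = match_scrambled(logged, known_ids)
```

Here `found_before` holds the original ID, while `found_after` is `None` because no remaining ID hashes to the logged value.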

We have developed this scheme with thought and care, and believe it strikes the best balance in log (meta)data protection with regard to confidentiality, integrity and availability. Yet, other schemes may also be safe and interpret the requirements differently, so if possible it should be allowed to configure other log protection methods.

We have received external requests for generally skipping the scrambling of IDs, and we could ourselves see value in allowing proper encryption as an alternative to hashing. It should be possible to implement this and expose it through a new configuration option that handles all scrambling.
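A rough sketch of how a single option could select the scrambling method follows. The method names `none` and `sha256`, the lookup table and the function name are assumptions made for illustration only. They are not the values or code MiG actually uses, and an encryption-based method is omitted.

```python
import hashlib


def _no_scramble(user_id: str) -> str:
    """Leave the ID as-is (scrambling skipped)."""
    return user_id


def _sha256_scramble(user_id: str) -> str:
    """One-way hash the ID with SHA-256."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


# Hypothetical mapping from configured method name to scrambling function
SCRAMBLERS = {
    "none": _no_scramble,
    "sha256": _sha256_scramble,
}


def apply_id_scramble(user_id: str, method: str = "sha256") -> str:
    """Scramble user_id with the configured method; reject unknown methods."""
    try:
        scrambler = SCRAMBLERS[method]
    except KeyError:
        raise ValueError("unsupported scramble method: %r" % method) from None
    return scrambler(user_id)
```

Keeping all methods behind one lookup means the logging code only needs the configured method name, and an alternative such as encryption could be added as one more table entry.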

jonasbardino commented 10 months ago

A first version of the implementation is available in svn@r5807 and the corresponding git edge@a109465cf60b13712e0eb40965fa1706db436ca8 and experimental@455ffcb2e87dc465ecd5ed2047ded1e69b82cb11 revisions. Feel free to try it out with the new gdp_id_scramble configuration option as explained in the commit log and in the configuration docs: https://github.com/ucphhpc/migrid-sync/blob/376986c04f66da9ed0660271709e988357e6def4/mig/install/MiGserver-template.conf#L106

jonasbardino commented 10 months ago

I found a minor issue with the handling of the corresponding option in the conf generator and fixed it in svn r5810 and the corresponding git edge@71a98eaeff94a3939c360629c4d14a5f03381025 and experimental@82cf41c283b7b0c6bab3702d9c941e8f9b51ebb2. Please go with those if you run into issues with trailing underscores in the conf value or when starting the docker-migrid integration.

The option is documented in the MiGserver.conf file, which can be seen e.g. by clicking the MiGserver-template.conf link in my previous comment.