ministryofjustice / security-guidance

Security guidance from the MOJ Digital & Technology Cybersecurity team
https://ministryofjustice.github.io/security-guidance/

Live Data guidance observations #98

Open tinymoth opened 4 years ago

tinymoth commented 4 years ago

'security-guidance/docs/using-live-data-for-testing-purposes.md'

While I welcome advice on when, where and how we should use live data, I wonder how effectively teams will be able to follow this initial alpha release, or whether it is entirely appropriate.

'Note: It is important that test data is protected to the same standard as the live data. This is to ensure that details of the system design and operation are not compromised.' I agree that test data that is a copy of live data must be protected in the same way the live data is.

However, test data that is derived from live data so that it mirrors its complexity and volume, but that is anonymised, does not need that level of protection; nor does test data generated by statistical matching of the live data's characteristics. The anonymisation process itself will need the same level of protection as the source, and the anonymisation scripts/rules may need protection if it would be possible to determine and reverse the anonymisation process (in which case it is probably not anonymisation but tokenisation).
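To illustrate the anonymisation-versus-tokenisation distinction: a minimal sketch of deterministic tokenisation using Python's standard `hmac` module. All names and values here are hypothetical. The key is the reversal risk: anyone holding it can regenerate tokens from guessed inputs, which is exactly why the tokenisation scripts/keys need the same protection as the source data.

```python
import hmac
import hashlib

def tokenise(value: str, key: bytes) -> str:
    """Deterministic pseudonymisation: the same input always yields the
    same token, so referential integrity across tables is preserved.
    Because it is keyed and repeatable, this is tokenisation, not
    anonymisation -- the key must be protected like the live data."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

# Hypothetical key; in practice this would live in a key management
# service with the same controls as the source system.
key = b"example-only-key"

record = {"name": "Jane Doe", "nino": "QQ123456C"}
safe_record = {field: tokenise(value, key) for field, value in record.items()}
```

Because `tokenise` is deterministic, joins between tokenised tables still work; rotating or destroying the key is what turns the output into something closer to anonymised data.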

It might be sensible to use test data that is not derived from live data in an environment that matches the security protections of the source data, but that is for performance and accessibility testing reasons, not security, and therefore does not require replica controls.
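For completeness, a sketch of the not-derived-from-live approach: fabricating records whose shape and rough value distributions resemble live data without deriving from any real person. The field names and value ranges here are invented for illustration.

```python
import random
import string

random.seed(42)  # deterministic output for repeatable test runs

def synthetic_record() -> dict:
    """Fabricate a record that mirrors the shape of live data.
    Nothing here is derived from a real person, so there is no PII
    risk and no requirement for replica security controls."""
    return {
        "name": "".join(random.choices(string.ascii_uppercase, k=8)),
        "age": random.randint(18, 90),
        "postcode_area": random.choice(["SW", "M", "LS", "CF", "G"]),
    }

# Volume can be scaled up freely for performance testing.
dataset = [synthetic_record() for _ in range(1000)]
```

A simple generator like this is enough for volume and accessibility testing; matching finer statistical characteristics of the live data would need distributions sampled from (suitably protected) live summaries.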

Since un-derived test data does not carry a PII risk, it can be used without special oversight from test managers, and does not need to be deleted after use (from a security perspective). This can present a learned-response problem where developers forget which set of data they are working with and do the wrong thing, so a sensible mitigation may be to always treat all data as live and have consistent safe handling practices.

Don't forget that many 'system designs' are published openly on Github, so they are thoroughly compromised!

It might be worth adding age and identifies-as-sex to the data to be anonymised. Where any field cannot be anonymised, perhaps because it is important to an effective test of the system, then it becomes a matter for CISO sign-off or a live-matching security configuration.

Is the scope of the document 'using live data elsewhere' or 'testing a system with live data'? If it is 'using', then the GDPR 'are you allowed to do this?' question should perhaps be the introduction, with redirection to another document. If it is 'testing', then I don't see why GDPR is important, unless the test is of an existing system that has undergone such significant change that it can now process data in a way the current system cannot. In that case, the DPIA should already have been amended and reviewed before any work was started. Any existing system and team with PII liabilities should already have solid GDPR handling practices.

When live data has been anonymised, it should be in a state that makes it safe to be left on an unencrypted memory device on a bus, from a PII perspective. If it gives away operational or organisational intelligence, then it clearly needs to remain carefully handled.
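One cheap way to gain confidence in that 'safe on a bus' state is a leak check before release: fail if any value from the live extract survives verbatim in the anonymised output. This is a hypothetical sketch, not a formal re-identification assessment; exact-match checking is naive (short or common values such as ages will coincide by chance), so it catches only gross failures of the anonymisation step.

```python
def assert_no_live_values(live_rows: list, anon_rows: list) -> None:
    """Raise if any value from the live rows appears unchanged in the
    anonymised rows. A gross-failure check only: it cannot detect
    re-identification via combinations of fields."""
    live_values = {str(v) for row in live_rows for v in row.values()}
    for row in anon_rows:
        leaked = live_values & {str(v) for v in row.values()}
        if leaked:
            raise ValueError(f"live values leaked into anonymised data: {leaked}")

live = [{"name": "Jane Doe", "nino": "QQ123456C"}]
anon = [{"name": "PERSON-001", "nino": "TOKEN-84f3"}]
assert_no_live_values(live, anon)  # passes: nothing survived verbatim
```

A fuller check would also look for near-matches (case changes, substrings) and flag quasi-identifiers whose combination could still single someone out.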

Tim

warmanaMOJ commented 4 years ago

Thanks @tinymoth , really thoughtful feedback - greatly appreciated!