sa-tre / satre-specification

Standard Architecture for Trusted Research Environments specification
https://satre-specification.readthedocs.io
Creative Commons Attribution 4.0 International

[Discussion]: Risk-based controls #207

Open willc-RISG opened 1 year ago

willc-RISG commented 1 year ago

Summary

Risk-based controls (tiering)

Source

Me (endlessly)

Detail

Some aspects of TRE controls are 'core', that is they are generic to TREs, regardless of what sort of data is being processed. For example, all users need a unique login, data access is on a need-to-know basis, and data can only be ingressed and egressed by specific controlled routes (no random local copy/paste, wild internet access, etc.).

Other aspects should be (I think) risk-based, that is the level and type of controls should reflect the potential risk to the data, e.g. unwanted disclosure. Specifically, I think all projects should classify data along the lines of the ATI (Alan Turing Institute) risk tiers, and then use systems with appropriate controls. So T0 and T1 (low risk) probably would not require a TRE at all; projects could save money and pain by using something else. T3 and T4 would need a TRE, and T2 might, but the controls applied within the TRE would differ, because the risk differs.

Here are some suggestions:

The datasets required (and to be ingressed), the processing to be done, and the data to be exported (egressed) should be defined prospectively in the project DMP (data management plan). This is 'core' to all TREs.

If data is high risk (directly identifiable, sensitive health data, say; tier 4?), then it must be checked, confirmed as compliant with the DMP, and ingressed/egressed by a third party (i.e. not the researcher). Egress obviously carries greater risk, but even ingress needs checks that the data coming into the project space is as expected (no extra fields, etc.).

For lower-risk data, say where direct identifiers have been replaced with pseudo IDs, perhaps we might allow some degree of flexibility. Maybe the project PI (or a delegate) can in/egress data after signing an extra agreement covering these responsibilities. For even lower-risk data (strongly pseudonymised, say), maybe in/egress can be delegated to specified project staff (who also sign the agreement).

Likewise, the location from which the TRE may be accessed should also depend on data risk. For example, directly identifiable sensitive data should only be accessible to those with a need, so access should be restricted to a project-specific place; shoulder surfing might immediately disclose sensitive information (and break the law). Where data is pseudonymised, this is less of a risk, so an office shared with other researchers might be OK. With better de-identification, access might be acceptable from any shared office (even one not used for research), from home or, ultimately, from a café or train.
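To make the idea of tier-dependent controls more concrete, here is a minimal Python sketch of how the suggestions above might be written down as data: a fixed set of core controls that apply to every TRE, plus per-tier "levers" for who performs in/egress and where the TRE may be accessed from. The assignment of each suggestion to a specific ATI tier (e.g. PI-controlled in/egress at T3, named staff at T2, T2 needing a TRE at all) and every name used (`Tier`, `Levers`, `LEVERS`, `controls_for`) are hypothetical illustrations, not part of the SATRE specification.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple

# Core controls: apply to every TRE regardless of data risk (from the proposal).
CORE_CONTROLS = (
    "unique login per user",
    "data access on a need-to-know basis",
    "ingress/egress only via controlled routes",
    "datasets, processing and outputs defined in the project DMP",
)


class Tier(IntEnum):
    T0 = 0
    T1 = 1
    T2 = 2
    T3 = 3
    T4 = 4


@dataclass(frozen=True)
class Levers:
    """Risk-dependent controls. Values in LEVERS are illustrative assumptions."""
    tre_required: bool
    ingress_egress_by: Optional[str]  # None where no TRE is needed
    access_from: Tuple[str, ...] = ()


# Hypothetical mapping of the suggestions above onto ATI tiers.
LEVERS = {
    Tier.T0: Levers(False, None),
    Tier.T1: Levers(False, None),
    Tier.T2: Levers(True, "named project staff (signed agreement)",
                    ("any shared office", "home")),
    Tier.T3: Levers(True, "PI or delegate (signed agreement)",
                    ("office shared with other researchers",)),
    Tier.T4: Levers(True, "independent third party",
                    ("project-specific room",)),
}


def controls_for(tier: Tier) -> dict:
    """Combine the core controls with the levers chosen for this tier."""
    levers = LEVERS[tier]
    if not levers.tre_required:
        return {"tre_required": False}
    return {
        "tre_required": True,
        "core": list(CORE_CONTROLS),
        "ingress_egress_by": levers.ingress_egress_by,
        "access_from": list(levers.access_from),
    }
```

Keeping the core controls separate from the levers means a TRE operator could state "we implement the core plus these lever settings" without the specification itself having to mandate which data goes where.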

By classifying risk and applying controls systematically, we can reduce risk and make TREs easier and cheaper to use. A one-size-fits-all approach, usually based on the highest-risk data, does not work and is not necessary. This will only work if we develop a national approach to risk classification and the corresponding controls.

We need to get the balance right, though, only mandating controls where they really matter and leaving a degree of flexibility. For example, by default TREs should 'deny all' internet access. But perhaps for some projects this means they cannot use a TRE, so we record the risk as an exception, mitigate it (with a whitelist, etc.) and monitor it.
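The "deny all, then record, mitigate and monitor exceptions" pattern can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`InternetException`, `InternetPolicy`, the project ID `proj-a` and host `pypi.org` are all hypothetical): internet access is refused unless a recorded, time-limited exception covers that project and host, and every decision is logged so exceptions can be monitored.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class InternetException:
    """A recorded exception to the default 'deny all' rule (hypothetical)."""
    project: str
    host: str        # the single destination being allowed (whitelist entry)
    rationale: str   # why the residual risk is accepted
    review_by: date  # the exception lapses unless re-reviewed by this date


@dataclass
class InternetPolicy:
    exceptions: List[InternetException] = field(default_factory=list)
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def is_allowed(self, project: str, host: str, today: date) -> bool:
        # Default is deny: allow only if a current exception matches exactly.
        allowed = any(
            e.project == project and e.host == host and today <= e.review_by
            for e in self.exceptions
        )
        # Log every decision so that exceptions can be monitored.
        self.audit_log.append((project, host, allowed))
        return allowed


policy = InternetPolicy([
    InternetException("proj-a", "pypi.org", "package installs", date(2030, 1, 1)),
])
```

Here `policy.is_allowed("proj-a", "pypi.org", date(2025, 1, 1))` is allowed, while the same request from another project, to another host, or after the review date is denied. Tying each exception to a review date is one way of making "monitor" mean something concrete.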

In this way, responsibility is delegated where appropriate, pain is reduced and efficiency improved, but all within a structured, risk-based framework. That's the dream, anyway.

So, tangible action: I have tabulated the specification items, added the ATI tiers, and made a start at unpicking the tier-specific controls. There aren't that many (thankfully); most items are core to all TREs. I think it would be immensely helpful to run a workshop on this idea, to beat it about and try to break it (based on people's experience at the coal face), and either polish and adopt it, or burn it if it just doesn't work.

All opinions welcome, particularly from those that are 'doing' TREs, as users or operators.

Will

Intended Output

Agree a risk-based approach to data classification and corresponding controls (tech and policy).

Who can help

Everyone

jemrobinson commented 1 year ago

A possible SATRE output could be a classification of TREs into tiers; I don't think trying to classify datasets or projects is in scope for us.

willc-RISG commented 1 year ago

Not sure there is worth in one without the other - they are umbilically linked.

jemrobinson commented 1 year ago

@willc-RISG: what I mean here is that the purpose of the SATRE project is to develop a TRE architecture standard. Being able to say "here's what it means to be a Tier-2 SATRE TRE" could definitely be in scope for that, and gives TRE operators the opportunity to label themselves and say "go and read this document if you want to know what that means."

However, if we start writing something that tells you how to classify a project (NB. classifying a dataset on its own is less useful than dataset + what you plan to do with it), then we're saying "your project is Tier-2 and you shouldn't be allowed to do it except in an environment that looks like this". I don't think this would be as useful to the community.

willc-RISG commented 1 year ago

Well actually your concern (limiting activity to environments with proportionate security) is precisely what I’m proposing, and surely what we ought to be striving for? High risk data goes (or stays) in highly secure environments; less risky data needs less protection, and should be easier (and cheaper maybe) to play with.

JimMadge commented 1 year ago

@all-contributors please add @willc-RISG for ideas

allcontributors[bot] commented 1 year ago

@JimMadge

I've put up a pull request to add @willc-RISG! :tada:

jemrobinson commented 1 year ago


I'm just thinking that saying "We've come up with a new way for you to classify your projects" isn't going to be an attractive prospect to organisations who already have a way of classifying project/data sensitivity. On the other hand saying "here's a classification of TREs" might, for example, help them to say "Oh, your Tier 1 TRE is fine for our Class B data, but Class A would need a Tier 3 TRE - can you support that?".

willc-RISG commented 1 year ago

But… in order for TRE class to have meaning, it must be tied to data class. In fact, working the other way round (starting from data class) is the rational approach. So we (well, you mainly!) have classified data risk, which we link to TRE controls. It has to be a national approach, though, otherwise it will have no meaning. We need the ICO to endorse the risk-based approach, which is what data protection (DP) law and good practice have been telling us to do for decades. Currently we live in limbo, which isn't helpful to anyone, least of all the public who underwrite trust in research.

jemrobinson commented 1 year ago

You might well be right, but we (SATRE) aren't in a position to impose a national data classification system, especially not one that overrides existing classifications (e.g. NHS Class I - V or UK Government OFFICIAL/OFFICIAL-SENSITIVE/SECRET/TOP-SECRET).

Given that constraint (and the fact that we only have about two months of funding left) I think it makes more sense to produce a recommendation for the thing that is in scope (classification of TREs) and put pressure on organisations who can make national decisions to come up with a framework of how to match data to TRE level.

I'm not trying to dismiss your idea, I just want to be realistic about what we can achieve in the SATRE project.

willc-RISG commented 1 year ago

I’m not proposing overriding existing classification tools. These too just need mapping to TRE controls. But we should be clear: if we’re not able to define the purpose of using or not using the optional TRE controls, then we’ll end up with (a) a one-size-(doesn't)-fit-all specification, based on the highest-risk approach, or (b) a woolly mess of things one might do depending on risk appetite, which nationally is all over the place. There’s an opportunity here that must not be squandered, even if this only amounts to a proposal for mapping data risk to TRE controls.

drchriscole commented 1 year ago

Hi @willc-RISG. Thanks for raising this idea, and I do understand your position; however, this proposal is looking at tiering data, not TREs. The SATRE project is looking at defining a standard for TREs. Local governance and/or the data controller can choose to add requirements to any standard, and the SATRE spec has no control over that. We cannot mandate what type of TRE a data controller should specify. This is where schemes such as DSPT, ISO and DEA, which sit on top of SATRE, come in.

As with the Turing tiers, the levels very quickly boil down to two: T0/1 don't need to be considered, and T4 is always going to be a bespoke/unique solution. So we're back to a yes/no situation, which is where we are with SATRE.

So at the moment the feeling is to be aware of the ask, but, given we only have two months left, we need to keep focused on the core tasks.

willc-RISG commented 1 year ago

Well, I would argue that if most TREs end up being bespoke, we simply don’t have a standard, just vague suggestions about managing risk appetite. But if it’s out of scope, so be it. At least, though, we could define core TRE functionality (unique login, MFA, network isolation at project level, etc.) and the additional ‘levers’ that might be adjusted to suit data risk, e.g. who gets the ingress and egress roles, where data can be accessed from, etc. At least that helps future-proof things for a world in which data is managed according to a common understanding of risk.

drchriscole commented 1 year ago

Indeed, which is what we're doing with the mandatory, recommended and optional requirements. We can certainly add information stating that the recommended and optional requirements will improve a TRE's chances of being able to work with more sensitive data. This is a model that fits with current accreditation schemes.

My comment about Tier 4 being bespoke is that, as per the Turing model, it is at the national security level of sensitivity. That's a rare need by a small number of (secretive) organisations who are going to have very specific needs and risk profiles.

manics commented 1 year ago

I think the biggest challenge is having a widely accepted definition of different data sensitivity tiers that's applicable across different sectors. Unfortunately I don't think there is anything we can use, especially when you look beyond the traditional uses of TREs. For example, in industrial TREs with commercially sensitive data, the risks, consequences and perceptions are very different from those in TREs holding sensitive public data.

However, I can see the benefit in ensuring the SATRE specification can be built upon or extended beyond its core remit, especially if we take advantage of modern web technologies and machine-readable formats instead of being restricted to traditional print/PDF-friendly formats. For example, the linking between Turing data sensitivity tiers (or NHS tiers, etc.) and SATRE capabilities could be a separate document/web explorer?

harisood commented 1 year ago

Just weighing in on this - @willc-RISG we like this idea but as stated above think it's beyond the scope of the general SATRE project. However this is something we're interested in at Turing and will look to pick up as a Turing endeavour (e.g. a Turing interpretation of SATRE) once SATRE finishes in October.

As there isn't universal agreement on what Tiers mean across the ecosystem, but we have an idea at Turing, we feel this work is best suited to be a Turing output rather than 'community representative' output. At least at this stage!

willc-RISG commented 1 year ago

Am hearing you loud and clear :-) To ensure future strategic nirvana remains a possibility though, I hope we can ensure that the SATRE TRE spec includes the various discrete levers that can be nudged/yanked as necessary, in proportion to risk.