ministryofjustice / data-catalogue

Data catalogue • This repository is defined and managed in Terraform
MIT License

Bug: we do not have a consistent way to describe security classifications and PII #6

Closed MatMoore closed 6 months ago

MatMoore commented 8 months ago

Currently the way we are storing security classifications is inconsistent, so the data-platform-catalogue library is slightly broken.

This will prevent us from distinguishing sensitive datasets and from doing anything useful with the classification filter.

It is also likely to cause confusion when we start looking at automatically detecting PII.

Problems

Missing code in get_table_details

We are missing something like this when returning TableMetadata objects:

            data_sensitivity_level = custom_properties.get(
                "sensitivityLevel", SecurityClassification.OFFICIAL.name
            )

            try:
                data_sensitivity_level = SecurityClassification[data_sensitivity_level]
            except KeyError:
                logger.error(
                    f"Ignoring unknown classification: {data_sensitivity_level}"
                )
                data_sensitivity_level = SecurityClassification.OFFICIAL
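
To try the fragment above in isolation, here is a minimal self-contained sketch. The two enum members and the helper name parse_sensitivity_level are assumptions made for illustration, not the library's actual definitions. It also shows a limitation of looking up by member name: a stored value such as "OFFICIAL-SENSITIVE" can never match, because a hyphen is not valid in a Python identifier, so it falls back to OFFICIAL.

```python
import logging
from enum import Enum, auto

logger = logging.getLogger(__name__)


class SecurityClassification(Enum):
    # Assumed members; the real list of values is still to be confirmed.
    OFFICIAL = auto()
    OFFICIAL_SENSITIVE = auto()


def parse_sensitivity_level(custom_properties: dict) -> SecurityClassification:
    """Hypothetical helper: resolve the classification by enum member name."""
    raw = custom_properties.get(
        "sensitivityLevel", SecurityClassification.OFFICIAL.name
    )
    try:
        # Name-based lookup: "OFFICIAL_SENSITIVE" matches, "OFFICIAL-SENSITIVE" does not.
        return SecurityClassification[raw]
    except KeyError:
        logger.error(f"Ignoring unknown classification: {raw}")
        return SecurityClassification.OFFICIAL
```

With this sketch, a missing property and an unrecognised string both resolve to OFFICIAL, which is why a hyphenated value would be silently downgraded.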

Inconsistencies

Proposal

  1. Make sure the "Security classification" filter still meets an identified user need (currently it does nothing useful, as it just has one checkbox for OFFICIAL)
  2. Confirm the list of values we actually want to use here (e.g. OFFICIAL vs OFFICIAL-SENSITIVE?)
  3. Update the Python library to be able to represent all these values, and consistently return the value from get_table_details
  4. Either rename the sensitivityLevel property to securityClassification or introduce new tags (TBD)
  5. Make sure we can still distinguish metadata that has been automatically detected as having PII vs marked as having PII or being OFFICIAL-SENSITIVE

jemnery commented 8 months ago

Can we decorate a security classification enum with descriptions to get over the hyphen issue? Not a Python example 😄

public enum SecurityClassification {
  [Description("OFFICIAL-SENSITIVE")]
  OfficialSensitive,

  [Description("OFFICIAL")]
  Official
}

MatMoore commented 8 months ago

We can use whatever we want for the enum value; we would just need to change the code to use the value rather than the name: https://docs.python.org/3/howto/enum.html#using-a-descriptive-string
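
A minimal sketch of that approach, assuming two members and a hypothetical helper name classification_from_property (neither is taken from the library). The member name stays a valid identifier while the value carries the hyphenated string, and the lookup calls the enum with the value instead of indexing by name.

```python
from enum import Enum


class SecurityClassification(Enum):
    # Assumed members; names are valid identifiers, values are the display strings.
    OFFICIAL = "OFFICIAL"
    OFFICIAL_SENSITIVE = "OFFICIAL-SENSITIVE"


def classification_from_property(raw: str) -> SecurityClassification:
    """Hypothetical helper: resolve the classification by enum value."""
    try:
        # Value-based lookup: SecurityClassification("OFFICIAL-SENSITIVE") works.
        return SecurityClassification(raw)
    except ValueError:
        # Unknown strings fall back to OFFICIAL, mirroring the original fragment.
        return SecurityClassification.OFFICIAL
```

Note that value-based lookup raises ValueError rather than KeyError for an unknown string, so the except clause has to change along with the lookup.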

MatMoore commented 6 months ago

This is not currently an issue, as we've removed this field for now. It can be revisited later if we need to record markings like official-sensitive when manually registering data to the catalogue.