raystack / guardian

Guardian is a universal data access management tool with automated access workflows and security controls across data stores, analytical systems, and cloud products.
https://guardian.vercel.app/
Apache License 2.0

Fetch labels for bigquery dataset/table #372

Closed. rahmatrhd closed this issue 1 year ago

rahmatrhd commented 1 year ago

Summary: BigQuery datasets and tables carry metadata that isn't yet fetched into Guardian resource details. We want to fetch it and store it in the resource details for use cases such as determining the dataset/table owner and the PII flag.

Proposed solution: add a metadata field within details:

// resource
{
  "id": "",
  ...
  "details": {
+    "metadata": {
+      "labels": {
+        "key": "value"
+      },
+      "other_metadata": ""
+    }
  }
}
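To illustrate the shape above, here is a minimal sketch in Python (not Guardian's actual implementation) of merging a label map into a resource's details without dropping existing keys. The function name `with_label_metadata` and the sample values are hypothetical; the labels are assumed to have already been read from the BigQuery dataset/table as a plain string-to-string map.

```python
import copy


def with_label_metadata(resource: dict, labels: dict) -> dict:
    """Return a copy of `resource` with labels stored at details.metadata.labels.

    Hypothetical helper: existing keys in `details` and `metadata` are kept.
    """
    updated = copy.deepcopy(resource)
    # Treat a missing or null "details"/"metadata" as an empty object.
    details = updated.get("details") or {}
    metadata = details.get("metadata") or {}
    metadata["labels"] = dict(labels or {})
    details["metadata"] = metadata
    updated["details"] = details
    return updated


if __name__ == "__main__":
    # Sample values are illustrative only.
    resource = {"id": "", "details": {"existing_key": "kept"}}
    print(with_label_metadata(resource, {"owner": "data-team", "pii": "true"}))
```

Returning a copy rather than mutating the input keeps the original resource unchanged if a later step fails.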

Other requirements:

bsushmith commented 1 year ago

@rahmatrhd couple of questions on this -

  1. Will this fetch all labels for a BigQuery dataset/table, or only specific, configurable labels?
  2. Is this specific to only bigquery resources?
rahmatrhd commented 1 year ago

@bsushmith

  1. It should fetch all labels (no configuration needed)
  2. For now, I'm limiting the scope to BigQuery resources. If we later need metadata for other providers' resources, a dedicated metadata column might be better.
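
The two answers above could be sketched roughly as follows, in Python for brevity. This is an assumption-laden illustration, not the actual change: `attach_labels`, the `fetch_labels` callable, and the `"bigquery"` value of a `provider_type` field are all hypothetical names. Every label returned by the fetcher is copied without filtering, and resources from other providers pass through untouched.

```python
from typing import Callable, Dict, List, Optional

BIGQUERY = "bigquery"  # assumed provider type identifier


def attach_labels(
    resources: List[dict],
    fetch_labels: Callable[[dict], Optional[Dict[str, str]]],
) -> List[dict]:
    """Attach all labels to BigQuery resources; leave other resources as-is.

    `fetch_labels` is a hypothetical callable that returns the label map
    for one dataset/table resource (or None when it has no labels).
    """
    result = []
    for res in resources:
        if res.get("provider_type") != BIGQUERY:
            result.append(res)  # out of scope for now
            continue
        labels = fetch_labels(res) or {}
        details = dict(res.get("details") or {})
        metadata = dict(details.get("metadata") or {})
        metadata["labels"] = dict(labels)  # all labels, no configuration
        details["metadata"] = metadata
        result.append({**res, "details": details})
    return result
```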