hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Add ability to configure Vertex AI Datasets based on metadata schema #11278

Open racosta opened 2 years ago

racosta commented 2 years ago

Description

The existing documentation for google_vertex_ai_dataset shows that metadata_schema_uri is a required argument, but the resource offers no way to supply the configuration that the referenced schema describes.

At the time of writing there are 6 schemas:

Using tabular_1.0.0.yaml as an example, there needs to be some way to provide inputConfig to the google_vertex_ai_dataset resource block.

title: Tabular
type: object
description: >
  The metadata of tabular Datasets. Can be used in Dataset.metadata_schema_uri
  field.
properties:
  inputConfig:
    description: >
      The tabular Dataset's data source. The Dataset doesn't store the data
      directly, but only pointer(s) to its data.
    oneOf:
    - type: object
      properties:
        type:
          type: string
          enum: [gcs_source]
        uri:
          type: array
          items:
            type: string
          description: >
            Cloud Storage URI of one or more files. Only CSV files are supported.
            The first line of the CSV file is used as the header.
            If there are multiple files, the header is the first line of
            the lexicographically first file, the other files must either
            contain the exact same header or omit the header.
    - type: object
      properties:
        type:
          type: string
          enum: [bigquery_source]
        uri:
          type: string
          description: The URI of a BigQuery table.
    discriminator:
      propertyName: type

New or Affected Resource(s)

Potential Terraform Configuration

resource "google_vertex_ai_dataset" "dataset" {
  display_name        = "terraform-test"
  metadata_schema_uri = "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml"
  region              = "us-central1"

  config = {
    inputConfig = {
      type = "bigquery_source"
      uri  = "bq://project.dataset.table"
    }
  }
}
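
For comparison, the gcs_source variant of the same proposal might look roughly like the sketch below. The config argument is still the hypothetical one proposed in this issue, not something the provider supports today, and the resource label, bucket, and file names are placeholders. The one structural difference comes from the tabular schema itself: for gcs_source, uri is an array of strings rather than a single string.

```hcl
# Sketch only: "config" is the argument proposed in this issue and does not
# exist in the provider. The bucket and object names are placeholders.
resource "google_vertex_ai_dataset" "gcs_dataset" {
  display_name        = "terraform-test-gcs"
  metadata_schema_uri = "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml"
  region              = "us-central1"

  config = {
    inputConfig = {
      # The schema's discriminator is "type"; for gcs_source, "uri" is a list.
      type = "gcs_source"
      uri  = ["gs://example-bucket/tabular.csv"]
    }
  }
}
```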

Honestly, because metadata_schema_uri essentially acts as an enum, the config object would have to be more flexible than what I've written here. Otherwise there would need to be six different dataset resources (e.g. google_vertex_ai_image_dataset, google_vertex_ai_tabular_dataset, etc.).
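
One shape the more flexible option could take, purely as an illustration, is a single free-form argument that carries the schema-specific metadata as a JSON string. In the sketch below, the argument name metadata and the local value tabular_metadata are assumptions made for this example and are not part of the provider; jsonencode is Terraform's built-in function.

```hcl
# Hypothetical sketch: "metadata" is NOT an existing argument of
# google_vertex_ai_dataset. It stands in for a schema-agnostic field.
locals {
  # Shape taken from the inputConfig definition in tabular_1.0.0.yaml.
  tabular_metadata = {
    inputConfig = {
      type = "bigquery_source"
      uri  = "bq://project.dataset.table"
    }
  }
}

resource "google_vertex_ai_dataset" "dataset" {
  display_name        = "terraform-test"
  metadata_schema_uri = "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml"
  region              = "us-central1"

  # Serialized as JSON so that one argument could serve every schema.
  metadata = jsonencode(local.tabular_metadata)
}
```

The trade-off of a design like this is that Terraform could not validate the contents at plan time, since the expected structure depends on whichever schema metadata_schema_uri points to.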

References

b/308248456

maksymsereda commented 7 months ago

Hi @racosta, could you please explain how to provide a GCS URL for tabular.csv if the "config" block you proposed cannot be used?