Add ability to configure Vertex AI Datasets based on metadata schema

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
If the issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon.
If the issue is assigned to a user, that user is claiming responsibility for the issue.
If the issue is assigned to "hashibot", a community member has claimed the issue already.

Description

The existing documentation for google_vertex_ai_dataset shows that a metadata schema URI is required, but the resource offers no way to supply the configuration that the schema describes.

At the time of writing there are 6 schemas:

Using tabular_1.0.0.yaml as an example, there needs to be some way to provide inputConfig to the google_vertex_ai_dataset resource block.

title: Tabular
type: object
description: >
  The metadata of tabular Datasets. Can be used in Dataset.metadata_schema_uri
  field.
properties:
  inputConfig:
    description: >
      The tabular Dataset's data source. The Dataset doesn't store the data
      directly, but only pointer(s) to its data.
    oneOf:
    - type: object
      properties:
        type:
          type: string
          enum: [gcs_source]
        uri:
          type: array
          items:
            type: string
          description: >
            Cloud Storage URI of one or more files. Only CSV files are supported.
            The first line of the CSV file is used as the header.
            If there are multiple files, the header is the first line of
            the lexicographically first file, the other files must either
            contain the exact same header or omit the header.
    - type: object
      properties:
        type:
          type: string
          enum: [bigquery_source]
        uri:
          type: string
          description: The URI of a BigQuery table.
    discriminator:
      propertyName: type

New or Affected Resource(s)

google_vertex_ai_dataset

Potential Terraform Configuration

resource "google_vertex_ai_dataset" "dataset" {
  display_name        = "terraform-test"
  metadata_schema_uri = "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml"
  region              = "us-central1"
  config = {
    inputConfig = {
      type = "bigquery_source"
      uri  = "bq://project.dataset.table"
    }
  }
}

Honestly, because metadata_schema_uri effectively acts as an enum, the config object would have to be more flexible than what I've written here. Otherwise there would need to be 6 different dataset resources (e.g. google_vertex_ai_image_dataset, google_vertex_ai_tabular_dataset, etc.).
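
One possible shape for that flexibility, as a rough sketch only: a single argument that accepts a JSON-encoded string, so the same resource could carry whatever structure the selected schema expects. The `metadata` argument name here is hypothetical and is not something the provider currently documents; the bucket and object names are placeholders. The two locals mirror the two `oneOf` variants of inputConfig in tabular_1.0.0.yaml.

```hcl
locals {
  # gcs_source variant: per the schema, `uri` is an array of CSV file URIs.
  tabular_gcs_input = {
    inputConfig = {
      type = "gcs_source"
      uri  = ["gs://example-bucket/data.csv"] # placeholder bucket and object
    }
  }

  # bigquery_source variant: per the schema, `uri` is a single table URI.
  tabular_bigquery_input = {
    inputConfig = {
      type = "bigquery_source"
      uri  = "bq://project.dataset.table"
    }
  }
}

resource "google_vertex_ai_dataset" "dataset" {
  display_name        = "terraform-test"
  metadata_schema_uri = "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml"
  region              = "us-central1"

  # Hypothetical argument (assumption, not an existing provider field):
  # a JSON string lets one argument hold any schema's configuration.
  metadata = jsonencode(local.tabular_gcs_input)
}
```

The trade-off with a free-form JSON string is that Terraform could not validate the contents at plan time, so a mismatch between the schema and the supplied object would presumably only surface when the API rejects it.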

References

9411

b/308248456

hashicorp / terraform-provider-google