hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

How to add multiple postgresql_columns to google_datastream_stream #13838

Open · cristopher-rodrigues opened 1 year ago

cristopher-rodrigues commented 1 year ago

When I try:

"postgresql_tables" = {
                "postgresql_columns" = {
                  "column" = "name"
                }

                "postgresql_columns" = {
                  "column" = "id"

                  "primary_key" = true
                }

                "table" = "will_sync"
              }

it just crashes...

Google API Response Details

HTTP/2.0 200 OK
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Cache-Control: private
Content-Type: application/json; charset=UTF-8
Date: Fri, 24 Feb 2023 21:36:12 GMT
Server: ESF
Vary: Origin
Vary: X-Origin
Vary: Referer
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 0

{
  "name": "projects/kelvin-datascience-staging/locations/us-central1/operations/operation-1677274559832-5f578ea5460e4-3123121e-66df20b8",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.datastream.v1.OperationMetadata",
    "createTime": "2023-02-24T21:36:00.038421700Z",
    "endTime": "2023-02-24T21:36:01.528873333Z",
    "target": "projects/kelvin-datascience-staging/locations/us-central1/streams/test_stream_id",
    "verb": "create",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": true,
  "error": {
    "code": 13,
    "message": "an internal error has occurred"
  }
}
│ Error: Error waiting to create Stream: Error waiting for Creating Stream:
│
│   with google_datastream_stream.test,
│   on datastream.tf line 22, in resource "google_datastream_stream" "test":
│   22: resource "google_datastream_stream" "test" {
│
edwardmedia commented 1 year ago

@cristopher-rodrigues are you able to achieve this with other tools like gcloud? If yes, what is the full command?

cristopher-rodrigues commented 1 year ago

> @cristopher-rodrigues are you able to achieve this with other tools like gcloud? If yes, what is the full command?

Hi @edwardmedia,

Unfortunately, we only use GCP. Here is our full config:

resource "google_datastream_connection_profile" "source" {
  display_name          = "Postgresql Source"
  location              = "us-central1"
  connection_profile_id = "source_test_connection_profile_id"

  postgresql_profile {
    hostname = google_sql_database_instance.test.public_ip_address
    username = google_sql_user.users.name
    password = google_sql_user.users.password
    database = google_sql_database.database.name
  }
}

resource "google_datastream_connection_profile" "destination" {
  display_name          = "BigQuery Destination"
  location              = "us-central1"
  connection_profile_id = "destination_test_connection_profile_id"

  bigquery_profile {}
}

resource "google_datastream_stream" "test" {
  display_name  = "Postgres to BigQuery"
  location      = "us-central1"
  stream_id     = "test_stream_id"
  desired_state = "RUNNING"

  source_config {
    source_connection_profile = google_datastream_connection_profile.source.id
    postgresql_source_config {
      max_concurrent_backfill_tasks = 12
      publication                   = "test_publication"
      replication_slot              = "test_replication"
      include_objects {
        postgresql_schemas {
          schema = "public"
          postgresql_tables {
            table = "will_sync"
            postgresql_columns {
              column      = "id"
              primary_key = true
            }
            postgresql_columns {
              column = "name"
            }
          }
        }
      }
      exclude_objects {
        postgresql_schemas {
          schema = "public"
          postgresql_tables {
            table = "wont_sync"
            postgresql_columns {
              column = "column"
            }
          }
        }
      }
    }
  }

  destination_config {
    destination_connection_profile = google_datastream_connection_profile.destination.id
    bigquery_destination_config {
      data_freshness = "900s"
      source_hierarchy_datasets {
        dataset_template {
          location = "us-central1"
        }
      }
    }
  }

  backfill_all {}
}
cristopher-rodrigues commented 1 year ago

and more outputs:

google_datastream_connection_profile.destination: Creating...
google_datastream_connection_profile.source: Creating...
google_datastream_connection_profile.destination: Creation complete after 3s [id=projects/kelvin-datascience-staging/locations/us-central1/connectionProfiles/destination_test_connection_profile_id]
google_datastream_connection_profile.source: Still creating... [10s elapsed]
google_datastream_connection_profile.source: Creation complete after 14s [id=projects/kelvin-datascience-staging/locations/us-central1/connectionProfiles/source_test_connection_profile_id]
google_datastream_stream.test: Creating...
google_datastream_stream.test: Still creating... [10s elapsed]
╷
│ Error: Error waiting to create Stream: Error waiting for Creating Stream:
│
│   with google_datastream_stream.test,
│   on datastream.tf line 22, in resource "google_datastream_stream" "test":
│   22: resource "google_datastream_stream" "test" {
cristopher-rodrigues commented 1 year ago

One workaround that actually works is specifying the multiple postgresql_columns under exclude_objects instead; see the sketch below.
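For reference, a minimal sketch of that workaround (the excluded column names here are hypothetical): include the table without a column filter, then list the columns to drop under exclude_objects, where multiple postgresql_columns blocks appear to be accepted:

include_objects {
  postgresql_schemas {
    schema = "public"
    postgresql_tables {
      # no postgresql_columns filter here; the whole table is included
      table = "will_sync"
    }
  }
}
exclude_objects {
  postgresql_schemas {
    schema = "public"
    postgresql_tables {
      table = "will_sync"
      # multiple postgresql_columns blocks work under exclude_objects
      postgresql_columns {
        column = "unwanted_a" # hypothetical column name
      }
      postgresql_columns {
        column = "unwanted_b" # hypothetical column name
      }
    }
  }
}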

cristopher-rodrigues commented 1 year ago

It turned out that the same bug occurs when a primary_key is specified 😭
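A minimal repro sketch based on that report, reusing the table and column names from the config above; per the comment, a single postgresql_columns block with primary_key set appears to be enough to hit the error:

include_objects {
  postgresql_schemas {
    schema = "public"
    postgresql_tables {
      table = "will_sync"
      postgresql_columns {
        column      = "id"
        primary_key = true # setting this alone appears to trigger the internal error
      }
    }
  }
}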

davidkhardwick commented 1 year ago

> are you able to achieve this with other tools like gcloud?

@cristopher-rodrigues - I think @edwardmedia was asking if you also get the same error when using the gcloud commands https://cloud.google.com/sdk/gcloud/reference/datastream/streams/create

cristopher-rodrigues commented 1 year ago

> > are you able to achieve this with other tools like gcloud?
>
> @cristopher-rodrigues - I think @edwardmedia was asking if you also get the same error when using the gcloud commands https://cloud.google.com/sdk/gcloud/reference/datastream/streams/create

Oh, got it. Unfortunately, I couldn't find a way to add primary keys to Datastream via the SDK (even though the Terraform docs suggest it is possible, for PostgreSQL at least). For multiple columns we should be fine, as long as excluding columns works.

cristopher-rodrigues commented 1 year ago

@edwardmedia this is the request that fails:

POST /v1/projects/project/locations/us-central1/streams?alt=json&streamId=test_stream_id2 HTTP/1.1
Host: datastream.googleapis.com
User-Agent: Terraform/1.3.9 (+https://www.terraform.io) Terraform-Plugin-SDK/2.10.1 terraform-provider-google/4.54.0
Content-Length: 1043
Content-Type: application/json
Accept-Encoding: gzip

{
 "backfillAll": null,
 "backfillNone": {},
 "destinationConfig": {
  "bigqueryDestinationConfig": {
   "dataFreshness": "900s",
   "sourceHierarchyDatasets": {
    "datasetTemplate": {
     "location": "us-central1"
    }
   }
  },
  "destinationConnectionProfile": "..."
 },
 "displayName": "Postgres to BigQuery",
 "sourceConfig": {
  "mysqlSourceConfig": null,
  "oracleSourceConfig": null,
  "postgresqlSourceConfig": {
   "includeObjects": {
    "postgresqlSchemas": [
     {
      "postgresqlTables": [
       {
        "postgresqlColumns": [
         {
          "column": "id",
          "primaryKey": true
         },
         {
          "column": "name"
         },
         {
          "column": "address"
         }
        ],
        "table": "buildings"
       },
       {
        "table": "floors"
       },
       {
        "table": "nodes"
       },
       {
        "table": "radiators"
       },
       {
        "table": "spaces"
       },
       {
        "table": "units"
       },
       {
        "table": "visits"
       }
      ],
      "schema": "public"
     }
    ]
   },
   "maxConcurrentBackfillTasks": 12,
   "publication": "test_publication",
   "replicationSlot": "test_replication"
  },
  "sourceConnectionProfile": "..."
 },
 "state": "RUNNING"
}
cristopher-rodrigues commented 1 year ago

@edwardmedia @davidkhardwick this is what I get when using gcloud (the audit log first, followed by the command and config files I used):

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "code": 13,
      "message": "An unknown error has occurred"
    },
    "authenticationInfo": {
      "principalEmail": "email",
      "principalSubject": "user:email"
    },
    "requestMetadata": {
      "requestAttributes": {},
      "destinationAttributes": {}
    },
    "serviceName": "datastream.googleapis.com",
    "methodName": "google.cloud.datastream.v1.Datastream.CreateStream",
    "resourceName": "project",
    "resourceLocation": {
      "currentLocations": [
        "us-central1"
      ]
    }
  },
  "insertId": "id",
  "resource": {
    "type": "audited_resource",
    "labels": {
      "method": "google.cloud.datastream.v1.Datastream.CreateStream",
      "service": "datastream.googleapis.com",
      "project_id": "projectid"
    }
  },
  "timestamp": "2023-03-01T18:45:12.357362572Z",
  "severity": "ERROR",
  "logName": "projects/projectid/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    "id": "projects/projectid/locations/us-central1/operations/operation-2-5f5db1ca53242-6b60b33a-7800af10",
    "producer": "datastream.googleapis.com",
    "last": true
  },
  "receiveTimestamp": "2023-03-01T18:45:12.656614692Z"
}
gcloud datastream streams create test \
    --location=us-central1 \
    --display-name=my-stream \
    --source=source_test_connection_profile_id2 \
    --postgresql-source-config=source_config.json \
    --destination=destination_test_connection_profile_id2 \
    --bigquery-destination-config=destination_config.json \
    --backfill-none

source_config.json

{
   "includeObjects": {
    "postgresqlSchemas": [
     {
      "postgresqlTables": [
       {
        "postgresqlColumns": [
         {
          "column": "id",
          "primaryKey": true
         },
         {
          "column": "name"
         },
         {
          "column": "address"
         }
        ],
        "table": "buildings"
       }
      ],
      "schema": "public"
     }
    ]
   },
   "publication": "test_publication",
   "replicationSlot": "test_replication"
  }

destination_config.json

{
  "dataFreshness": "900s",
  "sourceHierarchyDatasets": {
    "datasetTemplate": {
      "location": "us-central1"
    }
  }
}

😭😭😭😭😭

cristopher-rodrigues commented 1 year ago

I also noticed that it is not possible to specify primary keys through the GCP (Datastream) console UI 🤦

edwardmedia commented 1 year ago

@cristopher-rodrigues I really do not know what the API error "an internal error has occurred" means in this case. Are you able to run the example here? Does that mean you already have your answer? If you can create the resource with other tools like gcloud while Terraform can't, we can definitely help take a look. Otherwise, I would think what you are asking for might not be supported at this moment.

cristopher-rodrigues commented 1 year ago

> @cristopher-rodrigues I really do not know what the API error "an internal error has occurred" means in this case. Are you able to run the example here? Does that mean you already have your answer? If you can create the resource with other tools like gcloud while Terraform can't, we can definitely help take a look. Otherwise, I would think what you are asking for might not be supported at this moment.

Thank you for your reply @edwardmedia. In the end, it turned out that the Terraform docs say we can specify primary keys, but doing so does not work, and I also can't find any way to add primary keys on GCP: not in the console, the docs, gcloud, etc.

edwardmedia commented 1 year ago

@cristopher-rodrigues I do see that primaryKey is available in the API. I am trying to verify whether it is an API issue or a provider issue, but I am having a hard time reproducing it at the moment.

https://cloud.google.com/datastream/docs/reference/rest/v1/projects.locations.streams#PostgresqlColumn