cristopher-rodrigues opened this issue 1 year ago (Open)
@cristopher-rodrigues are you able to achieve this with other tools like gcloud? If yes, what is the full command?
Hi @edwardmedia,
Unfortunately, we only use GCP. Here is our full config:
resource "google_datastream_connection_profile" "source" {
display_name = "Postgresql Source"
location = "us-central1"
connection_profile_id = "source_test_connection_profile_id"
postgresql_profile {
hostname = google_sql_database_instance.test.public_ip_address
username = google_sql_user.users.name
password = google_sql_user.users.password
database = google_sql_database.database.name
}
}
resource "google_datastream_connection_profile" "destination" {
display_name = "BigQuery Destination"
location = "us-central1"
connection_profile_id = "destination_test_connection_profile_id"
bigquery_profile {}
}
resource "google_datastream_stream" "test" {
display_name = "Postgres to BigQuery"
location = "us-central1"
stream_id = "test_stream_id"
desired_state = "RUNNING"
source_config {
source_connection_profile = google_datastream_connection_profile.source.id
postgresql_source_config {
max_concurrent_backfill_tasks = 12
publication = "test_publication"
replication_slot = "test_replication"
include_objects {
postgresql_schemas {
schema = "public"
postgresql_tables {
table = "will_sync"
postgresql_columns {
column = "id"
primary_key = true
}
postgresql_columns {
column = "name"
}
}
}
}
exclude_objects {
postgresql_schemas {
schema = "public"
postgresql_tables {
table = "wont_sync"
postgresql_columns {
column = "column"
}
}
}
}
}
}
destination_config {
destination_connection_profile = google_datastream_connection_profile.destination.id
bigquery_destination_config {
data_freshness = "900s"
source_hierarchy_datasets {
dataset_template {
location = "us-central1"
}
}
}
}
backfill_all {}
}
and more output:
google_datastream_connection_profile.destination: Creating...
google_datastream_connection_profile.source: Creating...
google_datastream_connection_profile.destination: Creation complete after 3s [id=projects/kelvin-datascience-staging/locations/us-central1/connectionProfiles/destination_test_connection_profile_id]
google_datastream_connection_profile.source: Still creating... [10s elapsed]
google_datastream_connection_profile.source: Creation complete after 14s [id=projects/kelvin-datascience-staging/locations/us-central1/connectionProfiles/source_test_connection_profile_id]
google_datastream_stream.test: Creating...
google_datastream_stream.test: Still creating... [10s elapsed]
╷
│ Error: Error waiting to create Stream: Error waiting for Creating Stream:
│
│ with google_datastream_stream.test,
│ on datastream.tf line 22, in resource "google_datastream_stream" "test":
│ 22: resource "google_datastream_stream" "test" {
One workaround that actually works is to specify the multiple postgresql_columns in the exclude_objects block instead. It turned out that the same bug also occurs when a primary_key is specified 😭
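Roughly, a sketch of that shape, based on one reading of the workaround (the column names are just placeholders, and primary_key is omitted entirely since setting it triggers the same error):
include_objects {
  postgresql_schemas {
    schema = "public"
    postgresql_tables {
      # no postgresql_columns and no primary_key here, since both seem to trigger the error
      table = "will_sync"
    }
  }
}

exclude_objects {
  postgresql_schemas {
    schema = "public"
    postgresql_tables {
      table = "will_sync"
      # multiple postgresql_columns in exclude_objects do appear to work
      postgresql_columns {
        column = "unwanted_column_a" # placeholder
      }
      postgresql_columns {
        column = "unwanted_column_b" # placeholder
      }
    }
  }
}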
are you able to achieve this with other tools like gcloud?
@cristopher-rodrigues - I think @edwardmedia was asking if you also get the same error when using the gcloud commands https://cloud.google.com/sdk/gcloud/reference/datastream/streams/create
Oh, got it. Unfortunately, I couldn't find a way to add PKs to Datastream via the SDK (even though the TF docs suggest it is possible, for PostgreSQL at least). For multiple columns, we should be fine as long as excluding columns works.
@edwardmedia this is the request that fails:
POST /v1/projects/project/locations/us-central1/streams?alt=json&streamId=test_stream_id2 HTTP/1.1
Host: datastream.googleapis.com
User-Agent: Terraform/1.3.9 (+https://www.terraform.io) Terraform-Plugin-SDK/2.10.1 terraform-provider-google/4.54.0
Content-Length: 1043
Content-Type: application/json
Accept-Encoding: gzip
{
  "backfillAll": null,
  "backfillNone": {},
  "destinationConfig": {
    "bigqueryDestinationConfig": {
      "dataFreshness": "900s",
      "sourceHierarchyDatasets": {
        "datasetTemplate": {
          "location": "us-central1"
        }
      }
    },
    "destinationConnectionProfile": "..."
  },
  "displayName": "Postgres to BigQuery",
  "sourceConfig": {
    "mysqlSourceConfig": null,
    "oracleSourceConfig": null,
    "postgresqlSourceConfig": {
      "includeObjects": {
        "postgresqlSchemas": [
          {
            "postgresqlTables": [
              {
                "postgresqlColumns": [
                  {
                    "column": "id",
                    "primaryKey": true
                  },
                  {
                    "column": "name"
                  },
                  {
                    "column": "address"
                  }
                ],
                "table": "buildings"
              },
              {
                "table": "floors"
              },
              {
                "table": "nodes"
              },
              {
                "table": "radiators"
              },
              {
                "table": "spaces"
              },
              {
                "table": "units"
              },
              {
                "table": "visits"
              }
            ],
            "schema": "public"
          }
        ]
      },
      "maxConcurrentBackfillTasks": 12,
      "publication": "test_publication",
      "replicationSlot": "test_replication"
    },
    "sourceConnectionProfile": "..."
  },
  "state": "RUNNING"
}
@edwardmedia @davidkhardwick this is what I get when using gcloud:
{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "code": 13,
      "message": "An unknown error has occurred"
    },
    "authenticationInfo": {
      "principalEmail": "email",
      "principalSubject": "user:email"
    },
    "requestMetadata": {
      "requestAttributes": {},
      "destinationAttributes": {}
    },
    "serviceName": "datastream.googleapis.com",
    "methodName": "google.cloud.datastream.v1.Datastream.CreateStream",
    "resourceName": "project",
    "resourceLocation": {
      "currentLocations": [
        "us-central1"
      ]
    }
  },
  "insertId": "id",
  "resource": {
    "type": "audited_resource",
    "labels": {
      "method": "google.cloud.datastream.v1.Datastream.CreateStream",
      "service": "datastream.googleapis.com",
      "project_id": "projectid"
    }
  },
  "timestamp": "2023-03-01T18:45:12.357362572Z",
  "severity": "ERROR",
  "logName": "projects/projectid/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    "id": "projects/projectid/locations/us-central1/operations/operation-2-5f5db1ca53242-6b60b33a-7800af10",
    "producer": "datastream.googleapis.com",
    "last": true
  },
  "receiveTimestamp": "2023-03-01T18:45:12.656614692Z"
}
gcloud datastream streams create test \
--location=us-central1 \
--display-name=my-stream \
--source=source_test_connection_profile_id2 \
--postgresql-source-config=source_config.json \
--destination=destination_test_connection_profile_id2 \
--bigquery-destination-config=destination_config.json \
--backfill-none
source_config.json
{
  "includeObjects": {
    "postgresqlSchemas": [
      {
        "postgresqlTables": [
          {
            "postgresqlColumns": [
              {
                "column": "id",
                "primaryKey": true
              },
              {
                "column": "name"
              },
              {
                "column": "address"
              }
            ],
            "table": "buildings"
          }
        ],
        "schema": "public"
      }
    ]
  },
  "publication": "test_publication",
  "replicationSlot": "test_replication"
}
destination_config.json
{
  "dataFreshness": "900s",
  "sourceHierarchyDatasets": {
    "datasetTemplate": {
      "location": "us-central1"
    }
  }
}
😭😭😭😭😭
I also noticed that it is not possible to specify Primary keys through the GCP (stream) console UI 🤦
@cristopher-rodrigues I really do not know what the API error an internal error has occurred means in this case. Are you able to run the example here? Does that mean you already have the answer? If you can create the resource with other tools like gcloud while Terraform can't, we can definitely take a look. Otherwise, I would think the ask might not be supported at this moment?
Thank you for your reply @edwardmedia. In the end, it turned out that the TF docs say we can specify primary keys, but it does not work when you do, and on the GCP side I can't find any way to add primary keys either: not in the console, the docs, gcloud, etc.
@cristopher-rodrigues I do see that primaryKey is available in the API. I am trying to verify whether it is an API issue or a provider issue, but I am having a hard time reproducing it at the moment.
When I try:
it just crashes...