transferwise / pipelinewise-target-redshift

Singer.io Target for Amazon Redshift - PipelineWise compatible
https://transferwise.github.io/pipelinewise/

Meltano Recharge Issue #182

Open NeilGorman104 opened 2 years ago

NeilGorman104 commented 2 years ago

Recharge Issue Summary: Transferwise

Meltano Version 1.102.0

Linux System

Redshift Database

tap recharge repo: link

target redshift (transferwise): link

infinity pipelines

Reproducing error:

Run pipeline into Redshift table (manually)

The recharge pipeline currently has two problems:

It occasionally never finishes: it appears to run indefinitely and loads no data into Redshift.

It loads only 50 rows into Redshift, no matter what I set the batch size to.

The batch_size_rows setting is currently configured to 1000. With batch_size_rows set to 100, the pipeline loaded 100 rows on two runs, but on other runs at that setting it loaded only 50 rows.

This is while using the following components:

Extractor:

name: dev-tap-recharge-subscriptions

inherits from: tap-recharge-subscriptions

Loader:

name: target-redshift-pipelinewise-recharge

variant: transferwise

inherits from: target-redshift

Pipelines:

name: dev-recharge-pipelinewise

1) Will not finish running and will run infinitely

When the pipeline runs indefinitely, the following log line is produced continuously:

INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 1.217686414718628, "tags": {"http_status_code": 200, "status": "succeeded"}} cmd_type=extractor job_id=dev-recharge-pipelinewise name=dev-tap-recharge-subscriptions run_id=147b6430-da73-4bae-8eb9-26e5aaa733de stdio=stderr

The pipeline never finishes, and no data is ever loaded into Redshift. Occasionally, after I stop the pipeline manually and run it again, it completes quickly (in under 20 seconds) and the data is loaded into Redshift successfully. I have found no pattern in when this happens, and I haven't been able to figure out what causes it.
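To tell whether the extractor is making steady progress or simply paginating without end, it may help to count the http_request_duration metrics in a captured log. The following is a minimal sketch, assuming the log lines look like the one quoted above; the helper names parse_metric and summarize_requests are hypothetical and not part of Meltano or the tap.

```python
import json

# Marker that precedes the JSON payload in the log line quoted in this issue.
MARKER = "INFO METRIC: "
_decoder = json.JSONDecoder()


def parse_metric(line):
    """Return the metric dict embedded in a log line, or None if absent."""
    pos = line.find(MARKER)
    if pos == -1:
        return None
    start = line.find("{", pos + len(MARKER))
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores the trailing
        # key=value fields (cmd_type=..., job_id=..., and so on).
        metric, _end = _decoder.raw_decode(line, start)
    except json.JSONDecodeError:
        return None
    return metric if isinstance(metric, dict) else None


def summarize_requests(lines):
    """Count HTTP request metrics and sum their reported durations."""
    count = 0
    total_seconds = 0.0
    for line in lines:
        metric = parse_metric(line)
        if metric and metric.get("metric") == "http_request_duration":
            count += 1
            total_seconds += float(metric.get("value", 0.0))
    return count, total_seconds
```

Passing the lines of a saved log file to summarize_requests gives the number of requests made so far. A count that keeps growing while nothing reaches Redshift would suggest the tap is still requesting pages rather than the target hanging, though that is only one possible reading.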

2) Only loads 50 rows into Redshift no matter what the batch size is set to

The second problem is that the pipeline loads only 50 rows into Redshift, no matter what I set the batch size to. The batch_size_rows setting is currently configured to 1000. With batch_size_rows set to 100, the pipeline loaded 100 rows on two runs, but on other runs at that setting it loaded only 50 rows.
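One way to narrow this down is to check how many records the tap actually emits before they reach the loader. If the tap itself outputs only 50 RECORD messages, the limit would originate upstream (for example in the tap's pagination) rather than in batch_size_rows. The sketch below assumes the tap's output has been saved as one Singer message per line; count_records is a hypothetical helper name.

```python
import json
from collections import Counter


def count_records(lines):
    """Count Singer RECORD messages per stream in an iterable of lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a JSON message (e.g. stray log text).
            continue
        if isinstance(message, dict) and message.get("type") == "RECORD":
            counts[message.get("stream", "<unknown>")] += 1
    return counts
```

For instance, the output of running the extractor on its own could be redirected to a file and its lines passed to count_records. Comparing the per-stream count with the number of rows that land in Redshift shows which side of the pipeline drops or caps the data.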

Troubleshooting Techniques Tried

Ways I've tried to troubleshoot these issues:

Transferwise/Pipelinewise Variant Loader(s):

Creating and configuring a new loader using transferwise to load recharge data

Manually configuring the datamill loader in meltano.yml

Setting the replication key (updated_at) to ascending OR descending in streams.py

Creating a state file for the pipeline to resume from, or copying a successful state file from a working pipeline to use for the recharge pipeline

Documented Conversations

| Conversation link | Location | Topics discussed |
| --- | --- | --- |
| CRITICAL cursor already closed / connection already closed · Issue #48 · datamill-co/target-redshift | GitHub issue, target-redshift (datamill) | SSL connection has been closed unexpectedly |
| https://meltano.slack.com/archives/C01TCRBBJD7/p1649777577117929 | Slack | SSL connection has been closed unexpectedly |
| https://meltano.slack.com/archives/C01UTUSP34M/p1654122008077529 | Slack | Batch Size & SSL Connection |