rudderlabs / rudder-transformer

Open-source, warehouse-first Customer Data Pipeline and Segment-alternative. Collects and routes clickstream data and builds your customer data lake on your data warehouse.
https://www.rudderstack.com
Other
84 stars 111 forks source link

Postgres Connector: Too many columns outputted from the event #466

Closed mrarlloyd closed 3 years ago

mrarlloyd commented 3 years ago

Hi,

I'm struggling to get Postgres working as a destination, even for very small tracked events i get a 'Too many columns outputted from the event'. Is there a recommended transformer? Or is there a way to store traits as JSON instead?

Many thanks,

Alex

ryanmccrary commented 3 years ago

Hey @mrarlloyd, do you have an example of the call you're making or an event payload you could provide?

mrarlloyd commented 3 years ago

Thanks @ryanmccrary, after digging into this, i think it's to do with identify() calls for specific users. However... i can't confirm that as i think i have a more fundamental problem which i cannot diagnose either.

When tap 'Test destination' of my Postgres destination i see the following result:

InstanceId Status Message
analytics-rudderstack-0 failed AccessDenied: Access Denied status code: 403, request id: 7F67FD13855BE9EC, host id: zOKV+Cip6y43oc7hr94dnyaMnjwhdJmCQVo176rH7DZfpc3HOkBqEngy2V+VrGF0b+EEZU6JIYI=

I can see that the s3 bucket gets correctly populated with a rudder-test-payload & contents, and i can see in the Postgres server logs that the user account is correctly authorising on the server. Also, as far as i can tell i've GRANTed the correct permissions for the user, and i've tried running the test as the PG Admin user too.

I don't understand the results of the test above to be able to diagnose further.

Looking at the Live Events there's lots of payloads that look like:

{
    "error": "{\"generating_load_files_failed\": {\"errors\": [\"No load files generated\"], \"attempt\": 1}}",
    "failed_table_uploads": [
        "identifies",
        "job_application",
        "rudder_discards",
        "tracks",
        "user_created",
        "user_updated",
        "users"
    ],
    "lastSyncedAt": "2021-02-11T12:30:39.329072Z",
    "successful_table_uploads": [],
    "uploadID": 215
}

I'm at a bit of a loss as to what to check next...!

gane5hvarma commented 3 years ago

Hey @mrarlloyd. From the test destination result. It says that rudder-server doesn't have permission to upload to s3. Can you once verify from your side, if the keys are correct? Another thing can you check if the time of the contents in rudder-test-payload matches with when you click on Test destination. What is the version of rudder-server are you using?

mrarlloyd commented 3 years ago

Ah s3...! Sorry to have gotten side tracked from the original issue, but you're absolutely right. Now you've told me what that test destination result actually means. I'd misconfigured my s3 user's permissions so it could write to the specific bucket, but not read from it. 🙄 Thank you for that clarification... i wish i could have interpreted that test result myself.


So now i definitely have an issue with a certain identify() call being rejected because there's too many traits, is there a recommended transformation i could add for this destination? Or the option to store additional traits as a JSON field?

SumanthPuram commented 3 years ago

@mrarlloyd Are you running the data plane yourself or using our hosted service?

https://github.com/rudderlabs/rudder-transformer/blob/9719aba4fedf450315d7c2beaf466b2b8e3974c0/warehouse/index.js#L22

This is the line that restricts the number of columns. This is intended to filter out some erroneous events, that end up hitting the limits of warehouse columns. The environment variable needs to be configured to increase the limit for expected events like your case.