singer-io / tap-hubspot

GNU Affero General Public License v3.0

Unable to remove properties from the deals schema #119

Open dkarzon opened 4 years ago

dkarzon commented 4 years ago

I am trying to set up a HubSpot tap with a Postgres target, and I keep getting an error about Postgres trying to create a table with more than 1600 columns, even though at the time my schema only had 4 properties in it.

However, going through the code, it looks like if deals is selected as a stream, the schema is automatically built from the JSON output of this API call: https://api.hubapi.com/properties/v1/deals/properties See the code here: https://github.com/singer-io/tap-hubspot/blob/master/tap_hubspot/__init__.py#L191
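For context, the behavior described above can be sketched roughly like this (a simplified illustration, not the tap's actual code; the `build_deal_schema` helper and the placeholder string typing are assumptions):

```python
def build_deal_schema(properties):
    """Turn the property list returned by the HubSpot deals properties
    API into a Singer-style JSON schema. Every property the portal
    defines becomes a schema entry, regardless of catalog selection."""
    schema = {"type": "object", "properties": {}}
    for prop in properties:
        # the real tap maps HubSpot types; string is a placeholder here
        schema["properties"][prop["name"]] = {"type": ["null", "string"]}
    return schema

# a portal with hundreds of custom deal properties yields a schema
# with hundreds of top-level fields, whatever the catalog says
portal_props = [{"name": f"custom_prop_{i}"} for i in range(500)]
print(len(build_deal_schema(portal_props)["properties"]))  # → 500
```

This is why a hand-trimmed catalog like the one below doesn't shrink the table: the schema is regenerated from the API at sync time.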

Is there a way to modify the computed schema for the deals stream to remove the properties that I don't need? I haven't been able to find a way to do that so far.

My deals schema:

{
"streams": [
    {
        "stream": "deals",
        "tap_stream_id": "deals",
        "key_properties": ["dealId"],
        "schema": {
            "type": "object",
            "properties": {
                "portalId": {
                    "type": [
                        "null",
                        "integer"
                    ]
                },
                "dealId": {
                    "type": [
                        "null",
                        "integer"
                    ]
                },
                "dealname": {
                    "type": [
                        "null",
                        "string"
                    ]
                },
                "dealstage": {
                    "type": [
                        "null",
                        "string"
                    ]
                }
            }
        },
        "metadata": [
            {
                "breadcrumb": [ ],
                "metadata": {
                    "selected": true,
                    "table-key-properties": [
                        "dealId"
                    ],
                    "forced-replication-method": "INCREMENTAL",
                    "valid-replication-keys": [
                        "hs_lastmodifieddate"
                    ]
                }
            }
        ]
    }
]}
zyanichaimaa commented 4 years ago

Have you fixed your problem?

gmontanola commented 4 years ago

The same is happening to me. Has anyone had any luck with this?

briansloane commented 4 years ago

Are you using the catalog to choose the fields that you want via selected metadata? That should allow you to limit the fields that get emitted, even though the schema has all the properties in it.
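Field-level selection in the catalog would look something like this (illustrative field names; in the Singer metadata format, per-field entries sit alongside the stream-level entry shown in the catalog above):

```json
{
    "breadcrumb": ["properties", "dealname"],
    "metadata": { "selected": true }
},
{
    "breadcrumb": ["properties", "amount"],
    "metadata": { "selected": false }
}
```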

gmontanola commented 4 years ago

Yes, I am! I've selected only 4-5 properties using "selected": true, and the others are explicitly set to false.

gmontanola commented 4 years ago

Well, I've done some testing and:

  1. The schema is generated using all the available properties for an object (and not just the selected ones, as @dkarzon described).
  2. The 1600 column limit is reached because a property is an object with 4 keys (value, timestamp, source, sourceId) and this results in 4 new columns per property.
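The arithmetic in point 2 can be checked directly (a quick sketch; the property counts are illustrative):

```python
# each HubSpot property flattens into one column per sub-key
SUBKEYS = ("value", "timestamp", "source", "sourceId")
POSTGRES_COLUMN_LIMIT = 1600

def columns_needed(property_count):
    return property_count * len(SUBKEYS)

# just 400 properties already saturate Postgres's per-table limit
print(columns_needed(400))  # → 1600
print(columns_needed(401) > POSTGRES_COLUMN_LIMIT)  # → True
```

So any portal with more than 400 deal properties will trip the limit, regardless of what the catalog selects.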
staufman commented 3 years ago

In case anyone else is hitting this, it's a real bummer. I'm not intimately familiar with the code in this repo, but for now I went into tap_hubspot/__init__.py (locally), found line 149, and changed it from if extras: to if False and extras:.

Yes, it's a hack, and yes, I don't quite understand the ramifications of not syncing the extra data associated with properties. At the same time, it prevents the explosion of columns needed to pipe the data into Postgres, which might be all some people need.
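The effect of the patch is just Python's short-circuit evaluation: with if False and extras:, the branch body can never execute, so the extra per-property data is never merged in (a toy demonstration; the contents of extras are illustrative, not the tap's actual data):

```python
# stand-in for the per-property extra data the tap would otherwise sync
extras = {"value": {}, "timestamp": {}, "source": {}, "sourceId": {}}

merged = []
if False and extras:  # patched condition: short-circuits, branch never runs
    merged.extend(extras)

print(merged)  # → []
```

Note that extras is never even evaluated, so this works regardless of what the variable holds at that point in the tap.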