singer-io / getting-started

This repository is a getting started guide to Singer.
https://singer.io

Schema Feature Request: Additional inclusion type #27

Open romeromark opened 7 years ago


To my knowledge, schema properties can have one of two selectable inclusion types: automatic and available. A property marked automatic is always output, even if the user explicitly deselects it in the properties file, whereas a property marked available is output only if the user explicitly selects it in the properties file. (The spec's remaining value, unsupported, marks properties that are never output.)
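The inclusion rules above can be summarised as a small decision function. This is a minimal sketch: `is_output` is a hypothetical helper name, and `selected` stands for the user's choice for the field in the properties file, with `None` meaning the file says nothing about it.

```python
from typing import Optional


def is_output(inclusion: str, selected: Optional[bool]) -> bool:
    """Decide whether a field is emitted under the inclusion rules.

    Hypothetical helper: `selected` is the properties-file choice,
    or None when the properties file has no entry for the field.
    """
    if inclusion == "automatic":
        # Always emitted, even if the user deselected it.
        return True
    if inclusion == "available":
        # Emitted only when the user explicitly selected it.
        return selected is True
    # "unsupported" (or anything unrecognised) is never emitted.
    return False


print(is_output("automatic", False))  # True
print(is_output("available", None))   # False
print(is_output("available", True))   # True
```

The gap the issue describes is visible here: there is no inclusion value for which `is_output(..., None)` is `True` but `is_output(..., False)` is `False`.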

I would like to suggest adding a third type, in which the property is output automatically but the user can still deselect it in the properties file. My primary motivation is that I'm building a tap for a CRM that lets users add custom fields. The API exposes these custom fields automatically, both in a meta endpoint and in the GET endpoint. I'm using Singer's discovery mode to build the schema dynamically, and I would like any new properties to be included in the tap output by default, without having to update the properties file.
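Building the schema dynamically from a meta endpoint might look roughly like the sketch below. It assumes a hypothetical meta-endpoint response shape (a list of `{"name": ..., "type": ...}` entries), a hypothetical `TYPE_MAP` from CRM field types to JSON Schema types, and that `id` is the primary key. Every non-key field is marked available and tagged with the tap-specific `inclusion_default` key, which is not part of the Singer spec.

```python
from typing import Any, Dict, List

# Hypothetical mapping from CRM field types to JSON Schema types;
# unknown types fall back to "string".
TYPE_MAP = {"text": "string", "number": "number", "checkbox": "boolean"}


def build_schema(meta_fields: List[Dict[str, Any]], key_field: str = "id") -> Dict[str, Any]:
    """Build a discovered schema from a meta-endpoint response.

    Each entry is assumed to look like {"name": "...", "type": "text"};
    this shape is hypothetical, not taken from any particular CRM.
    """
    properties: Dict[str, Any] = {}
    for field in meta_fields:
        # Nullable, since CRM fields (especially custom ones) are often empty.
        prop: Dict[str, Any] = {"type": ["null", TYPE_MAP.get(field["type"], "string")]}
        if field["name"] == key_field:
            # The primary key is always emitted.
            prop["inclusion"] = "automatic"
        else:
            # Selectable, but flagged via the tap-specific, non-spec
            # "inclusion_default" key to be output unless deselected.
            prop["inclusion"] = "available"
            prop["inclusion_default"] = True
        properties[field["name"]] = prop
    return {"type": "object", "properties": properties}


meta = [
    {"name": "id", "type": "number"},
    {"name": "favourite_color", "type": "text"},  # a user-added custom field
]
schema = build_schema(meta)
print(schema["properties"]["favourite_color"])
# {'type': ['null', 'string'], 'inclusion': 'available', 'inclusion_default': True}
```

Because the schema is regenerated on each discovery run, a custom field added in the CRM later shows up with `inclusion_default` set, without any change to the properties file.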

To this end, my tap currently looks for a custom key, inclusion_default, in each property definition. When a property's inclusion is available and the properties file gives no further guidance, that key decides whether the property is output. I've attached the snippet of code that handles this. Ideally, I think a third value for the inclusion property is the best path forward, but I did not want to introduce a value outside the specification for compatibility reasons.

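```python
def should_sync(discovered_schema, annotated_schema, field):
    # Schema for this field as produced by discovery
    discovered = discovered_schema['properties'][field]
    inclusion = discovered.get('inclusion')

    # 'automatic' fields are always output; 'unsupported' fields never are
    if inclusion == 'automatic':
        return True
    if inclusion == 'unsupported':
        return False

    # An explicit selection in the properties file wins
    annotated = annotated_schema.get('properties', {}).get(field, {})
    if 'selected' in annotated:
        return bool(annotated['selected'])

    # No guidance: fall back to the tap-specific (non-spec) 'inclusion_default' key
    return discovered.get('inclusion_default', False)
```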
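For comparison, here is a sketch of the same check if the spec gained the third inclusion value proposed in this issue. The value name `default` and the function name `should_sync_proposed` are purely hypothetical and are not part of the Singer specification; the point is that the extra `inclusion_default` key would no longer be needed.

```python
from typing import Any, Dict

# Hypothetical inclusion value; NOT part of the Singer specification.
DEFAULT_INCLUSION = "default"


def should_sync_proposed(discovered_schema: Dict[str, Any],
                         annotated_schema: Dict[str, Any],
                         field: str) -> bool:
    """Variant of should_sync assuming a third inclusion value existed."""
    inclusion = discovered_schema["properties"][field].get("inclusion")
    if inclusion == "automatic":
        return True
    if inclusion == "unsupported":
        return False
    # An explicit choice in the properties file always wins.
    annotated = annotated_schema.get("properties", {}).get(field, {})
    if "selected" in annotated:
        return bool(annotated["selected"])
    # No guidance: the hypothetical value opts in, plain "available" does not.
    return inclusion == DEFAULT_INCLUSION


discovered = {"properties": {
    "id": {"inclusion": "automatic"},
    "notes": {"inclusion": DEFAULT_INCLUSION},
    "internal_score": {"inclusion": "available"},
}}
no_guidance = {"properties": {}}
print(should_sync_proposed(discovered, no_guidance, "notes"))           # True
print(should_sync_proposed(discovered, no_guidance, "internal_score"))  # False
```

A user could still drop such a field by setting `"selected": false` for it in the properties file, which is exactly the behaviour automatic does not allow.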