singer-io / singer-python

Writes the Singer format from Python
https://singer.io
Apache License 2.0
538 stars 128 forks source link

5.9.0: Transformer.filter_data_by_metadata() doesn't filter unselected nodes where selected unspecified #120

Closed macleodmac closed 4 years ago

macleodmac commented 4 years ago

Hi,

I'm attempting to write a tap, using singer.transform.Transformer.filter_data_by_metadata in order to filter the data.

    def filter_data_by_metadata(self, data, metadata):
        if isinstance(data, dict) and metadata:
            for field_name in list(data.keys()):
                selected = singer.metadata.get(metadata, ('properties', field_name), 'selected')
                inclusion = singer.metadata.get(metadata, ('properties', field_name), 'inclusion')
                if inclusion == 'automatic':
                    continue

                if selected is False:
                    data.pop(field_name, None)
                    # Track that a field was filtered because the customer
                    # didn't select it.
                    self.filtered.add(field_name)

                if inclusion == 'unsupported':
                    data.pop(field_name, None)
                    # Track that the field was filtered because the tap
                    # declared it as unsupported.
                    self.filtered.add(field_name)

        return data

This logic has two problems:

The expected behaviour would be the following:

KAllan357 commented 4 years ago

There's a function in the utils module: https://github.com/singer-io/singer-python/blob/master/singer/utils.py#L237 should_sync_field. I think that the logic between this change and that function should be in agreement.