TLDR; The static schemas in this tap have aged, and seem to be missing a lot of (potentially important data). If you are missing data, this is how you can verify your change is working before submitting a PR.
The Issue
Singer-python logs fields not present in the schema as they are transformed in this sort of fashion:
2019-06-21 21:01:14,459Z tap - WARNING Removed 10 paths during transforms:
2019-06-21 21:01:14,459Z tap - advertisingChannelSubType
2019-06-21 21:01:14,459Z tap - biddingStrategyConfiguration
2019-06-21 21:01:14,460Z tap - budget
2019-06-21 21:01:14,460Z tap - campaignGroupId
2019-06-21 21:01:14,460Z tap - finalUrlSuffix
2019-06-21 21:01:14,460Z tap - forwardCompatibilityMap
2019-06-21 21:01:14,460Z tap - selectiveOptimization
2019-06-21 21:01:14,460Z tap - trackingUrlTemplate
2019-06-21 21:01:14,461Z tap - WARNING Removed paths list: ['advertisingChannelSubType', 'biddingStrategyConfiguration', 'budget', 'campaignGroupId', 'finalUrlSuffix', 'forwardCompatibilityMap', 'selectiveOptimization', 'trackingUrlTemplate']
This is done to ensure the quality of the data from the source by requiring an explicit data typing to be specified in the schema. Data typing changes are challenging for ETL, since the destination would need to be backfilled with historical data that is typed correctly.
The Ask
It's challenging to find representative examples for every field that can come through this tap, due to the nature of ads. So, if you see fields removed in this fashion and want to get the data, please submit a PR with the information below included, so that we can verify the correctness and successfully merge them.
The Process
A schema change is deceptively simple (given that it's just a JSON schema), but this can cause real problems down the line if not verified correctly. Including this information with a PR helps us verify that the data is correct using the tools Singer has to offer.
Run the tap using target-stitch in verify mode, without a state file. (e.g., /path/to/env/tap-adwords --config config.json --catalog catalog.json | /path/to/env/target-stitch --config config.json --dry-run -o target-output.json).
The --dry-run flag will add local schema validation while not posting to Stitch and -o will save the records locally to the specified file for manual inspection.
Not using a state file will ensure that enough data has passed through to catch any issues with inconsistent data types.
To select the stream you want, you can use this tool with the output of /path/to/env/tap-adwords --config config.json --discover
Open a PR with the Schema change on this repo with the following information:
A log snippet (~20 lines) with a "Removed paths" log line with and without the field removed
A log snippet with a "Batch is valid" message from the target (e.g., 2019-06-21 21:01:13,527Z target - INFO ad_groups (2345): Batch is valid
(Optional) A redacted RECORD message from the output file containing non-null data for the intended field(s). For information security's sake, do not post clear data or PII on Github. Adwords will frequently return null in the event of a lack of data, so seeing actual data come through is a HUGE help.
This is a new idea, so the requirements might change as we learn what is needed to ensure correctness, but this should be a good start. Thank you for helping us improve SInger
TLDR; The static schemas in this tap have aged, and seem to be missing a lot of (potentially important data). If you are missing data, this is how you can verify your change is working before submitting a PR.
The Issue
Singer-python logs fields not present in the schema as they are transformed in this sort of fashion:
This is done to ensure the quality of the data from the source by requiring an explicit data typing to be specified in the schema. Data typing changes are challenging for ETL, since the destination would need to be backfilled with historical data that is typed correctly.
The Ask
It's challenging to find representative examples for every field that can come through this tap, due to the nature of ads. So, if you see fields removed in this fashion and want to get the data, please submit a PR with the information below included, so that we can verify the correctness and successfully merge them.
The Process
A schema change is deceptively simple (given that it's just a JSON schema), but this can cause real problems down the line if not verified correctly. Including this information with a PR helps us verify that the data is correct using the tools Singer has to offer.
/path/to/env/tap-adwords --config config.json --catalog catalog.json | /path/to/env/target-stitch --config config.json --dry-run -o target-output.json
).--dry-run
flag will add local schema validation while not posting to Stitch and-o
will save the records locally to the specified file for manual inspection./path/to/env/tap-adwords --config config.json --discover
2019-06-21 21:01:13,527Z target - INFO ad_groups (2345): Batch is valid
RECORD
message from the output file containing non-null data for the intended field(s). For information security's sake, do not post clear data or PII on Github. Adwords will frequently returnnull
in the event of a lack of data, so seeing actual data come through is a HUGE help.This is a new idea, so the requirements might change as we learn what is needed to ensure correctness, but this should be a good start. Thank you for helping us improve SInger