meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
https://sdk.meltano.com
Apache License 2.0
87 stars 64 forks source link

bug: Schema warning with `AnyType` #2519

Open ReubenFrankel opened 5 days ago

ReubenFrankel commented 5 days ago

Singer SDK Version

0.34.1

Is this a regression?

Python Version

3.7 (EOL)

Bug scope

Taps (catalog, state, etc.)

Operating System

Ubuntu 22.04

Description

Unsure if I am using this correctly, but my understanding is that AnyType should represent a value of any JSON type (i.e. string, integer, object, array or null). Maybe this has been fixed in a later version, but 0.34.1 is the latest compatible version available since we haven't yet dropped Python 3.7 support for the tap.

Code

th.Property("value", th.AnyType)
Could not append type because the JSON schema for the dictionary `{}` appears to be invalid
ReubenFrankel commented 4 days ago

This seems to be the intended behaviour when using AnyType: https://github.com/meltano/sdk/blob/2874a15abefff83a994dc0e3b2805f660e0e0bf6/tests/core/test_jsonschema_helpers.py#L183-L186

Not sure why this is the case. The main issue here is the amount of times this warning message seems to get thrown out in the logs. For example:

$ meltano invoke tap-capsulecrm 2>&1 | grep -c 'Could not append type'

237
edgarrmondragon commented 4 days ago

This seems to be the intended behaviour when using AnyType:

https://github.com/meltano/sdk/blob/2874a15abefff83a994dc0e3b2805f660e0e0bf6/tests/core/test_jsonschema_helpers.py#L183-L186

Not sure why this is the case. The main issue here is the amount of times this warning message seems to get thrown out in the logs. For example:

$ meltano invoke tap-capsulecrm 2>&1 | grep -c 'Could not append type'

237

Looking at the code, my first guess is that it's because the schema property is not cached, so the PropertiesList gets serialized a bunch of times: https://github.com/Matatika/tap-capsulecrm/blob/f7c360329868a70c445d73943887174f07231a3b/tap_capsulecrm/client.py#L73-L75

ReubenFrankel commented 4 days ago

Thanks, that reduces the amount of warnings significantly (once for each stream with a schema referencing AnyType):

$ meltano invoke tap-capsulecrm 2>&1 | grep -c 'Could not append type'

5

I guess this is then more a matter of should the SDK throw that warning message for explicit usage of AnyType?

edgarrmondragon commented 2 days ago

Thanks, that reduces the amount of warnings significantly (once for each stream with a schema referencing AnyType):

$ meltano invoke tap-capsulecrm 2>&1 | grep -c 'Could not append type'

5

I guess this is then more a matter of should the SDK throw that warning message for explicit usage of AnyType?

Problem is AnyType is just a wrapper for a "<prop>": {} schema element, and that's the same warning you'd get. I increasingly dislike the append_type implementation. I'd rather it be part of the JSONTypeHelper base class. So this:

Property("name", StringType(nullable=True))  # {"properties": {"name": {"type": ["string", "null"]}}, "required": []}

instead of

Property("name", StringType, required=False)  # {"properties": {"name": {"type": ["string", "null"]}}, "required": []}

That'd allow us to refactor these lines

https://github.com/meltano/sdk/blob/dfdbdde32bd92fe11b1c3c44c25398db8362bed6/singer_sdk/typing.py#L688-L689

and get rid of most of the instances of that warning.