jwills / target-duckdb

A Singer.io target for DuckDB

Target does not correctly handle `anyOf` schema definition #18

Closed kgpayne closed 1 year ago

kgpayne commented 1 year ago

Describe the bug

Given the following JSON Schema property definition:

      "Status": {
        "anyOf": [
          { "type": "null" },
          {
            "type": "object",
            "properties": {
              "Ended": { "type": "boolean" },
              "GotBidders": { "type": "boolean" },
              "GotWinner": { "type": "boolean" }
            },
            "required": ["Ended", "GotBidders", "GotWinner"]
          }
        ]
      },

The flattening logic in the flatten_schema function selects v.values()[0][0], which in the above case is { "type": "null" }. This case is unhandled.

This bug means that a property declaring an anyOf clause in its definition is not guaranteed to be interpreted correctly; in my case, such properties are excluded completely.

It's worth noting that I created the schema using the genson Python tool, which is pretty common amongst tap developers. This likely means that many taps (especially more recent ones) use anyOf in this order.

To Reproduce

Steps to reproduce the behavior:

  1. Send a SCHEMA message that defines an anyOf property whose first element is { "type": "null" } to target-duckdb.
  2. Send a corresponding RECORD message with non-null data conforming to the above schema.
  3. Observe that no corresponding column is created in DuckDB.
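
The steps above can be sketched as a small script that prints the two Singer messages as JSON lines, ready to be piped into the target. This is only an illustration: the stream name `auctions`, the `id` key property, and the `build_messages` helper are assumptions made for the example, not taken from the tap that produced the original schema.

```python
import json

# The property definition from this report: the "null" branch comes first.
STATUS_SCHEMA = {
    "anyOf": [
        {"type": "null"},
        {
            "type": "object",
            "properties": {
                "Ended": {"type": "boolean"},
                "GotBidders": {"type": "boolean"},
                "GotWinner": {"type": "boolean"},
            },
            "required": ["Ended", "GotBidders", "GotWinner"],
        },
    ]
}


def build_messages(stream="auctions"):
    """Build one SCHEMA and one RECORD message (stream name is hypothetical)."""
    schema_msg = {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "Status": STATUS_SCHEMA,
            },
        },
        "key_properties": ["id"],
    }
    record_msg = {
        "type": "RECORD",
        "stream": stream,
        # Non-null data that conforms to the object branch of the anyOf.
        "record": {
            "id": 1,
            "Status": {"Ended": True, "GotBidders": False, "GotWinner": False},
        },
    }
    return [schema_msg, record_msg]


if __name__ == "__main__":
    # Singer messages are newline-delimited JSON on stdout.
    for message in build_messages():
        print(json.dumps(message))
```

Piping this output into target-duckdb with a suitable config should show whether a Status column is created.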

Expected behavior

Ideally, target-duckdb would iterate over the anyOf entries until it finds the first non-null type, rather than assuming the first entry is the intended type.
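
A minimal sketch of that iteration, to make the suggestion concrete. The helper name `first_non_null` is hypothetical and this is not the target's actual code; it also assumes each anyOf entry is a dict, and treats an entry without a "type" key as non-null.

```python
def first_non_null(any_of):
    """Return the first anyOf entry that is not purely {"type": "null"}.

    Returns None when every entry is a null type.
    """
    for candidate in any_of:
        declared = candidate.get("type")
        # "type" may be a single string or a list such as ["null", "string"].
        types = declared if isinstance(declared, list) else [declared]
        if any(t != "null" for t in types):
            return candidate
    return None


status_any_of = [
    {"type": "null"},
    {"type": "object", "properties": {"Ended": {"type": "boolean"}}},
]
chosen = first_non_null(status_any_of)
```

With the definition from this report, `chosen` is the object entry rather than { "type": "null" }, so the property would no longer be skipped on that basis.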


jwills commented 1 year ago

Ack, lame-- thanks for the bug report, will get it fixed up this week!