singer-io / singer-python

Writes the Singer format from Python
https://singer.io
Apache License 2.0
538 stars 128 forks source link

Add `default` key to Schema #149

Closed samjewell closed 2 years ago

samjewell commented 3 years ago

Description of change

The default keyword is supported in JSON Schema, as described here: https://json-schema.org/understanding-json-schema/reference/generic.html#annotations

So any data for defaults within a Catalog file should be accessible through the Schema class.

Manual QA steps

A Catalog file which looks like this:

{
  "streams": [
    {
      "stream": "cortex_distributor_writes",
      "tap_stream_id": "cortex_distributor_writes",
      "schema": {
        "properties": {
          "namespace": {
            "type": "string",
            "default": "prod"
          }
        }
      }
    }
  ]
}

Should allow the default key to be accessed through: singer.Catalog.load(path_to_catalog_file).streams[0].schema.to_dict()["properties"]["namespace"]["default"]

Risks

unknown

Rollback steps

cmerrick commented 3 years ago

Hi @samjewell, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

cmerrick commented 3 years ago

You did it @samjewell!

Thank you for signing the Singer Contribution License Agreement.

samjewell commented 2 years ago

@leslievandemark @zachharris1 @jacobrobertbaca @dmosorast @cosimon Is there any change I can have this work reviewed and merged? This is a pretty small and low-impact change, but we're experiencing some small dependency-hell currently while this is unmerged.

dmosorast commented 2 years ago

Hi @samjewell What use case is this attempting to solve? The reason that a default keyword has not been included is because the tap has no control over what the default value is for a particular field (it's outside of the Singer philosophy to modify the data in this way, as it is a transformation that changes the actual raw data). As far as I can tell, it doesn't have any validation impact on the Schema, and appears to be targeted at a different use case of JSON Schema.

That said, I'm curious what your use case here is, and also if you could elaborate on the dependency hell that you're experiencing. Keeping virtual environments for tap and target installs have kept dependencies under control for myself, so I'm interested if there's a new way of using the library that throws a wrench into that approach. Cheers!

samjewell commented 2 years ago

@dmosorast just a short initial message to say: thanks for your thoughts- that’s very helpful: both about the singer philosophy and also the dependencies approach.

I may follow up again soon to share our use-case, but my thinking right now is that we are including a Transform step when we shouldn’t be. So we may instead ditch this PR, and apply our transforms downstream of the singer target, with DBT or something

dmosorast commented 2 years ago

@samjewell Sounds good! Glad it was helpful. No pressure on the use case, mostly my curiosity. I'm available on the Singer Slack if you end up wanting to chat about at some point in the future.

Feel free to close this if/when ready.