meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
Apache License 2.0

feat: discover and list streams when printing `--about --format=markdown` #1337

Open tayloramurphy opened 1 year ago

tayloramurphy commented 1 year ago

From what I've seen, the README that comes with new taps built on the SDK doesn't have a section detailing their available streams.

https://github.com/MeltanoLabs/tap-messagebird is an example where it'd be nice if we detailed the streams available.

z3z1ma commented 1 year ago

This must account for the many taps with completely dynamic streams, such as database taps and SaaS apps with customization (e.g. Salesforce). A stream list is only possible with static taps. For dynamic taps, the README should indicate this and suggest that the author instead detail the logic that determines the stream names.

aaronsteers commented 1 year ago

The Hub workflow does this now. In theory, we can recycle code from that.

The challenge, as @z3z1ma alludes to, is that printing the stream list also requires running discovery. For some subset of taps, this will complete very quickly, since the stream list is known statically. However, there's a second class of taps for which --about will fail for lack of connection info, and a third class for which, even if connection info is provided via --config when --about is called, the operation will take several minutes or more to scan the metadata for all of the streams.

Additionally, there's an implicit expectation that --about will finish quickly and that running --about with different --format options will be a more-or-less equivalent operation in terms of performance.
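To make the three cases concrete, here is a minimal sketch (not SDK code) of running discovery with a timeout and classifying the outcome. The helper name `try_discover`, the status strings, and the 10-second default are assumptions for illustration. It assumes only that the tap prints a Singer catalog as JSON on stdout when discovery succeeds.

```python
from __future__ import annotations

import json
import subprocess
import sys


def try_discover(command: list[str], timeout_seconds: float = 10.0) -> tuple[str, list[str]]:
    """Run a discovery command and classify the outcome.

    Returns (status, stream_names), where status is "ok", "failed"
    (e.g. missing connection info), or "timed_out" (discovery too slow).
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return "timed_out", []
    except OSError:
        # The executable could not be started at all.
        return "failed", []
    if result.returncode != 0:
        return "failed", []
    try:
        catalog = json.loads(result.stdout)
    except json.JSONDecodeError:
        return "failed", []
    if not isinstance(catalog, dict):
        return "failed", []
    names = [s.get("tap_stream_id") for s in catalog.get("streams", [])]
    return "ok", names


# Stand-in for a real tap: a Python one-liner that prints a tiny catalog.
fake_tap = [sys.executable, "-c", "print('{\"streams\": [{\"tap_stream_id\": \"issues\"}]}')"]
print(try_discover(fake_tap))
```

With a real tap, the command would look something like `["tap-github", "--config", "config.json", "--discover"]`; the config file name here is a placeholder.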

A possible compromise here would be to support --discover --format=markdown, which would run discovery and output the result as an additional markdown blurb which could be appended to the main --about contents. This keeps the performance profile stable across both, while still letting developers streamline the readme creation process.
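As a rough illustration of what such a markdown blurb could look like, the sketch below renders a Singer catalog (the JSON document a tap emits during discovery) as a markdown section. This is not an existing SDK feature. The function name `streams_to_markdown`, the heading text, and the example stream names are all hypothetical, and it assumes primary keys appear under the catalog entry's `key_properties` field.

```python
def streams_to_markdown(catalog: dict) -> str:
    """Render the streams of a Singer catalog as a markdown list."""
    lines = ["## Supported Streams", ""]
    streams = catalog.get("streams", [])
    if not streams:
        lines.append("No streams were discovered.")

    def stream_name(entry: dict) -> str:
        # Prefer tap_stream_id; fall back to the "stream" field if absent.
        return entry.get("tap_stream_id") or entry.get("stream") or ""

    for entry in sorted(streams, key=stream_name):
        keys = entry.get("key_properties") or []
        key_note = f" (primary key: {', '.join(keys)})" if keys else ""
        lines.append(f"- `{stream_name(entry)}`{key_note}")
    return "\n".join(lines) + "\n"


# Hypothetical catalog with made-up stream names, for illustration only.
example_catalog = {
    "streams": [
        {"tap_stream_id": "messages", "key_properties": ["id"]},
        {"tap_stream_id": "conversations"},
    ]
}
print(streams_to_markdown(example_catalog), end="")
```

Because the function only consumes the catalog, it keeps the slow or connection-dependent part (discovery itself) separate from the formatting, which matches the idea of leaving --about fast.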

aaronsteers commented 1 year ago

Challenges with relying on the tap's README.md:

Many README.md files today become stale because the process of manually updating them (i.e. rerunning --about --format=markdown) is not very friendly to a CI-centric workflow where multiple contributors are providing code updates and new features.

I've started leaning towards an alternative where the Hub itself would scan, collect, and publish the detected stream list (when available), similar to how we publish the known settings and commands. This could then be kept up to date on a defined cadence (or in response to new versions being published) without requiring commits and updates to all of the taps' README files. Otherwise, the list of streams in the taps' README files slowly goes stale as new streams are added.

One benefit of putting this info on the Hub (besides the ability to centrally update it based on periodic scanning), is that tap authors would have increased motivation to refer their users to our Hub.

Today's cookiecutter includes some boilerplate so that if the developer doesn't add the settings list, the user can still discover it:

Accepted Config Options

Developer TODO: Provide a list of config options accepted by the target.

A full list of supported settings and capabilities for this tap is available by running:

target-snowflake --about

Taking this further, the developer can offload both to us when their settings, commands, and streams are discoverable:

Accepted Config Options

A full list of supported settings and capabilities for this tap is available on MeltanoHub:

Supported Stream Types

A full list of supported streams for this tap is available on MeltanoHub:

This gives MeltanoHub additional traffic, while also reducing the maintenance burden for tap/target developers of keeping their READMEs up to date.

aaronsteers commented 1 year ago

Logged related:

With sample CI output for tap-github, including the streams list:

visch commented 1 year ago

https://github.com/AutoIDM/tap-clickup#clickup-table-schemas has a decent example of documentation that's nice to have for each stream. Note that I was following some kind of template from the Singer community.

stale[bot] commented 1 year ago

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

edgarrmondragon commented 1 year ago

Still relevant

stale[bot] commented 3 months ago

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.