meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
https://sdk.meltano.com
Apache License 2.0
94 stars 67 forks source link

[Feature]: Automatically register stream classes as they are declared #953

Open edgarrmondragon opened 2 years ago

edgarrmondragon commented 2 years ago

Feature scope

Taps (catalog, state, stream maps, etc.)

Description

The current pattern we enforce through our documentation and cookiecutter for adding new streams to an HTTP tap is implement a custom MyTap.discover_streams that returns a list of the desired stream instances.

This pattern may provide some consistency but it could be mildly annoying for a developer to remember they have to:

  1. import the new stream class,
  2. add the class to a module-level collection,
  3. implement (once) the discover_streams method:

    from tap_example.streams import NewStream
    
    STREAM_CLASSES = [NewStream]
    
    class TapExample(Tap):
     def discover_streams(self):
       return [stream_class(self) for stream_class in STREAM_CLASSES]

A better approach might get inspiration from other OOP frameworks, like Django. There, database models/tables are defined as classes and whenever some abstraction is required for a set of models, it's also a model class, but explicitly declared as abstract, so it doesn't map to a table in the database.

So, a similar thing in the SDK might look like this:

class MyAPIStream(RESTStream, abstract=True):
  ...

class UsersStream(MyAPIStream):
  name = "users"

Under the hood

aaronsteers commented 1 year ago

@edgarrmondragon - I like this idea. Would we also need to be able to tackle Stream classes like SQLStream in which discovery can produce multiple stream instances of the same class?

What about designing discover_catalog_entries() as a class method on the base Stream class, adapted from SQLConnector.discover_catalog_entries()? Normal stream classes could simply return a single entry based on dynamic or static schema definition. And classes like SQLStream could implement this differently, in order to return multiple (or zero) catalog entries. In the case of the SQLStream class, the stream class could in theory call out to the SQLConnector method we have today.

Side note: some stream types will require the config dictionary in order to detect schema dynamically, so if we go this direction, we may need to pass the config to the class method.

What do you think about all this?

stale[bot] commented 1 year ago

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

edgarrmondragon commented 1 year ago

Still relevant

stale[bot] commented 1 month ago

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

edgarrmondragon commented 1 month ago

Still relevant