singer-io / singer-python

Writes the Singer format from Python
https://singer.io
Apache License 2.0
538 stars 128 forks

Feature Request: Add support for --stream_name argument #122

Open aaronsteers opened 4 years ago

aaronsteers commented 4 years ago

Proposed Feature Description:

As a user and developer of the Singer platform, I would LOVE to have access to a --stream-name argument in the standard/global tap CLI. When specified, a given tap would only extract data for the targeted stream. Essentially, this logic would intersect with and further refine what the 'selected' attribute currently designates within the catalog JSON file, but without having to edit JSON.
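To make the "intersect" semantics concrete, here is a rough sketch of how the check might look. It assumes the catalog layout where stream-level selection lives in a `metadata` entry with an empty `breadcrumb`; the function names `is_stream_selected` and `should_extract` are hypothetical and not part of singer-python.

```python
def is_stream_selected(stream_entry):
    """Return True if the stream-level metadata marks this stream as selected.

    Assumes the catalog format where stream-level metadata is the entry
    whose breadcrumb is an empty list.
    """
    for item in stream_entry.get("metadata", []):
        if item.get("breadcrumb") == []:
            return bool(item.get("metadata", {}).get("selected", False))
    return False


def should_extract(stream_entry, stream_name=None):
    """Hypothetical filter: 'selected' in the catalog AND matching the requested name.

    When stream_name is None, behavior falls back to the catalog selection alone.
    """
    if not is_stream_selected(stream_entry):
        return False
    if stream_name is None:
        return True
    return stream_entry.get("stream") == stream_name
```

With this shape, omitting the argument leaves today's behavior unchanged, and passing it can only narrow the set of extracted streams, never widen it.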

(For reference, my company's JSON catalog for Salesforce (tap-salesforce) is currently >300K lines of code.)

Cost of not having the feature:

The cost of not having this feature is that for large taps, there's no way to run one stream at a time without modifying very large and fragile json files. There's likewise no way to run multiple streams in parallel (which can be done if the stream name is passed as an argument), and there's no good way to retry/rerun just a single stream.

Similarly, during initial development and testing, if the 5th stream out of 9 fails (for instance), there's no way to start by running just the 5th stream. Or if, as a developer, I'm changing just the 9th stream, I have to rerun all streams just to test the final one.

Current Workaround:

In order to get the desired behavior today, we have created another program to wrap around the tap and target which takes as input: (1) a path to catalog_full.json and (2) a --stream_name argument specifying the name of the requested stream. With those inputs, the wrapper parses the full catalog and creates a temporary catalog file {{stream-name}}-catalog-tmp.json. The tap can then be executed for only the specified stream by passing the new stream-specific catalog file instead of the full catalog.
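For anyone wanting to reproduce the workaround, a minimal sketch of the catalog-splitting step might look like the following. This is not our actual wrapper; the function name `write_single_stream_catalog` and the `out_dir` parameter are made up for illustration, and it assumes the catalog has a top-level `streams` list whose entries carry a `stream` name.

```python
import json
import os
import tempfile


def write_single_stream_catalog(full_catalog_path, stream_name, out_dir=None):
    """Write a temporary catalog containing only the requested stream.

    Returns the path of the new file, named {stream_name}-catalog-tmp.json.
    """
    with open(full_catalog_path) as f:
        catalog = json.load(f)

    # Keep only catalog entries whose stream name matches the request.
    matching = [
        entry for entry in catalog.get("streams", [])
        if entry.get("stream") == stream_name
    ]
    if not matching:
        raise ValueError(f"Stream {stream_name!r} not found in {full_catalog_path}")

    # Default to the system temp directory if no output directory is given.
    out_dir = out_dir or tempfile.gettempdir()
    out_path = os.path.join(out_dir, f"{stream_name}-catalog-tmp.json")
    with open(out_path, "w") as f:
        json.dump({"streams": matching}, f, indent=2)
    return out_path
```

The returned path can then be handed to the tap in place of the full catalog. Note that the sketch copies the matching entry as-is, so the stream still needs to be marked as selected in the full catalog.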

Additional Info:

I am willing and able to contribute code to this effort if the feature is accepted. ⚡️ Thanks!

davicorreiajr commented 4 years ago

So, I think I'm having the same issue.

I'm still playing around with taps (I'm using tap-google-sheets) to understand how they work, and I need to manually change the catalog.json file, adding the selected param to the stream I want to get data from, which is terrible when thinking in terms of using Singer as a plug-and-play lib to extract data.

Any news about that?