Open aaronsteers opened 1 year ago
We'd be eager to chat about how we could help with this. The wider Singer community has put a lot of work into extractors and if we could make it easier to use them for LLM applications that'd be a huge win for everyone!
A generic interface into hub.meltano.com would be great. In that paradigm, the source connectors are called "extractors" or "taps".
There are a few different ways we could create generic connection interfaces, which I can highlight below...
Generally though, each connector would need a {variant}/{name} string combo, and/or the pip_url of the connector. An example:
tap-asana - Meltano Hub
Connector info: name: tap-asana, variant: singer, pip_url: tap-asana
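As a rough sketch of how a generic interface might carry this connector info, the {variant}/{name} combo and the optional pip_url could live in a small value object. The class name `ConnectorSpec`, its `from_string` helper, and the `install_target` property are hypothetical and not part of Meltano or LlamaIndex. Falling back to the connector name as the pip requirement is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectorSpec:
    """Hypothetical holder for the info needed to identify a Singer tap."""

    name: str
    variant: str
    pip_url: Optional[str] = None

    @classmethod
    def from_string(cls, combo: str, pip_url: Optional[str] = None) -> "ConnectorSpec":
        # Expect exactly one slash, in the form "{variant}/{name}".
        variant, sep, name = combo.partition("/")
        if not sep or not variant or not name or "/" in name:
            raise ValueError(f"expected '{{variant}}/{{name}}', got {combo!r}")
        return cls(name=name, variant=variant, pip_url=pip_url)

    @property
    def install_target(self) -> str:
        # Assumption: with no explicit pip_url, the name doubles as the package name.
        return self.pip_url or self.name


spec = ConnectorSpec.from_string("singer/tap-asana")
print(spec.variant, spec.name, spec.install_target)
```

Keeping the spec immutable makes it safe to use as a dictionary key or cache entry if a loader ends up managing several taps at once.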
Sample config:
Processing Singer output
Singer taps output data as a series of JSON lines, generally one record per line, which should be easy for the libraries to parse generically.
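To illustrate how little parsing is needed, here is a minimal sketch. Per the Singer specification, each line is a JSON object whose `type` is `SCHEMA`, `RECORD`, or `STATE`. The helper name `iter_records` and the sample lines are made up for illustration; a real integration would probably also want to track `SCHEMA` and `STATE` messages rather than skip them.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (stream, record) pairs from Singer JSON-lines output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        message = json.loads(line)
        # SCHEMA and STATE messages are skipped in this sketch.
        if message.get("type") == "RECORD":
            yield message["stream"], message["record"]


# Illustrative sample output, not taken from a real tap run.
sample_output = [
    '{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}}}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}}',
    '{"type": "STATE", "value": {"users": 1}}',
]

records = list(iter_records(sample_output))
print(records)
```

Each yielded pair could then be mapped to whatever document type the consuming library expects, with the stream name kept as metadata.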
List of connectors:
https://hub.meltano.com/extractors
This isn't a full list, since many connectors are being created that aren't yet on the Hub, but it gives a good idea of the existing depth and breadth of the ecosystem.
How to list on LlamaHub
To avoid spamming the index, we could list this as a single item on LlamaHub: either as "MeltanoHub Singer Taps", or "Singer Extractors" generically, or similar.
Thinking about the "right" abstraction layer
I think this could be really powerful, since it could plug LlamaIndex, LangChain, and other GPT-like applications into a broad ecosystem of already existing connectors.
Since the vast majority of Singer connectors are already pip-installable, this should fit well with the existing paradigms that LlamaIndex is using.
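One way this could work, sketched under assumptions: install the tap with pip, run its executable as a subprocess, and read stdout line by line. Singer taps conventionally accept a `--config` flag pointing at a JSON file. The helper names `pip_install_command`, `tap_command`, and `stream_lines` are hypothetical, and the demo runs a stand-in Python one-liner instead of a real tap so that it needs no credentials.

```python
import subprocess
import sys
from typing import Iterator, List


def pip_install_command(pip_url: str) -> List[str]:
    # Install into the current interpreter's environment (not executed in this demo).
    return [sys.executable, "-m", "pip", "install", pip_url]


def tap_command(executable: str, config_path: str) -> List[str]:
    # Singer taps conventionally take --config <path to a JSON config file>.
    return [executable, "--config", config_path]


def stream_lines(command: List[str]) -> Iterator[str]:
    """Run a command and yield its stdout one line at a time."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"command exited with code {proc.returncode}")


# Stand-in for a real tap: a tiny Python process that prints one line.
stand_in = [sys.executable, "-c", "print('hello from a fake tap')"]
lines = list(stream_lines(stand_in))
print(lines)
print(tap_command("tap-asana", "config.json"))
```

Installing each tap into its own virtual environment would likely be safer than sharing one, since taps may pin conflicting dependency versions; that isolation is left out of this sketch.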
I may have some cycles to contribute to this integration but I first wanted to log this issue here to assess interest level, and discuss if there are any potential pitfalls or "gotchas" that others might see.