This is going to be an ongoing and evolving thing: What should an SDK look like? It is difficult to get these discussions going because in a way they should be very abstract, but we live in reality and often have quite specific thoughts and experiences.
So let's be open to anything, both very abstract and very specific.
Some initial thoughts:
We can probably create very useful functionality if we split the SDK into subclasses by connection type:
SQL databases: connect, run_query, read/write dataframe, get raw connection, analyze query, lineage within queries? This can probably be mostly templated; with SQLAlchemy it might be almost fully templated.
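A rough sketch of how templated the SQL subclass could be. The class name `SQLConnection` and its attributes are hypothetical. To keep it runnable with only the standard library, it takes any PEP 249 (DB-API 2.0) connection factory and uses `sqlite3` for the demo; an SQLAlchemy engine would be the more general route. Note that the placeholder style (`?` here) varies between drivers.

```python
import sqlite3
from typing import Any, Callable, Dict, List, Optional, Sequence


class SQLConnection:
    """Hypothetical SDK subclass for SQL databases.

    Works with any DB-API 2.0 connection factory. Every executed query is
    recorded so it can later be analyzed and turned into lineage.
    """

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect
        self._conn: Optional[Any] = None
        self.queries: List[Dict[str, Any]] = []  # captured for lineage

    def connect(self) -> Any:
        # Connect lazily and reuse the same connection afterwards.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def raw_connection(self) -> Any:
        # Escape hatch: hand the user the underlying DB-API connection.
        return self.connect()

    def run_query(self, sql: str, params: Sequence[Any] = ()) -> List[tuple]:
        cur = self.connect().cursor()
        try:
            cur.execute(sql, params)
            # description is None for statements that return no rows.
            rows = cur.fetchall() if cur.description else []
        finally:
            cur.close()
        self.queries.append({"sql": sql, "params": list(params)})
        return rows


# Example usage with an in-memory SQLite database.
db = SQLConnection(lambda: sqlite3.connect(":memory:"))
db.run_query("CREATE TABLE t (x INTEGER)")
db.run_query("INSERT INTO t VALUES (?)", (1,))
db.run_query("SELECT x FROM t")  # [(1,)]
```

Read/write dataframe could then be thin wrappers around `pandas.read_sql` and `DataFrame.to_sql`, both of which accept an SQLAlchemy connectable, which is part of why SQLAlchemy looks attractive for templating.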
REST APIs: support for different auth methods (purely as a convenience), and a pass-through for the requests get/post methods that captures the endpoint and possibly the payload. It is important to always leave the ability to use the raw connection, i.e. to retrieve just the credentials or something similar.
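A minimal sketch of the REST idea, assuming a hypothetical `RESTConnection` class and a made-up `auth` dict format (`{"type": "bearer", ...}` or `{"type": "basic", ...}`). It uses `urllib` from the standard library instead of requests so it stays self-contained; the transport is injectable, which also makes it easy to test. Each successful call records method, URL and payload for lineage, and `credentials()` is the escape hatch back to the raw auth details.

```python
import base64
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional


def _urllib_send(method: str, url: str, headers: Dict[str, str],
                 body: Optional[bytes]) -> bytes:
    # Default transport: plain urllib from the standard library.
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


class RESTConnection:
    """Hypothetical SDK subclass for REST APIs."""

    def __init__(self, base_url: str, auth: Optional[Dict[str, str]] = None,
                 send: Callable[..., bytes] = _urllib_send) -> None:
        self.base_url = base_url.rstrip("/")
        self._auth = auth or {}
        self._send = send
        self.calls: List[Dict[str, Any]] = []  # captured endpoint + payload

    def credentials(self) -> Dict[str, str]:
        # Escape hatch: return the raw credentials so users can bypass the SDK.
        return dict(self._auth)

    def _auth_headers(self) -> Dict[str, str]:
        # Convenience only: turn the auth dict into an Authorization header.
        kind = self._auth.get("type")
        if kind == "bearer":
            return {"Authorization": f"Bearer {self._auth['token']}"}
        if kind == "basic":
            pair = f"{self._auth['user']}:{self._auth['password']}".encode()
            return {"Authorization": "Basic " + base64.b64encode(pair).decode()}
        return {}

    def _request(self, method: str, path: str, payload: Any = None) -> bytes:
        url = f"{self.base_url}/{path.lstrip('/')}"
        headers = self._auth_headers()
        body = None
        if payload is not None:
            body = json.dumps(payload).encode()
            headers["Content-Type"] = "application/json"
        result = self._send(method, url, headers, body)
        # Recorded only after the request succeeded.
        self.calls.append({"method": method, "url": url, "payload": payload})
        return result

    def get(self, path: str) -> bytes:
        return self._request("GET", path)

    def post(self, path: str, payload: Any = None) -> bytes:
        return self._request("POST", path, payload)
```

Usage would look like `RESTConnection("https://api.example.com", auth={"type": "bearer", "token": "..."}).get("/items")`, after which `calls` holds the captured endpoint.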
File systems? Perhaps make a generic fsspec wrapper that could cover a lot of use cases? rclone? They have very little in common other than some kind of host parameter, so it is not easy to make much of this generic.
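One possible shape for the fsspec idea, as a sketch only: the wrapper (hypothetical name `FileSystemConnection`) relies on nothing but an `open(path, mode)` method, which fsspec filesystems provide, and records which paths were read or written. `LocalFS` is a stand-in so the example runs without fsspec installed. Classifying by mode is a simplification: `r+` is counted purely as a write.

```python
import os
import tempfile
from typing import Any, Dict, List


class LocalFS:
    """Stand-in for an fsspec filesystem; only open(path, mode) is needed."""

    def open(self, path: str, mode: str = "rb"):
        return open(path, mode)  # the builtin open, not this method


class FileSystemConnection:
    """Hypothetical SDK subclass wrapping any fsspec-like filesystem object."""

    def __init__(self, fs: Any, host: str = "") -> None:
        self.fs = fs      # raw filesystem stays accessible
        self.host = host  # about the only parameter the backends share
        self.inputs: List[Dict[str, str]] = []
        self.outputs: List[Dict[str, str]] = []

    def open(self, path: str, mode: str = "rb"):
        # Any write/append/create/update mode counts as an output.
        is_write = any(c in mode for c in "wax+")
        target = self.outputs if is_write else self.inputs
        target.append({"host": self.host, "path": path})
        return self.fs.open(path, mode)


# Example usage with a temporary local directory.
with tempfile.TemporaryDirectory() as tmp:
    conn = FileSystemConnection(LocalFS(), host="localhost")
    path = os.path.join(tmp, "out.txt")
    with conn.open(path, "w") as f:
        f.write("hello")
    with conn.open(path, "r") as f:
        f.read()
```

With fsspec installed, passing something like `fsspec.filesystem("memory")` in place of `LocalFS()` should work the same way, since its filesystems expose `open(path, mode)`.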
The "base" SDK methods/functionality that should be available everywhere:
end_run: method to send a "completed" message to Marquez (if that's what we are going for).
send_lineage: connect/authenticate and post the lineage record to the metadata server (whatever we end up supporting).
Storing/assembling lineage until the send_lineage method is called, for metadata endpoints that don't handle "events".
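A sketch of how those three base methods could fit together, assuming a hypothetical `BaseSDK` class and an injected `transport` callable that authenticates and posts one JSON document (for Marquez, presumably to its lineage endpoint). The payload is only loosely shaped like an OpenLineage run event (`eventType`, `eventTime`, `run.runId`, `job.name`, `inputs`, `outputs`) and leaves out required fields such as namespaces and `producer`, so it is not a valid event as-is. `event_based=False` covers the endpoints that don't handle events: lineage is buffered until `send_lineage` or `end_run` is called.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


class BaseSDK:
    """Hypothetical base class with the methods every connection type shares."""

    def __init__(self, job_name: str,
                 transport: Callable[[Dict[str, Any]], None],
                 event_based: bool = True) -> None:
        self.job_name = job_name
        self.run_id = str(uuid.uuid4())
        self._transport = transport  # assumed: authenticates and posts one doc
        self.event_based = event_based
        self._buffer: List[Dict[str, List[str]]] = []

    def record(self, inputs: List[str], outputs: List[str]) -> None:
        # Subclasses call this whenever they see a read or a write.
        self._buffer.append({"inputs": inputs, "outputs": outputs})
        if self.event_based:
            self.send_lineage()  # event endpoints get lineage immediately

    def send_lineage(self, event_type: str = "RUNNING") -> Dict[str, Any]:
        # Assemble everything buffered so far into one de-duplicated document.
        inputs = sorted({i for r in self._buffer for i in r["inputs"]})
        outputs = sorted({o for r in self._buffer for o in r["outputs"]})
        doc = {
            "eventType": event_type,
            "eventTime": datetime.now(timezone.utc).isoformat(),
            "run": {"runId": self.run_id},
            "job": {"name": self.job_name},
            "inputs": [{"name": n} for n in inputs],
            "outputs": [{"name": n} for n in outputs],
        }
        self._transport(doc)
        self._buffer.clear()
        return doc

    def end_run(self) -> Dict[str, Any]:
        # Tell the server the run finished, flushing anything still buffered.
        return self.send_lineage("COMPLETE")
```

Keeping the transport separate from assembly is what would let the same base class target Marquez or a non-event metadata store without changing the connection subclasses.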