influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.
https://influxdata.com
MIT License
760 stars 152 forks source link

Full featured iox package (flightsql) #5417

Closed lmangani closed 10 months ago

lmangani commented 1 year ago

I'm playing with iox and Flux, but the experimental iox package seems to be using the internal iox client which does not seem to allow connectivity to arbitrary hosts and the method used to determine the client config is a little obscure.

Is there a host method we can use by settings Flux parameters, plans to extend it for general usage - or shall we rather contribute a stand-alone flight sql client?

I already have a flightsql contrib in the works and the arrow library v12 offers a ready driver which works great with iox, but flux still uses v7. Would a stand-alone contrib using the latest github.com/apache/arrow/go/v12/arrow/flight/flightsql/driver be accepted? @wolffcm

Thanks in advance,

Lorenzo

lmangani commented 1 year ago

We're beginning to form a plan for a stand-alone FlightSQL client as a community contrib. Before diving in deeper I'd just like to check and confirm whether we could simply extend the existing experimental iox module to optionally support a custom endpoint? This would save a huge rewrite and would avoid lifting the size of flux itself with more dependencies. Any spare suggestions to get us on the right path from @sanderson perhaps? Thanks!

sanderson commented 1 year ago

@lmangani

we could simply extend the existing experimental iox module to optionally support a custom endpoint?

If this is feasible, I say go for it! I don't know much about the actual implementation, but If this solves the problem in a simpler way, I'm all for that. Everything would just have to be backwards compatible. When it comes to reviewing the actual implementation, that will like be @wolffcm or @gavincabbage.

lmangani commented 1 year ago

Thanks @sanderson for the input, I greatly appreciate it! The most backwards compatible way, would be indeed extending the existing iox contrib to become a little more generic and unlock further development without wasting the great work already put into it. All we need is a bit of a handover to make a transition to community sourced contributions a little easier.

@wolffcm @gavincabbage if you have any spare bandwidth to guide us we'll make your time worth it :)

lmangani commented 1 year ago

Kind bump @wolffcm @gavincabbage any chance you could help unlock or suggest a viable approach to open the iox server settings so we can carry the existing extension forward? Thanks a million!

wolffcm commented 1 year ago

Hi @lmangani,

A good first step might be for you to look at https://github.com/influxdata/flux/blob/master/dependencies/iox/client.go

And specifically, it looks like the Provider creates retrieves a client that just errors by default: https://github.com/influxdata/flux/blob/d8995bbfb3ad23bbd166a4daaea7a5d9e967dce6/dependencies/iox/client.go#L37-L43

The typical workflow for an app that embeds Flux (such as the app that runs Flux in the InfluxDB 2.0 Cloud, or some other app that you've written) would inject an implementation of Client into the context that can talk to the IOx instance that you care about. Injection is done by calling Inject.

The client that you inject could come from here: https://github.com/influxdata/influxdb-iox-client-go There are some unit tests in that repo you can play around with to work it out. Note that this is an Apache Flight client.

Probably moving forward you may get more utility from being able to use a FlightSQL client from a Flux script. A good place for that might be in stdlib/experimental/flightsql or something. FlightSQL is the latest and greatest communication protocol used by IOx, so you can expect it to support more features in the future. It is a standard so it would open up the possibility of being used with other data stores.

Sorry for the delay in resonse---hope that this gets you pointed in the right direction.

lmangani commented 1 year ago

Thanks @wolffcm for the input, I appreciate it, but I'm not sure If I'm drawing the wrong conclusions from it.

I already have a few stand-alone FlightSQL clients working in go and other languages with a local IOx build so that part seems quite simple, but I still don't understand the current GetProvider implementation enough to determine if its extensible or a dead end.

I guess most users (me included) would need and want an IOx/Flight client working as independently any other Flux SQL source, and being able to specify the server/API dynamically on each request seems a hard requirement. Pairing a Flux instance with a specific Client instance config seems a limiting approach only viable for something like InfluxCloud but not much for end-user scripts wanting to connect to arbitrary API endpoints to pull data.... Am I misunderstanding your indications here? At this point in time, writing a stand alone contrib flightsql client seems a bit less discouraging but I might be wrong.

Once again thanks for taking the time!

wolffcm commented 1 year ago

I guess most users (me included) would need and want an IOx/Flight client working as independently any other Flux SQL source, and being able to specify the server/API dynamically on each request seems a hard requirement.

Yes, that makes a lot of sense. iox.sql was created for use in the cloud where the IOx server would be set via configuration of the app for all Flux queries.

For the InfluxCommunity fork you are totally free to add a host and/or port parameter to iox.sql, by updating the code here: https://github.com/influxdata/flux/blob/d8995bbfb3ad23bbd166a4daaea7a5d9e967dce6/stdlib/experimental/iox/source.go#L34-L42

And also here: https://github.com/influxdata/flux/blob/d8995bbfb3ad23bbd166a4daaea7a5d9e967dce6/stdlib/experimental/iox/iox.flux#L56

And there may be other places.

You could definitely also create a more generic standalone way to use a FlightSQL client from Flux as well in contrib. In a similar way.

lmangani commented 1 year ago

Alright so the current implementation seems to configure an IOx client using the platform's influxdb.GetProvider() data.

type Config struct {
    Org    [NameOrID](https://pkg.go.dev/gitlab.qobserve.com/teamq/flux/dependencies/influxdb#NameOrID)
    Bucket [NameOrID](https://pkg.go.dev/gitlab.qobserve.com/teamq/flux/dependencies/influxdb#NameOrID)
    Host   [string](https://pkg.go.dev/builtin#string)
    Token  [string](https://pkg.go.dev/builtin#string)
}

I will try to understand the full flow and hopefully add optional user-provided override parameters in a new PR and continue this on the community fork!

Thanks @wolffcm for the assist!

lmangani commented 1 year ago

Sadly have to reopen this for help @wolffcm

Adding parameters is super easy, so the function can now have an additional host and token parameters.

The SqlProcedureSpec config has been extended with a Host/Token parameters as per Struct:

        return &SqlProcedureSpec{
                Config: iox.Config{
                        Bucket: influxdb.NameOrID{
                                Name: bucket,
                        },
                        Host: host,
                        Token: token,
                },
                Query: query,
        }, nil

The schema seems fine and the config passes.

Next we have to either replace or bypass the getProvider action and this is where it gets tricky and black boxy and when we pass the extended config to create the source, something goes wrong (without clear errors).

I've even tried to create a custom provider for testing, but nothing seems to get me any closer to a configured client.

func (s *SqlProcedureSpec) CreateSource(id execute.DatasetID, a execute.Administration) (execute.Source, error) {
        ctx := a.Context()
        custom := influxdb.HttpProvider{
                DefaultConfig: s.Config,
        }
        ctx = influxdb.Dependency{
                Provider: custom,
        }.Inject(ctx)

        provider := iox.GetProvider(ctx)
        fmt.Println("Provider",provider)

        client, err := provider.ClientFor(ctx, s.Config)

        if err != nil {
                fmt.Println("ClientFor Error",err.Error())
                return nil, err
        }

        return &sqlSource{
                d:      execute.NewTransportDataset(id, a.Allocator()),
                client: client,
                query:  s.Query,
                mem:    a.Allocator(),
        }, nil

No matter what I'm still always ending up with a supposedly valid config, an empty provider and no client initialization.

ClientFor Error iox client has not been configured
Error: failed to initialize execute state: iox client has not been configured

Any ideas or suggestions?

wolffcm commented 1 year ago

Hi @lmangani, sorry for the delay in response, I was on vacation last week.

It looks like you are still using the ErrorProvider which is the default one for vanilla Flux: https://github.com/influxdata/flux/blob/d8995bbfb3ad23bbd166a4daaea7a5d9e967dce6/dependencies/iox/client.go#L64-L69

I see you doing this:

       ctx = influxdb.Dependency{
                Provider: custom,
        }.Inject(ctx)

And I would expect something like that to be the solution. I am not sure which influxdb.Dependency is being invoked there, but perhaps it is not the right one.

I think if you were to call this one, you would get the result you want: https://github.com/influxdata/flux/blob/d8995bbfb3ad23bbd166a4daaea7a5d9e967dce6/dependencies/iox/client.go#L26

lmangani commented 1 year ago

@wolffcm hope you had a fantastic holiday! As always, thanks for taking the time and for the pointer. I'll give it a shot next.

github-actions[bot] commented 10 months ago

This issue has had no recent activity and will be closed soon.