influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.
https://influxdata.com
MIT License

Programmatically feeding data in and out of a flux query #1132

Closed AdamMagaluk closed 5 years ago

AdamMagaluk commented 5 years ago

Are there any examples or places you can point to that show how one might programmatically feed data into and out of a compiled Flux query?

I'm toying around with a use case that would allow streaming data to be pre-processed with the Flux query language/engine within a Go binary. The binary itself is already reading and sending data using its own mechanisms, so running the flux CLI doesn't make sense.

Thanks.

nathanielc commented 5 years ago

Are you asking for where in the Go API to inject data and how to get it out?

If so, the basics are that you need to write a Source to inject the data into the engine. Maybe take a look at the stream source?

The results will contain any yielded data, so you can use the flux.Query interface to access the output data.
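To make the shape of that suggestion concrete, here is a minimal sketch of the pattern only: a source pushes data into a pipeline, and the caller drains the results. Everything in it (`Row`, `Source`, `SliceSource`, `RunQuery`, `Collect`) is a hypothetical stand-in written against the standard library; none of it is the Flux Go API, and a real Flux source and the `flux.Query` interface will look different.

```go
package main

import "fmt"

// Row is a hypothetical record type; it is not part of the Flux API.
type Row struct {
	Time  int64
	Value float64
}

// Source is a hypothetical stand-in for the role a source plays:
// it pushes rows into the pipeline and closes the channel when done.
type Source interface {
	Run(out chan<- Row)
}

// SliceSource feeds rows that the host program already holds in memory.
type SliceSource struct{ Rows []Row }

// Run sends every row downstream, then signals completion by closing out.
func (s SliceSource) Run(out chan<- Row) {
	for _, r := range s.Rows {
		out <- r
	}
	close(out)
}

// RunQuery wires a source to a single filter step (standing in for a
// compiled query) and returns a channel the caller reads results from.
func RunQuery(src Source, keep func(Row) bool) <-chan Row {
	in := make(chan Row)
	results := make(chan Row)
	go src.Run(in)
	go func() {
		defer close(results)
		for r := range in {
			if keep(r) {
				results <- r
			}
		}
	}()
	return results
}

// Collect drains a results channel into a slice.
func Collect(results <-chan Row) []Row {
	var out []Row
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	src := SliceSource{Rows: []Row{{1, 0.5}, {2, 3.0}, {3, 7.25}}}
	for _, r := range Collect(RunQuery(src, func(r Row) bool { return r.Value > 1 })) {
		fmt.Println(r.Time, r.Value)
	}
}
```

The point of the sketch is the division of labor: the host binary owns how data arrives (the source), and it only sees output through the results channel, which mirrors the "write a Source, read the yielded results" advice above.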

Does that help?

AdamMagaluk commented 5 years ago

Appreciate the pointer. Yes, I was probably a bit unclear; I'm looking at the Go API directly.

I was able to get something semi-working by basing it off the Socket source.

Not sure if that is what you're referring to with "stream source"? Are there any plans to support streaming use cases further? I noticed this comment in the socket source (https://github.com/influxdata/flux/blob/master/stdlib/socket/from.go#L2-L3). Are there any other sources I might look at for streaming use cases or some form of micro batching?

Thanks for the help.

nathanielc commented 5 years ago

By Stream source I meant the socket source. The socket source is kind of a streaming source.

Flux's streaming design is currently very bare bones. We have a few ideas and bits in place but are not really flexing those code paths much. Streaming is not a current priority but is on the longer term roadmap.

What kind of streaming are you looking for? Flux does handle the micro batching cases currently. This is accomplished using the ColReader interface as each ColReader is a micro batch for the entire table.
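As a rough illustration of the micro-batching idea, the sketch below models a table that reaches the consumer as a sequence of columnar batches through a callback. `ColBatch`, `Table`, and `Mean` are hypothetical types and helpers invented for this example; they are not Flux's `ColReader` or table interfaces, whose actual methods should be checked in the Flux source.

```go
package main

import (
	"errors"
	"fmt"
)

// ColBatch is a hypothetical columnar micro batch: one slice per column,
// all of equal length. It only imitates the role of a ColReader.
type ColBatch struct {
	Times  []int64
	Values []float64
}

// Len reports the number of rows in the batch.
func (b ColBatch) Len() int { return len(b.Times) }

// Table is a hypothetical table delivered as a sequence of batches.
type Table struct{ Batches []ColBatch }

// Do calls f once per batch and stops at the first error.
func (t Table) Do(f func(ColBatch) error) error {
	for _, b := range t.Batches {
		if len(b.Times) != len(b.Values) {
			return errors.New("column lengths differ")
		}
		if err := f(b); err != nil {
			return err
		}
	}
	return nil
}

// Mean aggregates across batches without assuming the whole table
// arrives in a single call.
func Mean(t Table) (float64, error) {
	var sum float64
	var n int
	err := t.Do(func(b ColBatch) error {
		for i := 0; i < b.Len(); i++ {
			sum += b.Values[i]
		}
		n += b.Len()
		return nil
	})
	if err != nil {
		return 0, err
	}
	if n == 0 {
		return 0, errors.New("empty table")
	}
	return sum / float64(n), nil
}

func main() {
	tbl := Table{Batches: []ColBatch{
		{Times: []int64{1, 2}, Values: []float64{1, 2}},
		{Times: []int64{3}, Values: []float64{6}},
	}}
	m, err := Mean(tbl)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("mean:", m)
}
```

The design choice worth noting is that the consumer keeps running state (`sum`, `n`) outside the callback, so it works the same whether the table arrives in one batch or many.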

AdamMagaluk commented 5 years ago

Thanks for the clarification, excited to see where the streaming stuff goes in the future.

I don't have any hard requirements right now; I'm on more of an exploratory mission. I have an open source IoT project, Zetta, which focuses on aggregating data at the edge. I'm exploring adding a declarative stream processing language like Flux or Flink's SQL that can be used at the edge to process the incoming data for things like downsampling, joins across streams, and some basic triggers.

I'll give the ColReader a try this week. Appreciate the help.

nathanielc commented 5 years ago

The project looks great. One of the goals of Flux is that it can be easily embedded and it need not be tied to InfluxDB. As a result your use case aligns with our goals. Please feel free to reach out as your exploration uncovers more questions.

AdamMagaluk commented 5 years ago

I hacked together something that I got half working: https://github.com/AdamMagaluk/flux/pull/1 Would you be willing to take a quick look and say which pieces I got completely wrong, or whether what I'm thinking is not possible at this time? My goal is to get a really basic PoC working so I can experiment with some of the larger concepts of distributing the query across our edge nodes.

I know the project is ongoing and has lots of development right now, and this is not a priority, so please just tell me to go away if this is too much of a distraction.

Thank you.

nathanielc commented 5 years ago

I commented on your PR and am closing this issue for now. We can talk specifics on the PR.