influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.
https://influxdata.com
MIT License
767 stars 153 forks source link

Add support for different join types #84

Open jpacik opened 6 years ago

jpacik commented 6 years ago

Currently join implements the inner join operation. Join needs to support the following types of joins as well:

mark-rushakoff commented 6 years ago

Tasks could really use some of the other joins for retrieving task run logs. I think we can work around it in Go, but it would be a lot simpler if we could do it in straight Flux.

lovasoa commented 5 years ago

There is a discrepancy between the documentation and the actual product.

The documentation says:

method

The method used to join. Defaults to "inner". Data type: String Possible Values:

  • inner
  • cross
  • left
  • right
  • full

But when using anything other than inner, we get an error, such as

error calling function "join": left is not a valid join type

I think the documentation should be updated to reflect the fact that only inner is currently supported.

legomind commented 5 years ago

Any chance this issue will get some attention in the near future? A left join would be very handy for my use case.

jpacik commented 5 years ago

@legomind I'll try and get this prioritized for you. We currently have some other things in progress at the moment, but I'll let you know when we start work on this.

nathanielc commented 5 years ago

@legomind Could you share your use case? Seeing an example Flux script of what you are trying to do along with the data schema you are using would be very helpful. Even if the Flux script doesn't work, just seeing how you would expect it to would is great.

legomind commented 5 years ago

@jlapacik Thanks that would be awesome. @nathanielc My use case is pretty simple, but highly specific. I have two energy meters one at the circuit breakers and one attached to a dehumidifier. The circuit breaker energy monitor reports a measurement for most of the breakers as well as the total house use. It also reports the difference between all the monitored individual breakers (all combined) and the total usage, leaving me with the combined energy use of the remaining, unmonitored breakers (called 'other' in this example).

My second energy monitor is attached directly to the dehumidifier and the measurements are recorded to another db.

My goal is to take subtract the dehumidifiers energy use from the 'other' measurement. The following script works fine for me until the dehumidifier in turned off, then of course, there is nothing to match to, and the rest of the graph data is omitted.

I am very new influxdb and any suggestions or improvements will be welcome.

other_total = () => {
    other = from(bucket: "iotawatt/autogen")
        |> range($range)
        |> filter(fn: (r) => r._measurement == "other_total" and r._field == "value")
        |> aggregateWindow(every: $__interval, fn: mean)
        |> keep(columns: ["_value", "_time"])

    d = from(bucket: "home_assistant")
        |> range($range)
        |> filter(fn: (r) => r._measurement == "switch.living_room_dehumidifier" and r._field == "power_consumption")
        |> aggregateWindow(every: $__interval, fn: mean)
        |> keep(columns: ["_value", "_time"])

    return join(tables: {o:other, d:d}, on: ["_time"])
        |> map(fn: (r) => ({
            _time: r._time,
            _value: r._value_o - r._value_d,
                _measurement: "Other",        }))
}

other_total() |> yield(name: "Other")
jpacik commented 5 years ago

@legomind so if the dehumidifier is turned off for a period of time, currently join would not produce any rows for that time range. But you would like it to still produce rows for the the other measurement, even if the dehumidifier is turned off? Am I understanding that correctly?

legomind commented 5 years ago

Correct.

metareason commented 5 years ago

I also got tripped up by this one trying to use left join with a very similar use case to @legomind - in my case I am trying to drop one series to 0 when another 'cutter' series is > 100, and am getting infrequent data from the 'cutter' series and would like to try with a left join.

// join(tables: {tt2:tt2, pt:pt}, method: "left") // error in evaluating AST while starting program: error calling function ""map"": error calling function ""join"": left is not a valid join type
join(tables: {tt2:tt2, cut:cut}, on: ["_time", "_stop", "_start"])
  |> map(fn: (r) => ({
    _time: r._time,
    _value: if r._value_cut >= 100.0 then 0.0 else r._value_tt2
  }))

NOTE: I am also fighting with combinations of |> aggregateWindow(every: 1m, fn: mean, createEmpty: false) vs |> aggregateWindow(every: 1m, fn: mean, createEmpty: true)

and using |> fill(column: "_value", usePrevious: true) to try and fill in the cutter series so I can use the inner join but seems like that is not working as expected yet.

legomind commented 5 years ago

@metareason Here is to hoping this is addressed soon with the addition of left outer joins.

legomind commented 5 years ago

Just wondering if there was any movement on this. I am more than willing to test if needed.

jpacik commented 5 years ago

We will be addressing join's deficiencies at some point in the future. In the meantime we have added a new join function to the internal/promql package with a type signature that more closely reflects what we want for the current join.

Note, promql.join is a specialized inner join function that can also perform left and right semi joins as well. Here's the doc https://github.com/influxdata/flux/blob/master/stdlib/internal/promql/join.md.

jimirocks commented 4 years ago

Any chance for left joins in the near future? This lack of functionality is basically showstopper for some of my usecases.Since when there is really no data in requested range, even fill function can't help

jimirocks commented 4 years ago

I am also not able to use promql.join the API says unknown import path: ""internal/promql""" "

darinfisher commented 4 years ago

I recently had a use case where there are two similar streams, each containing at least a URL as a key. Some of the URLs are in both streams and other are in only one. I tried to join() these and the resulting stream contained only the URLs that appeared in both streams, an inner join. I needed the list to contain all of the URLs from both streams, an outer join.

lavensj commented 3 years ago

Hi there, I've recently elaborated further on this topic concerning Joining in the Flux query language. In the article below you can find a tutorial on how to perform a (full) outer join in flux applied on a realistic use case accompanied with some side notes about the previous lack of a decent join implementation: https://dzone.com/articles/how-to-pivot-and-join-time-series-data-in-flux

DLCrosby commented 3 years ago

This is for Flux (InfluxDB V2)

In the UI, documentation on join() function includes:

method:String The method used to join. Possible values are: inner, cross, left, right, or full. Defaults to "inner".

However when I set method to left, I receive an error that left is not a valid join type

"error calling function "join" @20:1-23:2: left is not a valid join type"

Seems this has been an issue for a couple years now. Will this ever be resolved? I really need to be able to perform a left join - part of my POC of using InfluxDB.

lavensj commented 3 years ago

Hi @DLCrosby, Checkout my comment above referring to the article on Dzone. At the end of the article, you can find a solution on how to perform an outer join (which will give you a similar result as you are looking for when using a left join).

DLCrosby commented 3 years ago

@lavensj I will look at that again, however the issue remains - Documentation shows other joins besides "inner", however only "inner" works. When will the product be fixed to support additional join types? Given this was first brought up years ago, it is supremely disappointing that the only answer is a complex workaround.

a-vogel-tappert commented 2 years ago

Hi, any plans to implement outer joins short termed? Using unions, sorts, limits, aso are very ressource consuming operations to get a result. Thanks in advance.

ivalkenburg commented 2 years ago

This would solve so many problems im having

MihaelBercic commented 2 years ago

4 years after and still not implemented?

Loeni01 commented 2 years ago

Are there any plans to implement other types of joins? When will it be done? I'm looking very much forward to this feature.

sanderson commented 2 years ago

@Loeni01 Work is happening on adding new join types right now. I don’t have an exact timeframe for when they’ll be available, but it will be soon. They will be a part of a new join package. Here’s the Initial spec for the package: https://github.com/influxdata/flux/issues/4629

nathanielc commented 2 years ago

Thanks @sanderson for the update, you can follow our progress on this issue here where we have broken out the various steps https://github.com/influxdata/flux/issues/3459

Elleanora commented 2 years ago

Until the full join functionality is implemented, a good workaround is to use the combination of the union() and pivot() functions.

Add the following code to final output stream for any table to be used in the downstream union operation:

table_1 = from(bucket: "some_bucket_1") |> range(start: -1d) |> map(fn: (r) => ({ join_column_name: r.join_column_name, _field: "choose_new_column_name_1", _value: column_value_1 })) |> group()

table_2 = from(bucket: "some_bucket_2") |> range(start: -1d) |> map(fn: (r) => ({ join_column_name: r.join_column_name, _field: "choose_new_column_name_2", _value: column_value_2 })) |> group()

union(tables: [table_1, table_2]) |> group() |> pivot(rowKey:["join_column_name"], columnKey: ["_field"], valueColumn: "_value") |> group()