influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.
https://influxdata.com
MIT License

side effects are not always contained in first expression of chain #3614

Closed adrian-thurston closed 11 months ago

adrian-thurston commented 3 years ago

Side effects that should be confined to the first expression of experimental.chain, which is executed in a sub-execution environment, can leak into the containing execution, where they are also planned, executed, and drained.

To reproduce, start with an empty dest bucket and run the following in OSS:

import "csv"
import "experimental"
import "system"

csvdata ="
#group,false,false,true,true,false,false,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string
#default,_result,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement
,,0,2018-04-06T10:49:41.565Z,2020-04-06T11:49:41.564Z,2020-02-22T15:01:00Z,50,bottom_degrees,h2o_temperature
"

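// A writes to the dest bucket. This side effect should stay inside the first expression of the chain.
A = csv.from(csv: csvdata)
    |> map(fn: (r) => ({r with _time: system.time()}))
    |> to(bucket: "dest")

// B only reads the CSV data and has no side effects.
B = csv.from(csv: csvdata)

// Run A first, then B.
experimental.chain(first: A, second: B)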

The output includes the following result, which should not be present:

Result: to5
Table: keys: [_field, _measurement, _start, _stop]
         _field:string     _measurement:string                     _start:time                      _stop:time                      _time:time                  _value:float  
----------------------  ----------------------  ------------------------------  ------------------------------  ------------------------------  ----------------------------  
        bottom_degrees         h2o_temperature  2018-04-06T10:49:41.565000000Z  2020-04-06T11:49:41.564000000Z  2021-04-07T21:36:28.975398046Z                            50  

The dest bucket will also contain two points when it should contain only one:

Result: _result
Table: keys: [_start, _stop, _field, _measurement]
                   _start:time                      _stop:time           _field:string     _measurement:string                      _time:time                  _value:float  
------------------------------  ------------------------------  ----------------------  ----------------------  ------------------------------  ----------------------------  
1970-01-01T00:00:00.000000000Z  2021-04-07T21:36:30.210508450Z          bottom_degrees         h2o_temperature  2021-04-07T21:32:27.284064796Z                            50  
1970-01-01T00:00:00.000000000Z  2021-04-07T21:36:30.210508450Z          bottom_degrees         h2o_temperature  2021-04-07T21:32:27.321756596Z                            50  
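
A quick way to check how many points were written is to count them directly. This is only a sketch: it assumes the dest bucket holds nothing but the data from the script above, and the filter values are taken from the CSV in this report.

```flux
// Count the points that the chain wrote to the dest bucket.
// Assumes dest was empty before the reproduction script ran.
from(bucket: "dest")
    |> range(start: 0)
    |> filter(fn: (r) => r._measurement == "h2o_temperature" and r._field == "bottom_degrees")
    |> count()
```

With the behavior described here, the count would come back as 2 rather than the expected 1.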

An attempt to reproduce this in pure Flux, using the sql.to function, did not succeed: the write does not occur twice.

[thurston@peyto] table-find-side-effects: sqlite3 /tmp/to.db
sqlite> create table t ( _start datetime, _stop datetime, _time datetime, _measurement string, _field string, _value bigint );

Then run the following. Only one row shows up, as expected:

import "csv"
import "experimental"
import "system"
import "sql"

csvdata ="
#group,false,false,true,true,false,false,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string
#default,_result,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement
,,0,2018-04-06T10:49:41.565Z,2020-04-06T11:49:41.564Z,2020-02-22T15:01:00Z,50,bottom_degrees,h2o_temperature
" 

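// A writes to the SQLite table t instead of an InfluxDB bucket.
A = csv.from(csv: csvdata)
    |> map(fn: (r) => ({r with _time: system.time()}))
    |> sql.to(
        driverName: "sqlite3",
        dataSourceName: "file:/tmp/to.db?cache=shared&mode=rw",
        table: "t",
    )

// B only reads the CSV data and has no side effects.
B = csv.from(csv: csvdata)

// Run A first, then B.
experimental.chain(first: A, second: B)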
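
The single write can be confirmed by reading the row count back from the same SQLite file. This is a sketch under assumptions: it needs a Flux build with the sqlite3 driver available, and the read-only mode and the column alias n are choices made for this example only.

```flux
import "sql"

// Read back the number of rows in table t.
// mode=ro opens the database read-only; the alias n is illustrative.
sql.from(
    driverName: "sqlite3",
    dataSourceName: "file:/tmp/to.db?cache=shared&mode=ro",
    query: "SELECT COUNT(*) AS n FROM t",
)
```

If the table was empty before the run, a single write should give n = 1.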

github-actions[bot] commented 11 months ago

This issue has had no recent activity and will be closed soon.