timescale / outflux

Export data from InfluxDB to TimescaleDB
Apache License 2.0

Migrating same name fields from a measurement #59

Closed sgloutnikov closed 5 years ago

sgloutnikov commented 5 years ago

I just ran into an edge case where InfluxDB has different data types for the same field name in different shards. This breaks outflux: the migration for the whole measurement fails.

More on why and how InfluxDB does this can be found here.

2019/07/17 23:47:34 Discovering influx schema for measurement: M1
2019/07/17 23:47:34 pipe_M1: pipe_M1: could not prepare extractor
pipe_M1_ext: could not fetch data set definition for measure: M1
duplicate column names found: signal

This happens because the signal field is stored as both a float and an integer. In InfluxDB:

> show field keys from M1;
name: M1
fieldKey         fieldType
--------         ---------
...
signal           float
signal           integer
...

I'm not sure what the best way to handle this could be. Some ideas that come to mind:

  1. Ignore fields that exhibit this behavior and warn the user, but migrate the rest of the measurement. Currently the entire measurement migration fails.
  2. Keep (merge) the duplicate field names as a single column in TimescaleDB with the highest-precision data type, and 'promote' the other values to it during the migration (per the InfluxDB explanation mentioned above), i.e. select signal::float from M1.
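
As a rough illustration of idea 2, here is a minimal Python sketch of how duplicate field keys could be merged into one column type. It is not outflux's code: resolve_field_types and its promote_int_to_float parameter are hypothetical names, the input mimics the (fieldKey, fieldType) rows from show field keys, and the status field is made up for the example.

```python
from collections import defaultdict


def resolve_field_types(field_keys, promote_int_to_float=False):
    """Hypothetical helper (not part of outflux): pick one type per field key.

    field_keys is a list of (fieldKey, fieldType) pairs, as listed by
    'show field keys'. A field seen as both integer and float is promoted
    to float only when promote_int_to_float is set; any other conflict
    raises an error.
    """
    types_by_name = defaultdict(set)
    for name, field_type in field_keys:
        types_by_name[name].add(field_type)

    resolved = {}
    for name, types in types_by_name.items():
        if len(types) == 1:
            # No conflict: keep the only type reported for this field.
            resolved[name] = next(iter(types))
        elif types == {"integer", "float"} and promote_int_to_float:
            # Promote to the wider type, as in 'select signal::float from M1'.
            resolved[name] = "float"
        else:
            raise ValueError(
                f"conflicting types for field {name!r}: {sorted(types)}"
            )
    return resolved


# 'signal' matches the output above; 'status' is an invented extra field.
keys = [("signal", "float"), ("signal", "integer"), ("status", "string")]
print(resolve_field_types(keys, promote_int_to_float=True))
```

Without promote_int_to_float the same input raises a ValueError, which corresponds to the current behavior of failing on duplicate column names.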

dianasaur323 commented 5 years ago

Hi Stephan - I believe this is actually one of the cases we don't support. @blagojts correct me if I'm wrong? Also, perhaps we should document the known limitations here + workarounds if they exist. cc @bboule

atanasovskib commented 5 years ago

Yes, this is one of the cases we have trouble with. But I like the suggestion to 'promote' the field during migration. We'll discuss with the team and let you know our progress.

Thank you.

For now, we will update the documentation to note this as a limitation.

atanasovskib commented 5 years ago

We merged a fix for this. If you want to enable integer-to-float 'promotion', you need to explicitly specify the multishard-int-float-cast flag. With the flag set, a field discovered with both an integer and a float type is promoted to float. Any other combination of types still returns an error. The explicit flag is required because integers in InfluxDB are int64 and floats are float64, and not every int64 value can be represented exactly as a float64.
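
To see why the promotion can lose information, here is a small Python sketch (Python's float is an IEEE 754 float64, and is_exactly_representable is a hypothetical helper name). A float64 has a 53-bit significand, so integers above 2**53 are not all representable.

```python
def is_exactly_representable(n: int) -> bool:
    """Return True if the integer survives a round trip through float64."""
    return int(float(n)) == n


INT64_MAX = 2**63 - 1

# Every integer up to 2**53 fits exactly in a float64.
print(is_exactly_representable(2**53))      # True
# 2**53 + 1 is rounded to 2**53 when converted.
print(is_exactly_representable(2**53 + 1))  # False
# The largest int64 value is rounded up to 2**63.
print(is_exactly_representable(INT64_MAX))  # False
```

Fields whose integer values stay within plus or minus 2**53 are unaffected by the cast, so a check like this could be run on the source data before enabling the flag.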