timescale / outflux

Export data from InfluxDB to TimescaleDB
Apache License 2.0
90 stars 22 forks source link

Data Structure in influxdb and timescaledb are different #71

Open komal-lunkad opened 5 years ago

komal-lunkad commented 5 years ago

In my influx measurement, the timestamp tag data type is timestamp and that in timescale it is timestamp with timezone. Also , there us one more field which is of type string seperated by commas and I want to convert it into an array in timescale.

Please help me in which file should I make the required chanages and how to run the code for the same.

atanasovskib commented 5 years ago

Hello @komal-lunkad. Yes Outflux exports the time column from InfluxDB as timestamptz type. Currently Outflux doesn't allow you to influence the schema much during migration or define your custom transformations. But you can do the following.

  1. Run the migrate command and transfer all your data. (Let's say it creates a hyperatable a with a time column that's timestamptz, and a TEXT column b with values like a,b,c
  2. Modify the time column in TimescaleDB ALTER TABLE a ALTER COLUMN time TYPE timestamp;
  3. Create a new column of type array, and populate it with the string_to_array function built in PostgreSQL: ALTER TABLE a ADD COLUMN b_arr text[]; UPDATE a SET b_arr = string_to_array(b, ',');
komal-lunkad commented 5 years ago

Thank you for the answer @blagojts. Currently, we already have the table defined on production with the fields holding the above specified data types. So could you please help with how can we make through the code like if we can change any file or so in the code?

atanasovskib commented 5 years ago

You should take a look at the Transformer interface and the jsoncombiner implementation https://github.com/timescale/outflux/tree/develop/internal/transformation

You can implement your own transformer that will run between extraction from InfluxDB and ingestion in TimescaleDB. The Prepare method returns the desired data set definition after the data goes through the transformer. The Start method is what transforms the actual rows.

Then you just need to invoke it when the migration pipes are created https://github.com/timescale/outflux/blob/005c6b869e09966bd863968496f38b3c1c31fb08/internal/cli/pipe_service_create_transformers.go#L14 depending on some config flags

You might add support for array data types, and that's a bit more tricky (not complex, just have to modify multiple files). Start with https://github.com/timescale/outflux/blob/develop/internal/idrf/data_type.go

IDRF stands for intermediate data representation format and its what Outflux uses to keep track of the data types and which columns are present in a data set. After you add the array data type in the enum you might need to update all functions referencing it

polmonso commented 4 years ago

@blagojts We have tables in our influxdb with the default timestamp and want to migrate to timescaldb to a table with timestamptz. You said that outflux exports the time column as timestamptz, but that's not what we got when we migrated, we got timestamp.

We have other tables in timescaledb already in timestamptz so we are trying to find out how to best integrate the two to do operations mixing them.

In the code of outflux you do distinguish between timestamp and timestamptz, how can we tell the migration to use timestamptz?

AlbertoRasillo commented 4 years ago

Following up @polmonso comment, we got this error message:

existing table in target db is not compatible with required. Required column time of type IDRFTimestamp is not compatible with existing type Timestamptz
polmonso commented 4 years ago

bump?

ctron commented 3 years ago

I ran into the same issue. I was able to resolve it by letting outflux create the schema, and then alter the time column type afterwards using:

ALTER TABLE temperatures ALTER COLUMN time TYPE TIMESTAMP WITH TIME ZONE USING time AT TIME ZONE 'UTC';