syndesisio / syndesis

This project is archived. A flexible, customizable, open source platform that provides core integration capabilities as a service.
https://syndesis.io/
Apache License 2.0

SQL Connector - Incorrect column type inference in Data Mapper #6576

Closed evanshortiss closed 4 years ago

evanshortiss commented 5 years ago

This is a...


[x] Bug report

Description

I created a Kafka connection that accepts JSON per schema:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "meterId": {
            "type": "number"
        },
        "timestamp": {
            "type": "number"
        },
        "status": {
            "type": "string"
        }
    }
}
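A payload matching this schema might look like the following (field values are illustrative, not taken from the original report):

```python
import json

# Hypothetical event matching the schema above: numbers for meterId and
# timestamp (seconds since the Unix epoch), a string for status.
event = {"meterId": 42, "timestamp": 1568284310, "status": "OK"}

print(json.dumps(event))
```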

I then write it to Postgres with the following table structure:

CREATE TABLE meter_status_evals01 (
    id serial NOT NULL PRIMARY KEY,
    meter_id serial NOT NULL REFERENCES meter_info(id),
    timestamp TIMESTAMP NOT NULL,
    status_text text NOT NULL
);

And the insert SQL:

INSERT INTO meter_status_evals01 (meter_id, timestamp, status_text) VALUES (:#meter_id, to_timestamp(:#timestamp), :#status_text);

This of course requires a Data Mapper. The bug occurs here. For some reason the Data Mapper thinks status_text is of type INTEGER and timestamp is of type STRING.

I can fix the issue by changing the insert SQL to:

INSERT INTO meter_status_evals01 (meter_id, status_text, timestamp) VALUES (:#meter_id, :#status_text, to_timestamp(:#timestamp));

The type inference seems linked to alphabetical ordering of the columns, but the exact link isn't clear to me.
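The suspected mix-up can be sketched in a couple of lines (a guess at the behaviour, not actual Syndesis code): if the parameter names are sorted alphabetically while column types are still paired positionally, status_text and timestamp trade places.

```python
# Parameters in the order they appear in the INSERT statement.
insert_order = ["meter_id", "timestamp", "status_text"]

# Alphabetical sorting moves status_text ahead of timestamp, so any
# positional pairing against the table's column types would shift.
print(sorted(insert_order))
# ['meter_id', 'status_text', 'timestamp']
```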

(Two screenshots of the Data Mapper, taken 2019-09-12, were attached showing the incorrect type inference.)
KurtStam commented 5 years ago

@evanshortiss thanks. Can you add the stack trace too? It's probably in the meta pod.

evanshortiss commented 5 years ago

Ah, I don't think I have the full trace. I can try to get one in a while.

{"exchange":"i-LoWiXQz3NpEpDN9X_Sqz","status":"begin"}
{"exchange":"i-LoWiXQz3NpEpDN9X_Sqz","step":"-LoWd9q_1rXBXdOiBO9v","id":"i-LoWiXQz3NpEpDN9X_Srz","duration":42897}
2019-09-11 19:08:30.847  INFO 1 --- [onsumer[meters]] o.a.c.component.atlasmap.AtlasEndpoint   : Conversion from 'NUMBER' to 'INTEGER' is supported: docId='mapping.712942', path='null'
2019-09-11 19:08:30.847  WARN 1 --- [onsumer[meters]] o.a.c.component.atlasmap.AtlasEndpoint   : Conversion from 'STRING' to 'INTEGER' can cause numeric format exceptions: docId='mapping.329044', path='null'
2019-09-11 19:08:30.847  WARN 1 --- [onsumer[meters]] o.a.c.component.atlasmap.AtlasEndpoint   : Conversion from 'STRING' to 'INTEGER' can cause out of range exceptions: docId='mapping.329044', path='null'
2019-09-11 19:08:30.847  WARN 1 --- [onsumer[meters]] o.a.c.component.atlasmap.AtlasEndpoint   : Conversion from 'STRING' to 'INTEGER' can cause fractional part to be lost: docId='mapping.329044', path='null'
2019-09-11 19:08:30.847  INFO 1 --- [onsumer[meters]] o.a.c.component.atlasmap.AtlasEndpoint   : Conversion from 'NUMBER' to 'STRING' is supported: docId='mapping.478543', path='null'
2019-09-11 19:08:30.848 ERROR 1 --- [onsumer[meters]] o.a.camel.processor.DefaultErrorHandler  : Failed delivery for (MessageId: i-LoWiXQz3NpEpDN9X_Ssz on ExchangeId: i-LoWiXQz3NpEpDN9X_Sqz). Exhausted after delivery attempt: 1 caught: io.atlasmap.api.AtlasException: Errors: [Unable to auto-convert for sT=STRING tT=INTEGER tF=/status_text msg=Invoking type convertor failed: docId='-LoWdBSq1rXBXdOiBO9v', path='/status_text'], 
KurtStam commented 5 years ago

Hmm @evanshortiss, so the database type of the timestamp column is TIMESTAMP, and the to_timestamp function takes a double precision value.

So I think you are expected to map to an Integer in the Data Mapper (a String will work too, as long as it contains an integer number). This is going to be hard to support, as we're venturing into database-specific implementations here by using functions. This will get even more interesting if the two-argument version of to_timestamp is used.

I'm afraid that in cases like these we would need to provide an option to set a custom dataShape. I will see if we can 'gracefully degrade' and default the type to String.

KurtStam commented 5 years ago

This fails pretty quickly because "(" is one of the splitter characters used to break down the SQL statement, so:

VALUES (:#meter_id, to_timestamp(:#timestamp), :#status_text)

is broken into

VALUES, :#meter_id, to_timestamp, :#timestamp, :#status_text

so yeah.. if this works it's completely accidental.
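The splitting described above can be reproduced roughly (a sketch of the behaviour, not the actual Syndesis parser):

```python
import re

sql = "VALUES (:#meter_id, to_timestamp(:#timestamp), :#status_text)"

# Split on '(' and ',' as described above, then trim whitespace and
# trailing ')' characters from each fragment, dropping empty pieces.
tokens = [t.strip(" )") for t in re.split(r"[(,]", sql) if t.strip(" )")]

print(tokens)
# ['VALUES', ':#meter_id', 'to_timestamp', ':#timestamp', ':#status_text']
```

Note how to_timestamp survives as a token of its own, which is what throws off the column/parameter pairing.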

The only way this works correctly is if you remove the function call, like:

VALUES (:#meter_id, :#timestamp, :#status_text)

You will then have to make sure to insert a supported string format for the time.

@igarashitm Does AtlasMap support an INT -> TIMESTAMP transformation at the moment?

Info on the function: to_timestamp(double precision) returns timestamp with time zone. It converts a Unix epoch (seconds since 1970-01-01 00:00:00+00) to a timestamp, e.g. to_timestamp(1284352323) → 2010-09-13 04:32:03+00.
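The documented example can be cross-checked against Python's own epoch handling (just a sanity check of the same conversion, not Syndesis code):

```python
from datetime import datetime, timezone

# to_timestamp(1284352323) in Postgres converts epoch seconds to a
# timestamp with time zone; datetime performs the same conversion here.
dt = datetime.fromtimestamp(1284352323, tz=timezone.utc)

print(dt.strftime("%Y-%m-%d %H:%M:%S%z"))
# 2010-09-13 04:32:03+0000
```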

and if not, would we want to support that?

KurtStam commented 5 years ago

@evanshortiss a workaround would be to create a Stored Procedure.

igarashitm commented 5 years ago

AtlasMap doesn't have a TIMESTAMP type for JSON schema. It should be translated to string or number.

igarashitm commented 5 years ago

If you just want to format epoch milliseconds into a human-readable STRING, we can add a formatDateTimeFromEpochMillis() transformation which applies to NUMBER fields. Currently we don't have it OOTB. https://github.com/atlasmap/atlasmap/issues/1217
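Such a transformation could look roughly like this (a hypothetical Python equivalent of the proposed formatDateTimeFromEpochMillis, not AtlasMap code):

```python
from datetime import datetime, timezone

def format_date_time_from_epoch_millis(millis: int) -> str:
    """Turn epoch milliseconds into a human-readable UTC timestamp string."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )

print(format_date_time_from_epoch_millis(1284352323000))
# 2010-09-13 04:32:03
```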

igarashitm commented 5 years ago

Also, in the somewhat longer term, we will need to consider supporting date/time formats in JSON schema: https://github.com/atlasmap/atlasmap/issues/1216, https://json-schema.org/draft/2019-09/json-schema-validation.html#rfc.section.7.3.1. That requires the field type to be string instead of number, though.

KurtStam commented 5 years ago

@evanshortiss wdyt? Personally I like atlasmap/atlasmap#1217

evanshortiss commented 5 years ago

@KurtStam thanks for looking into this. Let me summarise my understanding as someone not familiar with this ecosystem:

  1. The to_timestamp function is part of the PostgreSQL syntax
  2. Syndesis parses the SQL text/input into some internal format to prepare a query
  3. The parser uses ( as a splitter
  4. Using arbitrary PostgreSQL (or MySQL, etc.) functions is not supported by the parser

The solution being suggested is a built-in function, formatDateTimeFromEpochMillis, that can be parsed and executed. This would allow Syndesis users to use a TIMESTAMP column, with the caveat that they must send data to the Connector as an integer holding milliseconds since the epoch?

igarashitm commented 5 years ago

Correct; in this case you have to send that milliseconds value to the Data Mapper step. That is atlasmap/atlasmap#1217. atlasmap/atlasmap#1216 is to treat the timestamp column directly in the SQL connector behind the scenes and deliver it as, say, a DATE_TIME_TZ field instead. Allowing SQL functions in the SQL connector would be yet another option.

evanshortiss commented 5 years ago

@igarashitm thanks for the clarification. This seems like a reasonable workaround if specific SQL functions can't be supported 👍