radbrt / target-oracle

Singer.io compatible target for Oracle databases
0 stars 14 forks source link

Decimal to Float conversion in jsonschema causes validation error #14

Open walkabout21 opened 1 month ago

walkabout21 commented 1 month ago

loading Decimal type from mssql server to oracle using tap-mssql and target-oracle throws jsonschema validation error

2024-10-08T20:10:20.033688Z [info     ]     raise error                cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-oracle name=target-oracle producer=False run_id=097d3cec-5958-49a2-aa24-50a73adc0779 stdio=stderr string_id=target-oracle
2024-10-08T20:10:20.033753Z [info     ] jsonschema.exceptions.ValidationError: 4.6 is not a multiple of 0.1 cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-oracle name=target-oracle producer=False run_id=097d3cec-5958-49a2-aa24-50a73adc0779 stdio=stderr string_id=target-oracle
2024-10-08T20:10:20.033817Z [info     ]                                cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-oracle name=target-oracle producer=False run_id=097d3cec-5958-49a2-aa24-50a73adc0779 stdio=stderr string_id=target-oracle
2024-10-08T20:10:20.033881Z [info     ] Failed validating 'multipleOf' in schema['properties']['course_mean']: cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-oracle name=target-oracle producer=False run_id=097d3cec-5958-49a2-aa24-50a73ad

Most likely the change needs to be made here:

        if self._jsonschema_type_check(jsonschema_type, ("number",)):
            if self.config.get("prefer_float_over_numeric", False):
                return cast(sqlalchemy.types.TypeEngine, sqlalchemy.types.FLOAT())
            return cast(sqlalchemy.types.TypeEngine, sqlalchemy.types.NUMERIC(38, 10))

Similar to open issue here Float-decimal validation issue

walkabout21 commented 1 month ago

Forgot to mention that open issue is for target-csv

radbrt commented 1 month ago

This one was interesting @walkabout21. I tried to get some decimal types from tap-mssql, but I don't think my decimals are the same as your decimals. The test-file (test singer records) I intercepted is here: https://github.com/radbrt/target-oracle/blob/decimaltypes/target_oracle/tests/data_files/decimal_types.singer (fyi, these are inserted as varchar)

As you can see, there is no MultipleOf to be seen anywhere in there. Are you using the default wintersrd variant of tap-mssql? I can try to generate the records again with a different variant, or if you are able to find some of the singer records at issue that would be even better.

walkabout21 commented 1 month ago

I'm using the wintersrd variant.
The source column is defined this way by jsonschema

"course_mean":{"inclusion":"available","multipleOf":0.1,"type":["null","number"]}

The value is defined as

"course_mean":4.6

The column in sql server is defined as

course_mean decimal(2,1) NULL,

I'm not sure if this is an issue with the target handling the input or the tap not defining the sqlalchemy datatype properly.

walkabout21 commented 1 month ago

I did try testing this with the TARGET_ORACLE_PREFER_FLOAT_OVER_NUMERIC set to True, but it was the same result.

radbrt commented 1 month ago

Thanks for the details @walkabout21, I was able to reproduce the error by replacing the schema definition for yours: {"inclusion":"available","multipleOf":0.1,"type":["null","number"]} in the schema record. Realistically I might have a fix over the weekend.

walkabout21 commented 1 month ago

I did find a work around by setting the mssql-tap.

use_singer_decimal value: true

That setting exports number columns as strings.

Without setting this, even after I set the stream_maps config to null for the decimal columns, I was still getting type errors saying the numbers had no n length attribute.

radbrt commented 1 month ago

@walkabout21 great to hear, and that explains why I couldn't reproduce it, for some reason I had set that one to true when I was testing.

The Meltano Slack is full of discussions of the multipleOf issue, it seems this is not the only target that has issues with it.

walkabout21 commented 1 month ago

@radbrt should this issue stay open? There is a work around, but it's not ideal. Since this issue shows up in a number of targets is this really a meltano/singer issue?

radbrt commented 1 month ago

@walkabout21 I do think it it should ideally be handled elsewhere, but I do want to look into it more and hopefully find a decent workaround myself, so ot can just stay open.