trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.49k stars 3.02k forks source link

Handle edge cases when converting double from JSON to Trino types #24030

Open Pluies opened 2 weeks ago

Pluies commented 2 weeks ago

Description

Currently, we are unable to optimise tables when they contain infinity/NaN values in fields of type "DOUBLE" because the conversion fails when we try to cast a string containing "Infinity", "-Infinity" or "NaN" into a double.

Instead, we try to check if the JSON value contains a string, and convert it appropriately.

Additional context and related issues

Fixes #24029

Release notes

( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. (x) Release notes are required, with the following suggested text:

## Delta Lake
* Fix silent failure to write checkpoint files for tables containing double values Infinity or NaN. ({issue}`24029`)
cla-bot[bot] commented 2 weeks ago

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Pluies commented 2 weeks ago

Tagging @ebyhr as the original author and @findepi as the original commentor of this prescient review comment 😄

Pluies commented 2 weeks ago

Note that this is mostly a first draft for inputs & review, we'll need to add tests, but I wanted to first check that you agree with the approach and check if you had a better way to do it.

ebyhr commented 2 weeks ago

@Pluies This approach looks good to me. Could you please start adding tests?