trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
9.93k stars 2.87k forks source link

parse_datetime behavior depends on client locale #21786

Open jpambrun opened 2 months ago

jpambrun commented 2 months ago

We are an international team and we were almost going insane not understanding why some query fails intermitently. It turns out parse_datetime behave differently base on the connected client locale. In our case, this function was invoked from a view which made it hard to pin down. This is also invoked on data from a database table, not from the client, making it even more confusing. I am running 426 on AWS EMR.

const headers = { 
    "X-Trino-User": "vida", 
    "X-Trino-Language": "en-us", // "december" works, "dec" does not
    // "X-Trino-Language": "", // "dec" works, "december" does not
    //"X-Trino-Language": "pt-br", // "dec" works, "december" does not
};

const postResponse = await fetch(`http://trino-url/v1/statement`, {
    method: 'POST',
    headers,
    body: `SELECT parse_datetime('December 25 2020 12:34:56 GMT+0000','MMM dd yyyy HH:mm:ss ''GMT''Z')`
    // body: `SELECT parse_datetime('Dec 25 2020 12:34:56 GMT+0000','MMM dd yyyy HH:mm:ss ''GMT''Z')`
});

if (!postResponse.ok) throw new Error(`HTTP error! status: ${postResponse.status}`);

const postBody = await postResponse.json()
let nextUri = postBody.nextUri;

for (let i = 0; i < 100; i++) {
    const getResponse = await fetch(nextUri, { method: 'GET', headers });
    if (!getResponse.ok) throw new Error(`HTTP error! status: ${getResponse.status}`);
    const getBody = await getResponse.json()
    console.log(getBody.stats.state)

    if (getBody.stats.state === "FINISHED") {
        break;
    } else if (getBody.stats.state === "FAILED") {
        console.log(getBody.error.message)
        break
    }else if (getBody.stats.state === "RUNNING") {
        console.log(getBody.data)
    }
    nextUri = getBody?.nextUri
    await new Promise(resolve => setTimeout(resolve, i * 50))
}
// deno run --allow-net test.mjs
hashhar commented 2 months ago

cc: @martint

martint commented 2 months ago

My thought is that we should hard-code the locale in the implementation of those functions and provide alternative versions that take a locale as an argument.

electrum commented 2 months ago

Views should store the locale. The behavior of a view should not change based on the current session, unless it explicitly uses functions such as current_user.

We shouldn’t change the existing behavior of the functions, as that may break users. Locale is a supported part of the session for functions to use.

martint commented 2 months ago

The problems I see in the current implementation are:

  1. The locale is not something the user can easily change. There's no mechanism via session properties or language to customize it for the session or for a given query.
  2. There's no way to select locale on a per-callsite basis. E.g., to select the locale to use dynamically on a row by row case.
  3. The locale is generally a characteristic of the data, not a user preference (such as which timezone to use render values displayed to the user) so functions that treat locale transparently make it hard (or impossible) to write portable queries.

(2) can be solved by adding a variant of the functions that takes a locale as an argument. For (1) and (3) I think we should change the behavior of the functions, maybe with a deprecation period and a flag to restore the legacy behavior in the meantime. I expect the vast majority of use cases would benefit more from portability and predictability of queries than varying the behavior based on how the client is configured. That seems like a very niche application.