malloydata / malloy

Malloy is an experimental language for describing data relationships and transformations.
http://www.malloydata.dev
MIT License
1.99k stars 76 forks source link

Extracted dateparts formatting badly when cast into strings #819

Open anikaks opened 2 years ago

anikaks commented 2 years ago

I am trying to concatenate a couple of extracted dateparts together and it's acting oddly. Testing against flights.

Without casting to string, the concatenated parts are not recognized as a string and NaN-Nan etc is produced:

  query: by_fortnight is {
    group_by: dep_year is dep_time.year
    group_by: fortnight is floor(week(dep_time) / 2)
    group_by: year_and_fortnight is concat(dep_time.year, '-',floor(week(dep_time) / 2))
  }
Screen Shot 2022-09-19 at 11 04 01 AM

When I add a cast to a string, the dates are formatted differently from when called on their own and the year now has 01-01 attached:

  query: by_fortnight is {
    group_by: dep_year is dep_time.year
    group_by: fortnight is floor(week(dep_time) / 2)
    group_by: year_and_fortnight is concat(dep_time.year, '-',floor(week(dep_time) / 2))::string
  }
Screen Shot 2022-09-19 at 11 10 36 AM

using year(dep_time) fixes the year (but both should work, I believe), but I can't get rid of the added .0 on the "fortnight" in the concatenation:

  query: by_fortnight is {
    group_by: dep_year is year(dep_time)
    group_by: fortnight is floor(week(dep_time) / 2)
    group_by: year_and_fortnight is concat(year(dep_time), '-',round(floor(week(dep_time) / 2)),0)::string
  }
Screen Shot 2022-09-19 at 11 12 37 AM

year(dep_time) also probably shouldn't have a comma in it when used alone but that's another issue

whscullin commented 2 years ago

There's a lot going on here. Starting with

concat(dep_time.year, '-',floor(week(dep_time) / 2))

Somewhere our logic is assuming that the result of a concat(TIMESTAMP,STRING,DOUBLE) is a TIMESTAMP, even though it's described by the database as a VARCHAR. Given the result coming back is not formatted like a timestamp, that doesn't work.

With

concat(dep_time.year, '-',floor(week(dep_time) / 2))::string

and

concat(year(dep_time), '-',round(floor(week(dep_time) / 2)),0)::string

the results of round() and floor() are DOUBLE when division is applied, so SQL itself is adding the .0 when applying concat(), to indicate that. The reason we don't see it for fortnight is because of the way Javascript handles numbers, it doesn't consider making that distinction. I couldn't figure out an elegant way to cast to an integer inside of the concat(), but maybe I'm missing something.

year(dep_time) also probably shouldn't have a comma in it when used alone but that's another issue

This is because dep_time.year is a TIMESTAMP, year(dep_time) is a BIGINT (that's a DATE_TRUNC() vs EXTRACT() behavior), so that the latter gets number formatting instead of date formatting.

Only parts of this are renderer related, FWIW - it's a combination of database behaviors and how we're processing results within Malloy. I can probably wrangle the first issue because it seems to be us, not the database, but the other ones would possibly require SQL generation changes.

whscullin commented 2 years ago

Apparently we're just using the type of the first argument to determine function types (concat(1, ...) is considered a number), so a quick work-around would be to start with concat('', ...) That doesn't help with the decimal point, though.

whscullin commented 2 years ago

I don't know the answer on how to force a return type for a function like concat(), maybe @mtoy-googly-moogly or @lloydtabb would have a better idea.

mtoy-googly-moogly commented 2 years ago

Functions are a giant ugly handwave right now and are going to be a disaster once we fix them.

The current ugly handwave is that unknown functions are assumed to return the type of their first argument, and that a dialect can have a function table which can provide the return type of functions where this heuristic fails. Adding concat() to this list is a fine interim solution.

packages/malloy/src/dialect/standardsql.ts:

export class StandardSQLDialect extends Dialect {
  name = "standardsql";
  defaultNumberType = "FLOAT64";

  ...

  functionInfo: Record<string, FunctionInfo> = {
    timestamp_seconds: { returnType: "timestamp" },
  };
whscullin commented 2 years ago

The first issue is fixed, I'm not sure how to address the latter issue - it's a limitation of our vocabulary. We only support ::number, which the database interprets as a decimal value, if we supported ::integer then the concat would work as expected, but without language support for integers, particularly with JSON, it's a bit suspect.