malloydata / malloy

Malloy is an experimental language for describing data relationships and transformations.
http://www.malloydata.dev
MIT License
1.92k stars 75 forks source link

custom functions #490

Open anikaks opened 2 years ago

anikaks commented 2 years ago

The ability to define custom functions in Malloy would be super powerful.

Here's one example, provided by Mike on Slack:

I think this is a feature request. It would be nice to be able to define a "function" that can be applied in dimension: and measure:. Let me give you an example. In a few places we have a timezone field but what we really want is region. So we wrote a pick statement like below:

dimension: requester_region is users.time_zone:
pick 'North America' when 'Pacific Time (US & Canada)'
pick 'North America' when 'Eastern Time (US & Canada)'
pick 'North America' when 'Central Time (US & Canada)'
pick 'North America' when 'Mountain Time (US & Canada)'
pick 'North America' when 'Atlantic Time (Canada)'
pick 'North America' when 'Arizona'
pick 'North America' when 'Puerto Rico'
pick 'North America' when 'Hawaii'
pick 'North America' when 'Indiana (East)'
pick 'North America' when 'Saskatchewan'
pick 'Israel' when 'Jerusalem'
pick 'Europe' when 'Berlin'
pick 'Europe' when 'Paris'
pick 'Europe' when 'Amsterdam'
pick 'Europe' when 'Bern'
pick 'Europe' when 'Kyiv'
pick 'Europe' when 'Warsaw'
pick 'Europe' when 'Athens'
pick 'Europe' when 'Stockholm'
pick 'Europe' when 'Madrid'
pick 'Europe' when 'Bucharest'
pick 'Europe' when 'Copenhagen'
pick 'Europe' when 'Prague'
pick 'Europe' when 'Budapest'
pick 'Europe' when 'Vilnius'
pick 'Europe' when 'Vienna'
pick 'Europe' when 'Sofia'
pick 'Europe' when 'Lisbon'
pick 'Europe' when 'Rome'
pick 'Europe' when 'Helsinki'
pick 'Europe' when 'Ljubljana'
pick 'Europe' when 'Belgrade'
pick 'Europe' when 'Skopje'
pick 'Europe' when 'Brussels'
pick 'Europe' when 'Zagreb'
pick 'Europe' when 'Tallinn'
pick 'Europe' when 'Bratislava'
pick 'Europe' when 'Riga'
pick 'Europe' when 'Tbilisi'
pick 'Europe' when 'Sarajevo'
pick 'Europe' when 'Zurich'
pick 'Europe' when 'London'
pick 'Europe' when 'Edinburgh'
pick 'Europe' when 'Dublin'
pick 'Australia' when 'Melbourne'
pick 'Australia' when 'Sydney'
pick 'Australia' when 'Brisbane'
pick 'Australia' when 'Canberra'
pick 'Australia' when 'Perth'
pick 'Australia' when 'Adelaide'
pick 'Australia' when 'Melbourne'
pick 'New Zealand' when 'Auckland'
pick 'Central & South America' when 'Bogota'
pick 'Central & South America' when 'Brasilia'
pick 'Central & South America' when 'Central America'
pick 'Central & South America' when 'Guadalajara'
pick 'Central & South America' when 'Santiago'
pick 'Central & South America' when 'Caracas'
pick 'Central & South America' when 'La Paz'
pick 'Central & South America' when 'Tijuana'
pick 'Central & South America' when 'Mazatlan'
pick 'Central & South America' when 'Montevideo'
pick 'Central & South America' when 'Buenos Aires'
pick 'Central & South America' when 'Lima'
pick 'Middle East & Africa' when 'Pretoria'
pick 'Middle East & Africa' when 'Baghdad'
pick 'Middle East & Africa' when 'Istanbul'
pick 'Middle East & Africa' when 'Abu Dhabi'
pick 'Middle East & Africa' when 'Islamabad'
pick 'Middle East & Africa' when 'Nairobi'
pick 'Middle East & Africa' when 'Cairo'
pick 'Middle East & Africa' when 'Yerevan'
pick 'Middle East & Africa' when 'Riyadh'
pick 'Middle East & Africa' when 'Tashkent'
pick 'Middle East & Africa' when 'Tehran'
pick 'Middle East & Africa' when 'Almaty'
pick 'Middle East & Africa' when 'Casablanca'
pick 'Asia' when 'Sri Jayawardenepura'
pick 'Asia' when 'Bangkok'
pick 'Asia' when 'Osaka'
pick 'Asia' when 'Singapore'
pick 'Asia' when 'Kathmandu'
pick 'Asia' when 'Seoul'
pick 'Asia' when 'Kuala Lumpur'
pick 'China' when 'Beijing'
pick 'India' when 'Chennai'
pick 'Russia' when 'Moscow'
pick 'Russia' when 'Minsk'
pick 'Russia' when 'St. Petersburg'
pick 'Russia' when 'Samara'
pick 'Russia' when 'Ekaterinburg'
pick 'Russia' when 'Novosibirsk'
else users.time_zone

The only way to reuse that is to copy and paste it. If we could create a function and pass the column it could do the same compilation it currently does, but allows us to reuse the single pick statement

mtoy-googly-moogly commented 2 years ago

Agree we need functions. Agree this could be a function. If I could write the feature for THIS user, though, I would implement some way to create user defined data sets.

maybe something stupid like

csv: timeZoneMap is csv(region_name,timezone_name) <<
  'North America','Pacific Time (US & Canada)',
...
>>

source: myThing is table('thing') + {
  join_one: timeZoneMap on region = timeZoneMap.regionName
}

where we would actually generate table ...

or maybe an actual map data type ...

map: tzMap is map<string,string> = {
  'North America' : 'Pacific Time (US & Canada)',
   ...
}
source: myThing is table('thing') + {
  dimension: timeZone is tsMap[region]
}

where we might generate that case statement

or maybe something else and we might decide to generate different SQL depending on how you use it?

lloydtabb commented 2 years ago

So I think functions aren't really the right thing. I'm more of a bare join that takes parameters as input. We're going to want code that declares multiple things from the same input (think of a retail analysis package that computes gross margin, cogs, etc from an orders/items table)...

abstract_source: timezone_to_string (input_field) {
   dimension: timezone_string is @input_field:
      pick 'North America' when 'Pacific Time (US & Canada)'
      pick 'North America' when 'Eastern Time (US & Canada)' 
}

source: my_tabe is table(...)  {
  // creates a user_time name space like a join.
   extend: user_time is timezone_to_string(user.timezone)

  query: foo is {
    group_by: user_time.timezone_string
  }
}
lloydtabb commented 2 years ago

agree with Michael...

You could also do this today as:

sql: timezone_map_sql is ||
  SELECT * FROM UNNEST([
    STRUCT('North America' as region, 'Pacific Time (US & Canada)' as zone_name),
    STRUCT('North America', 'Central Time (US & Canada)')
  -- ...
  ]) 
;; 
source: foo ... {
  join_one: user_region is from_sql(timezone_map_sql) on users.timezone = user_region.zone_name

}
lloydtabb commented 2 years ago

@anikaks ^^

anikaks commented 2 years ago

I'll share this! Maybe not the ideal example for this request; I'm still very interested in functions !

mroyle commented 2 years ago

I actually looked deeper at the code and realized there was no reason to duplicate this particular example. Ultimately, the same table was joined multiple times and that's where the above code was applied. I moved it to the original table and then there was no reason to duplicate it.

However, I still think some kind of function would be useful