idris-lang / Idris2

A purely functional programming language with first class types
https://idris-lang.org/
Other
2.49k stars 373 forks source link

Syntax for Dictionary types #3355

Open andrevidela opened 1 month ago

andrevidela commented 1 month ago

Summary

Develop new syntax to write dictionaries inspired from the list syntax sugar:

Motivation

Contemporary programming involves manipulation a common set of data structures, like Lists, Strings, and numbers. One of them is dictionaries.

I will call Dictionary the general category of data structure that involve a key-value pair. In industry, dictionaries are seen as an efficient key-value storage that indexes the value in O(Log(n)) complexity. In idris, this datastructure is SortedMap.

In reality, there are many forms of key-value pair datastructures that do not fulfill this performance promise but we would still like to use key-value pair syntax to represent them those use-cases are:

Because of their ubiquity, all those dictionary types should enjoy an easily accessible and familiar syntax. This is further evidenced by other languages providing syntax sugar specifically for dictionary types. Like we find in python {a : b}, Swift [a: b] and Ruby {a => b}

The proposal

The dictionary syntax is developed as a desugaring step similar to the one used for lists. To recap the list desugaring: Lists [a, b, c] are desugared into successive application of the constructor :: (pronounced "cons") and Nil. Such that [a, b, c] desugars to a :: (b :: (c :: Nil)).

The proposal is to allow the parser to accept syntax of the form [ a := b , c := d ]. This is meant to look like a list where terms on the left of := as assigned values to the right of it. Another way to understand this syntax is to thing of it as record update but using square brackets as enclosing symbols, rather than curly brackets.

Scope

The syntax sugar does not help with indexing dictionaries, dictionary update, or performance. This proposal is only about syntax sugar to write terms that match key-value pair semantics. They should allow pattern matching but not develop new pattern matching semantics to avoid introducing complexity in case trees or in learning curve.

Examples

Empty dictionary: [:=] Desugars to: DNil

Singleton dictionary: [ key := val ] Desugars to: DCons key val DNil

Multiple key-values dictionary: [ k1 := v1 , k2 := v2 ] Desugars to: DCons k1 v1 (DCons k2 v2 DNil)

Pattern matching:

 case dict of
     -- match on empty dictionary
     [:=] => ...
     -- match on singleton dictionary 
     [ key := val ] => ...
     -- match on dictionary with two keys
     [ key1 := val1, key2 := val2 ] => ...
     -- match on at least one key
     DCons k v ds => ...

How to use it

Currently the constructor for JSON is Object : List (String, JSON) -> JSON and so writing JSON values looks like this:

value : JSON
value = Object [ ("key1", "value1") 
               , ("key2", Array ["value2", 3, "value4"])]
               ]

There are two problem, we always need to nest Object and Array and the second is that we use the full pair syntax for key-value pairs in objects. We can fix the Arrayand Object nesting by providing a suitable :: function, but we cannot remove the parenthesis nesting without introducing a bespoke operator for pairing.

Instead, the above can be written in direct style by designing a DSL for objects-as-dictionaries:

value : JSON
value = [ "key1" := "value1"
        , "Key2" := ["value3", 3, "value4"]
        ]

The JSON DSL is implemented in the tests

Complex record libraries using a schema as index can be update to support the new dictionary syntax as demonstrated here: https://github.com/andrevidela/idris-records/blob/dictionary-syntax/src/Records.idr#L78-L79

Technical implementation

This feature is implemented as a desugaring step where the parser is extended to parse lists of pairs of expressions separated by := in the same way as record updates but using square brackets instead.

A prototype implementation is here where you can see that the diff is fairly small and the changes to the syntax are also small since all the components already exist. It can be found in the draft #3356 .

Alternatives considered

Use [] for empty dictionary

This does work but it creates unwelcome ambiguities when mixing lists and dictionaries in the same data.

For example given a value of type JSON we cannot know if we are building an empty array or an empty object if we write

empty : JSON
empty = []

and Nil stands both for empty dictionary and empty array.

To remove this ambiguity, I propose to use [:=] as a reserved token to refer to empty dictionaries.

We could use [:] or [=] which better mimic [>] and [<] but they do not reflect reflect the syntax used for dictionaries.

Same notation, old constructors

Another option would be to desugar [k := v] into MkPair k v :: [] instead of DCons k v [].

This would work for the most common case but falls apart whenever there is a type dependency between the key and the value, or the key-value pair and the rest of the data. For example, we could not implement the indexed data :

-- A map guaranteed to contain the keys in the index.
data InMap : List String ->Type where
  Nil : InMap []
  Cons : (idx : String ** Elem idx ls) -> InMap xs -> InMap ls

This is because the constructor for the key/value pair is dependent, and MkPair is not.

Use a pairing operator

This problem already has a solution: Use list syntax with a custom operator to pair-up key-values. If the pairing operator is called ::= the json example can be written:

user : JSON
user = Object [
  "name" ::= "Annie",
  "age" ::= 8,
  "power" ::= "fire"
  ]

Scala uses -> as a pairing operator:

val d = Map("name" -> "Annie", 
            "age" -> 8, 
            "power" -> "fire")

Haskell's aeson uses .= for pairing:

user : JSON
user = [ "name" .= "Annie"
       , "age" .= 8
       , "power" .= "fire"]

Other Idris libraries currently use ::=, :>, =:, :=: :->:and more.

The shortcoming of this approach is obvious by now: there is absolutely no uniformity when building key-value pairs in Idris. However, in the absence of a standard pairing operator, using := would render all uses of dictionaries uniform, decreasing the cognitive overhead of learning a new operator for each bespoke dictionary type.

Out of this, stems two counter points:

The current standard pairing operator is the Pair type, constructed in its type, and term, with the syntax (_, _). One can use tuple sections to use it as a function (,). We can also use the name Pair to build the type or MkPair to build terms of the aforementioned type.

The problem becomes apparent when attempting to use the standard constructor for pair to build dictionaries: It is verbose, repetitive, and noisy:

user : JSON
user = [ ("name", "Annie")
       , ("age", 8)
       , ("power", "fire")
       ]

While only makes use of prelude constructors, it is not a good user experience to type that much boilerplate. This argument is supported by the fact that library designers have been deploying their own pairing operators to make constructing dictionaries easier.

The second argument is strongest when applied with domain-specific knowledge. There are uses of dictionaries where the use of := would be misleading and generally worse than bespoke operators. For example a typing context is itself a form of dictionary, holding variables as keys and types as values, but it is very much expected that we write them with a bespoke operator to represent the fact that a variable is assigned a type, for example [ x :- t ].

The presence of this feature does not forbid the use of custom operators for such purposes, and in fact, is completely backward compatible.

Do nothing

One last thing to consider is to do nothing. Other successful languages such as Java Haskell or Rust do not have a bespoke dictionary syntax. This is clear evidence that such feature is not necessary for wider adoption and its maintenance cost might exceed the overall usage benefits we get from it. However, I personally think that the above argument about uniformity are enough to justify the additional feature. As the prototype shows, the diff for it is quite small, rendering the maintenance cost minimal, see #3356.

Conclusion

Dictionary syntax solidifies Idris as a modern programming language, enjoying the same ergonomics as other programming environments that focus on user-friendliness and accessibility. What's more, enabling a uniform syntax that accommodates for dictionaries, JSON, XML, dependent types all at once represents an unprecedented opportunity to lead the way in programming language experience.

mattpolzin commented 1 month ago

I think the only bit of this that could provide opposing intuitions is pattern matching. One intuition is that the data structure is built inductively like everything else and therefore matching on one element is (arbitrarily) order-dependent. The other intuition is that dictionaries are not (generally) order dependent so pattern matching on one element means “this element is in the dictionary.” The latter is incorrect, of course.

This probably isn’t a problem that outweighs the benefits of pattern matching with the new syntax I don’t think, but it certainly is a consideration not shared by Lists.

joelberkeley commented 1 month ago

Thanks for starting this in a well-thought out way. My thoughts:

  1. It might be confusing to have two uses of := which mean different things. One being assignment, the other being pairs. Particularly from the perspective of readability.
  2. A dictionary will typically be unique in its "keys". I can't think how that would be possible with a recursive data type and pattern matching on data constructors. For example, what would this be ["1" := 1, "1" := 2]? Would you get access to both "1"s for DCons k v (DCons k' v' _)? Or would one just use DCons as a function rather than a constructor here, and not allow pattern matching? Like

    export  -- not public
    data Map k v = ?whatever
    
    export
    DCons : k -> v -> Map k v -> Map k v
    DCons = -- overwrite or add k in Map

    I see no problem with this, as long as it's intentional.

  3. Does this allow for O(1) access/update hashmaps and in-place mutable maps, like stefan's linear arrays?
andrevidela commented 1 month ago

@joelberkeley Thanks for your comments. For your concerns about 2 and 3, please refer to the "Scope" section:

The syntax sugar does not help with indexing dictionaries, dictionary update, or performance. This proposal is only about syntax sugar to write terms that match key-value pair semantics.

Uniqueness of keys and dictionary indexing remains unchanged and is left as an implementation choice of the library writer.

gallais commented 1 month ago

@joelberkeley You can use induction-recursion to define a dictionary together with a notion of key freshness and insist that all new keys must be fresh.

import Data.So

record Distinct (x, y : String) where
  constructor MkDistinct
  runDistinct : So (x /= y)

data FreshDict : Type -> Type
isFresh : String -> FreshDict a -> Type

data FreshDict where
  DNil : FreshDict a
  DCons : (key : String) -> (value : a) ->
          (dict : FreshDict a) ->
          {auto 0 fresh : isFresh key dict} ->
          FreshDict a

isFresh x DNil = ()
isFresh x (DCons y _ dict) = (Distinct x y, isFresh x dict)

failing #"Can't find an implementation for Distinct "1" "1"."#

  test : FreshDict Nat
  test = DCons "1" 1
       $ DCons "1" 2
       $ DNil
joelberkeley commented 1 month ago

Uniqueness of keys and dictionary indexing remains unchanged and is left as an implementation choice of the library writer.

Indeed, I mainly want to check if this functionality will lend itself well to these situations, esp when they strike me as a particularly common use-case.

ohad commented 1 month ago

I like the alternative considered (using a pairing operator), I'm not so sure about the argument to reject it.

Would defining this syntax (::=, say, for concreteness) in a standard library implementation for dictionaries achieve the same goal without complicating the parser/desugaring/elaborator further?

Including it in a stdlib would signal it is the preferred/conventional syntax. OTOH, implementing a specialised desugaring/elaboration will not help us avoid the proliferation of user-defined syntax for dictionaries.

Maybe there are other arguments against the pairing operator approach that would strengthen this proposal?

andrevidela commented 1 month ago

I think the pairing operator is a very strong alternative, and it has two great benefits over this proposal:

Beside the subjective aesthetic difference, dictionary syntax has two benefits over operators + lists:

On this last point, providing a preferred operator in the Prelude will not prevent users from using it for non-dictionary purposes or redefining it entirely, which I could qualify as "misuse".