Open andrevidela opened 4 months ago
I think the only bit of this that could provide opposing intuitions is pattern matching. One intuition is that the data structure is built inductively like everything else and therefore matching on one element is (arbitrarily) order-dependent. The other intuition is that dictionaries are not (generally) order dependent so pattern matching on one element means “this element is in the dictionary.” The latter is incorrect, of course.
This probably isn’t a problem that outweighs the benefits of pattern matching with the new syntax I don’t think, but it certainly is a consideration not shared by Lists.
Thanks for starting this in a well-thought out way. My thoughts:
A dictionary will typically be unique in its "keys". I can't think how that would be possible with a recursive data type and pattern matching on data constructors. For example, what would this be ["1" := 1, "1" := 2]
? Would you get access to both "1"
s for DCons k v (DCons k' v' _)
? Or would one just use DCons
as a function rather than a constructor here, and not allow pattern matching? Like
export -- not public
data Map k v = ?whatever
export
DCons : k -> v -> Map k v -> Map k v
DCons = -- overwrite or add k in Map
I see no problem with this, as long as it's intentional.
@joelberkeley Thanks for your comments. For your concerns about 2 and 3, please refer to the "Scope" section:
The syntax sugar does not help with indexing dictionaries, dictionary update, or performance. This proposal is only about syntax sugar to write terms that match key-value pair semantics.
Uniqueness of keys and dictionary indexing remains unchanged and is left as an implementation choice of the library writer.
@joelberkeley You can use induction-recursion to define a dictionary together with a notion of key freshness and insist that all new keys must be fresh.
import Data.So
record Distinct (x, y : String) where
constructor MkDistinct
runDistinct : So (x /= y)
data FreshDict : Type -> Type
isFresh : String -> FreshDict a -> Type
data FreshDict where
DNil : FreshDict a
DCons : (key : String) -> (value : a) ->
(dict : FreshDict a) ->
{auto 0 fresh : isFresh key dict} ->
FreshDict a
isFresh x DNil = ()
isFresh x (DCons y _ dict) = (Distinct x y, isFresh x dict)
failing #"Can't find an implementation for Distinct "1" "1"."#
test : FreshDict Nat
test = DCons "1" 1
$ DCons "1" 2
$ DNil
Uniqueness of keys and dictionary indexing remains unchanged and is left as an implementation choice of the library writer.
Indeed, I mainly want to check if this functionality will lend itself well to these situations, esp when they strike me as a particularly common use-case.
I like the alternative considered (using a pairing operator), I'm not so sure about the argument to reject it.
Would defining this syntax (::=
, say, for concreteness) in a standard library implementation for dictionaries achieve the same goal without complicating the parser/desugaring/elaborator further?
Including it in a stdlib would signal it is the preferred/conventional syntax. OTOH, implementing a specialised desugaring/elaboration will not help us avoid the proliferation of user-defined syntax for dictionaries.
Maybe there are other arguments against the pairing operator approach that would strengthen this proposal?
I think the pairing operator is a very strong alternative, and it has two great benefits over this proposal:
Beside the subjective aesthetic difference, dictionary syntax has two benefits over operators + lists:
[]
as an empty list and []
as an empty dictionary introduces ambiguity when mixed as constructors for the same type like JSON:=
is a reserved symbol, it cannot be misused. It also gives us the opportunity to design bespoke error messages related to dictionaries usage.On this last point, providing a preferred operator in the Prelude will not prevent users from using it for non-dictionary purposes or redefining it entirely, which I could qualify as "misuse".
Summary
Develop new syntax to write dictionaries inspired from the list syntax sugar:
[:=]
[k := v]
[ k1 := v1, k2 := v2, ...]
DCons k v ds
Motivation
Contemporary programming involves manipulation a common set of data structures, like Lists, Strings, and numbers. One of them is dictionaries.
I will call Dictionary the general category of data structure that involve a key-value pair. In industry, dictionaries are seen as an efficient key-value storage that indexes the value in
O(Log(n))
complexity. In idris, this datastructure isSortedMap
.In reality, there are many forms of key-value pair datastructures that do not fulfill this performance promise but we would still like to use key-value pair syntax to represent them those use-cases are:
List (a, b)
, lists of pairsBecause of their ubiquity, all those dictionary types should enjoy an easily accessible and familiar syntax. This is further evidenced by other languages providing syntax sugar specifically for dictionary types. Like we find in python
{a : b}
, Swift[a: b]
and Ruby{a => b}
The proposal
The dictionary syntax is developed as a desugaring step similar to the one used for lists. To recap the list desugaring: Lists
[a, b, c]
are desugared into successive application of the constructor::
(pronounced "cons") andNil
. Such that[a, b, c]
desugars toa :: (b :: (c :: Nil))
.The proposal is to allow the parser to accept syntax of the form
[ a := b , c := d ]
. This is meant to look like a list where terms on the left of:=
as assigned values to the right of it. Another way to understand this syntax is to thing of it as record update but using square brackets as enclosing symbols, rather than curly brackets.Scope
The syntax sugar does not help with indexing dictionaries, dictionary update, or performance. This proposal is only about syntax sugar to write terms that match key-value pair semantics. They should allow pattern matching but not develop new pattern matching semantics to avoid introducing complexity in case trees or in learning curve.
Examples
Empty dictionary:
[:=]
Desugars to:DNil
Singleton dictionary:
[ key := val ]
Desugars to:DCons key val DNil
Multiple key-values dictionary:
[ k1 := v1 , k2 := v2 ]
Desugars to:DCons k1 v1 (DCons k2 v2 DNil)
Pattern matching:
How to use it
Currently the constructor for JSON is
Object : List (String, JSON) -> JSON
and so writing JSON values looks like this:There are two problem, we always need to nest
Object
andArray
and the second is that we use the full pair syntax for key-value pairs in objects. We can fix theArray
andObject
nesting by providing a suitable::
function, but we cannot remove the parenthesis nesting without introducing a bespoke operator for pairing.Instead, the above can be written in direct style by designing a DSL for objects-as-dictionaries:
The JSON DSL is implemented in the tests
Complex record libraries using a schema as index can be update to support the new dictionary syntax as demonstrated here: https://github.com/andrevidela/idris-records/blob/dictionary-syntax/src/Records.idr#L78-L79
Technical implementation
This feature is implemented as a desugaring step where the parser is extended to parse lists of pairs of expressions separated by
:=
in the same way as record updates but using square brackets instead.A prototype implementation is here where you can see that the diff is fairly small and the changes to the syntax are also small since all the components already exist. It can be found in the draft #3356 .
Alternatives considered
Use
[]
for empty dictionaryThis does work but it creates unwelcome ambiguities when mixing lists and dictionaries in the same data.
For example given a value of type
JSON
we cannot know if we are building an empty array or an empty object if we writeand
Nil
stands both for empty dictionary and empty array.To remove this ambiguity, I propose to use
[:=]
as a reserved token to refer to empty dictionaries.We could use
[:]
or[=]
which better mimic[>]
and[<]
but they do not reflect reflect the syntax used for dictionaries.Same notation, old constructors
Another option would be to desugar
[k := v]
intoMkPair k v :: []
instead ofDCons k v []
.This would work for the most common case but falls apart whenever there is a type dependency between the key and the value, or the key-value pair and the rest of the data. For example, we could not implement the indexed data :
This is because the constructor for the key/value pair is dependent, and
MkPair
is not.Use a pairing operator
This problem already has a solution: Use list syntax with a custom operator to pair-up key-values. If the pairing operator is called
::=
the json example can be written:Scala uses
->
as a pairing operator:Haskell's aeson uses
.=
for pairing:Other Idris libraries currently use
::=
,:>
,=:
,:=:
:->:
and more.The shortcoming of this approach is obvious by now: there is absolutely no uniformity when building key-value pairs in Idris. However, in the absence of a standard pairing operator, using
:=
would render all uses of dictionaries uniform, decreasing the cognitive overhead of learning a new operator for each bespoke dictionary type.Out of this, stems two counter points:
,
The current standard pairing operator is the
Pair
type, constructed in its type, and term, with the syntax(_, _)
. One can use tuple sections to use it as a function (,). We can also use the namePair
to build the type orMkPair
to build terms of the aforementioned type.The problem becomes apparent when attempting to use the standard constructor for pair to build dictionaries: It is verbose, repetitive, and noisy:
While only makes use of prelude constructors, it is not a good user experience to type that much boilerplate. This argument is supported by the fact that library designers have been deploying their own pairing operators to make constructing dictionaries easier.
The second argument is strongest when applied with domain-specific knowledge. There are uses of dictionaries where the use of
:=
would be misleading and generally worse than bespoke operators. For example a typing context is itself a form of dictionary, holding variables as keys and types as values, but it is very much expected that we write them with a bespoke operator to represent the fact that a variable is assigned a type, for example[ x :- t ]
.The presence of this feature does not forbid the use of custom operators for such purposes, and in fact, is completely backward compatible.
Do nothing
One last thing to consider is to do nothing. Other successful languages such as Java Haskell or Rust do not have a bespoke dictionary syntax. This is clear evidence that such feature is not necessary for wider adoption and its maintenance cost might exceed the overall usage benefits we get from it. However, I personally think that the above argument about uniformity are enough to justify the additional feature. As the prototype shows, the diff for it is quite small, rendering the maintenance cost minimal, see #3356.
Conclusion
Dictionary syntax solidifies Idris as a modern programming language, enjoying the same ergonomics as other programming environments that focus on user-friendliness and accessibility. What's more, enabling a uniform syntax that accommodates for dictionaries, JSON, XML, dependent types all at once represents an unprecedented opportunity to lead the way in programming language experience.