h3rald / min

A small but practical concatenative programming language and shell
https://min-lang.org
MIT License
309 stars 23 forks source link

Native dictionary data type #32

Closed h3rald closed 6 years ago

h3rald commented 6 years ago

min should provide a native dictionary data type. This will likely improve performance and also make the language richer and easier to use.

mwgkgk commented 6 years ago

For what it's worth I use dicts in their current form a lot, and love them. The structure is transparent and easy to build up from a list literal in the repl. I do not however do anything that would depend on them being specifically lists rather than something opaque.

I imagine lisp people find the current form to be of elegant design, and consistent with the language. It is also understandable that hash dicts can be a performance bottleneck and languages with fast dict lookup are awesome.

:purple_heart:

Edit: Didn't realize this comes as a consequence of the benchmark effort! Got confused there for a minute :)

h3rald commented 6 years ago

Believe me I didn't really want to implement a separate data type for dictionaries, but there are a few problems with the current design:

Not sure how I could speed up lookup with the current design... I could experiment with saving all lists to hash tables of some kind, but this is likely to cause even more problems.

Alternatively, we could have a new data type and keep it enumerable as a list as well.

For the notation, I am undecided between a JSON-like notation or something more minimal like this:

 ("a" : 1, "b": 2) ; dictionary using string keys
 ('c : 3, 'd : 4) ; dictionary using symbol keys
 (:) ; empty dictionary

Now... unfortunately : is also used to define symbols, so the above won't work really.

Maybe something hybrid and more concatenative like:

 {1 :a, 2 :b}

That could be considered a code block as well.

Gotta think about it really. It's gonna cause some compatibility breaks anyway.

Any thoughts? Proposals?

mwgkgk commented 6 years ago

Cixl uses comma to construct pairs: https://github.com/basic-gongfu/cixl#pairs - that underlie the implementation of tables, which are constructed explicitly with Table(...).

Curly brace syntax I think is a great idea. It could even work without commas: {1 :a 2 :b} - this is a good translation of something like clojure's {:a 1, :b 2}, with additional bonus of :a and :b actually sort of doing the same thing as in regular code. Hashmaps = local-scope variable lookup? Wow.

h3rald commented 6 years ago

Oh yes! Commas are not needed, you're right.

Now, the only problem is that although the define sigil syntax looks really cool,it cannot be used (currently at least) to define keys that contain spaces (in JSON is infrequent but allowed) or certain other characters.

We might have to settle for something like this:

{1 "key a" 2 'keyB}

or we could introduce a grouping syntax for sigils, which wouldn't be too bad I guess, i.e.:

 {1 :[key a] 2 :keyB}

A bit heavy but it could be useful in other contexts.

Still not sure on using the colon though... it looks cool but it doesn't quite make sense, unless we allow things like this as well:

 {1 "key a" define 2 "keyB" define}

Dictionaries could therefore be intended as code blocks that are immediately evaluated but they remain in their own scope (and retain their type}. Which means we could write:

 {1 :[key a] 1 1 + "keyB" define} :test1
 ; test1 is set to {1 :[key a] 2 :keyB}

And then:

 test1 /[key a] 
 ; Evaluates to 1

In this case dget could be used to retrieved symbols defined in an inner scope.

Wow OK, even more confusing, just brainstorming here...

mwgkgk commented 6 years ago

Elixir quotes the atoms that have whitespace:

> is_atom(:atom_without_whitespace)
true
> is_atom(:"atom with whitespace")
true
> is_atom(:whitespace error)
** (SyntaxError) iex:3: syntax error before: whitespace

Allowing evaluation within dicts is very convenient and I think in spirit of the language. It should still allow making a dictionary of function references (quoted symbols in this case) without flattening it to the ground which probably would work with simple quoting?

> {(+) :first_function (-) :second_function} :our_dict
> 3 2 our_dict /first_function
5

Also just noticed that unlike quotes we're using a colon to define dicts. It's a common mistake for me to first try capture a quote with a colon then realizing it should be done with the equals sign or else it would dequote like a function. Since dicts evaluate in place, colon is more fitting, as it is more like an object than like a quote.

Evaluation within dicts is I think synonymous with dict initialization in any language. E.g. python:

> our_dict = {"first_value" : 3 + 4}
> our_dict["first_value"]
7
h3rald commented 6 years ago

Again, good points!

I agree on allowing quotations to be assigned as values, however at this point it would make more sense to keep using quote-defines (or = sigil) where appropriate:

 {(+) =add ((*)) :multiply}

If we just say that brackets are only used to separate a scope that then becomes accessible from outside via dget, dset etc., then one could also put any symbols in it.

The following therefore:

  {(+) =add (-) =multiply (*) #multiply}

Would still evaluate, like the previous program, to:

 {((+)) :add ((*)) :multiply}

...which the parser would have to re-create as the default representation of a dictionary (essentially list definitions of all the variables defined in it).

Anything better than this? :)

mwgkgk commented 6 years ago

If we just say that brackets are only used to separate a scope that then becomes accessible from outside via dget, dset etc., then one could also put any symbols in it.

Yep, that's right, I didn't realize this at first. For all purposes the define's inside behave like any other. Quote-define is doing sort of a different thing in regards to my

> {(+) :first_function (-) :second_function} :our_dict
> 3 2 our_dict /first_function
5
> {(+) =first_function (-) =second_function} :our_dict
> 3 2 our_dict /first_function
+

example, but it does not matter since both ways are allowed.

It seems like there could be some gotchas to find, but this looks like an elegant and maybe unique way to go about this.

h3rald commented 6 years ago

...and then there's the whole module business.

At present, you can define a module like this:

 ((+) :plus (-) :minus) +mymodule

Now, mymodule itself evaluates to an empty quotation (when the module is defined, that is) which also defines a scope and contains two symbols. We can access symbols inside a module using call and we can import all the symbols within a module into the current scope using import.

I never actually liked this whole special empty quotation business... perhaps we could rewrite the above like this:

 {(+) :plus (-) :minus} :mymodule

(and it would also have a readable representation of its contents)

Because basically, at the end of the day, modules are dictionaries as well, and we wouldn't need a special digli and symbol module to evaluate the quotation and bind it to a symbol: our new dictionaries are evaluated automatically when created, and can be bound to a symbol without the need of a special keyword.

Now... we're talking a few compatibility breaks and removals of some symbols, also because in this case call wouldn't be necessary anymore, as you'd simply use dget.

So... how do we call these things? Modules or dictionaries? I am really not sure. I'd just go for dictionaries, but if we do we would have to talk about lang, seq, and fs dictionaries, while it's clearer if we call them modules.

Or we could go for another term altogether, but not sure what (scope could be a possibility, but also library or object).

Finally... there would still be a concept of special dictionaries like request and response in the http module... but it's better than having special empty quotes anyway...

mwgkgk commented 6 years ago

modules are dictionaries as well

Wow.

import all the symbols within a module into the current scope using import.

That's something that never really happens with dicts and when looking for a reason to keep them separate this might be one.

I think separate syntax, even with same underlying tech, has it's upsides in communicating the intent / not confusing the users. It can be revealed in docs for extra mind expanding effect, but using dget for call seems odd. (Honestly I haven't used modules, importing the whole file in local scope instead: there's no examples for call in the docs and I somehow missed the point that it can help with scoping. Shame.)

Also modules can potentially have private definitions down the road, to manage complexity. As well as importing some other modules and not reexporting (which means definitions should be private by default). What happens if we import a module within a dict? Well that could be a fun way to concatenate dicts :)

I wonder why you went with ^my_function rather than .my_function :)

Still, with all that said, while I think modules being a thin-ish layer over dicts is closer to optimal, merging them is not that far off. This is definitely an interesting choice. Basically these guys are "eager quotes" while regular quotes are "lazy quotes". We haskell now.

Also, a thing that I recently became aware of is that in Forth square brackets can denote compile time evaluation: https://rosettacode.org/wiki/Compile-time_calculation#Forth. Which is interesting because that'd be the third possible use of different brackets that we have: lazy, eager, and compile-time, if min ever gets a compile-time.

Not having two separate syntaxes for "eager quotes" can definitely be an upside.

Edit: "Eager quote" is a stretch, because it's an eager-and-capturing quote. "Eager scope" is more like it I suppose.

h3rald commented 6 years ago

OK I see your point. Let's have separate syntax, and probably modules and dictionaries could be separate types with separate properties as well, internally.

The lazy vs. eager scope discriminant actually makes a lot of sense as well, that definition could help in the docs too!

Thank you so much for all your inputs and also for the comparisons with all the other programming languages - I knew of some of the things you mentioned, but as I didn't properly try all the languages you mentioned I was glad to learn something new.

Oh well, I guess now it's a good time to start with the implementation then! :)

h3rald commented 6 years ago

Well, release 0.17.0 is out!

Quite a few breaking changes, but hopefully worth it. Thanks @mwgkgk for all your help!

mwgkgk commented 6 years ago

Very cool!