Add pattern matching to the std

jmgomez commented 8 months ago

Abstract

The intention of this RFC is to measure the interest in adding a pattern matching library to the std. Many people have complained in the past about Nim not having pattern matching. This complaint will go away if the std ships with one. It will also drive the library's improvement, as more people will start using it.

Motivation

No response

Description

While not perfect, fusion/matching (or a future std/matching) often allows for cleaner, more readable code, especially when working with monadic types like Option, Either/Result, etc.

Also, one of the key points of fusion is that "It contains candidates for inclusion into the stdlib".

Code Examples

From tmatching.nim. These are just a few examples; I encourage looking at: https://github.com/nim-lang/fusion/blob/master/tests/tmatching.nim


  {.experimental: "caseStmtMacros".}
  import std/options, fusion/matching
  # assertEq and testFail are helpers from the tmatching.nim test suite

  if (Some(@x) ?= some("hello")) and
     (Some(@y) ?= some("world")):
    assertEq x, "hello"
    assertEq y, "world"

  case [3]:
    of [{2, 3}]: discard
    else: testFail()

  case (a: 12, c: 90):
    of (a: 12 | 90, c: _): "matched"
    else: "not matched"

  case Obj2():
    of (kind: in {enEE, enEE1}): discard
    else: testFail()
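For comparison, here is a rough sketch of what the four examples above express when written with only the standard library, i.e. without pattern matching. `Kind2`, `enEE2` and the shape of `Obj2` are hypothetical stand-ins for the types used in tmatching.nim; only `enEE` and `enEE1` appear in the thread.

```nim
import std/options

type
  Kind2 = enum     # hypothetical stand-in for the enum used in tmatching.nim
    enEE, enEE1, enEE2
  Obj2 = object    # hypothetical stand-in for the test object
    kind: Kind2

# 1. Unwrap two Options by hand instead of `Some(@x) ?= ...`.
let a = some("hello")
let b = some("world")
var x, y = ""
if a.isSome and b.isSome:
  x = a.get
  y = b.get

# 2. "A one-element array whose single element is 2 or 3".
let arr = [3]
let arrMatches = arr.len == 1 and arr[0] in {2, 3}

# 3. "Field a is 12 or 90; field c can be anything".
let t = (a: 12, c: 90)
let tupleResult = if t.a == 12 or t.a == 90: "matched" else: "not matched"

# 4. Dispatch on the `kind` field with the builtin case.
#    Obj2() defaults to the first enum value, enEE.
let kindMatches = case Obj2().kind
  of enEE, enEE1: true
  else: false
```

The plain versions are not much longer here; the difference grows with nesting, where every extra level adds another explicit length check, field access or `isSome`/`get` pair.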

Update: Added code examples.

awr1 commented 8 months ago

isn't this a "people aren't using fusion" problem?

kunitoki commented 8 months ago

isn't this a "people aren't using fusion" problem?

Why should I use fusion? It's a bucket of random stuff, and I don't even know how well it's supported, maintained and tested against the Nim versions I'm using.

arkanoid87 commented 8 months ago

isn't this a "people aren't using fusion" problem?

Why should I use fusion? It's a bucket of random stuff, and I don't even know how well it's supported, maintained and tested against the Nim versions I'm using.

Fusion aims to be an immutable code repository, once a module is included it stays. Like Nim itself, Fusion uses a .since annotation. New modules which contain few procs are preferred over larger modules that have more procs.
Fusion is also a Nimble package and it is compatible with Nim version 1 as well as the latest Nim.

moigagoo commented 8 months ago

fusion/matching or std/matching often allows for cleaner, more readable code

Could you please add a couple of examples to demonstrate that? Like, here's a code sample that does X without pattern matching, and here's one with it.

I think a good example will help people see the potential of this proposal and make them more likely to vote for it.

konsumlamm commented 8 months ago

There's already another RFC for adding pattern matching: https://github.com/nim-lang/RFCs/issues/525. fusion/matching in particular is outdated (https://github.com/haxscramper/hmatching is a more updated version). TBH I'm not quite happy with some of its syntax (e.g. using @ for bindings, object patterns don't specify the type, special case until/all/some), so I'm not in favour of just putting it in the standard library as is, since that means we can't really change it anymore (although I'm certainly in favour of pattern matching in general).

arnetheduck commented 8 months ago

As usual in these cases, the better path for pretty much the whole community is to remove it from fusion, use a separate package and recommend it in the manual - this allows all the advantages that separate package management brings: own / faster release cycle, multiple-nim-version-support, smaller upgrades instead of big forced updates (updating nim shouldn't force updating non-core modules because this often introduces breakage) and a way for the code to die when it becomes obsolete.

There are no downsides to this approach - the manual recommendation ensures social coordination / curation. As long as the module is relevant, it will be kept up to date, and should the need arise (i.e. the maintainer steps away), it can be forked into nim-lang/.

arnetheduck commented 8 months ago

Fusion aims to be an immutable code repository, once a module is included it stays. Like Nim itself, Fusion uses a .since annotation. New modules which contain few procs are preferred over larger modules that have more procs.

this is not an advantage limited to fusion: packages and lock files achieve the same effect without the many downsides ("bucket of random stuff").

xigoi commented 8 months ago

isn't this a "people aren't using fusion" problem?

Last time I tried to use fusion/matching, it was pretty broken.

metagn commented 8 months ago

I wonder how many problems with fusion matching would remain if you just called it like match foo: instead of case statement macros.

It's also quite large. It can be maintained separately from Nim but still shipped with it or given sufficient promotion. Fusion was supposed to do this but we instead have things like checksums now that are separate packages with hosted docs on the website that even the compiler can use.

The roadmap also explicitly mentions "find a way to have pattern matching in language" as a stretch goal. But this will very likely not be as complex and sophisticated as fusion matching is.

jmgomez commented 8 months ago

@moigagoo updated with some examples

@arnetheduck I agree that's the way to go about libs in general. But I (and probably many others) think that pattern matching is a language feature. It just happens that Nim is so awesome that you can build it in macro space :). Even if it's not perfect as it is, if it's marked as experimental we can improve it as people use it, changing and adding things based on demand and real usage. People shouldn't be surprised by breaking changes (if any), since it's experimental. IMO it's also safer to ship it with the std, as future changes will probably need language updates.

jmgomez commented 8 months ago

I wonder how many problems with fusion matching would remain if you just called it like match foo: instead of case statement macros.

I'm not aware of any (serious) issues left. Can you point to the remaining ones?

metagn commented 8 months ago

You can't use case statement macros for builtin types supported for case, like int or string.

One could just offer both variations of the call syntax at the same time anyway.
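For readers unfamiliar with the mechanism, this is the tuple example from the Nim manual's section on case statement macros, lightly adapted so the branches assign to a variable instead of echoing. The `case` macro is selected by the selector's type (here `tuple`), so an `int` selector is still handled by the builtin `case`, which is the limitation described above. The experimental pragma is required on older Nim versions and may not be needed on newer releases.

```nim
{.experimental: "caseStmtMacros".}  # required on older Nim versions

import std/macros

# A `case` macro that is only picked up when the selector is a tuple.
# It rewrites each `of` branch into an `elif selector == value` branch.
macro `case`(n: tuple): untyped =
  result = newTree(nnkIfStmt)
  let selector = n[0]
  for i in 1 ..< n.len:
    let it = n[i]
    case it.kind
    of nnkElse, nnkElifBranch, nnkElifExpr, nnkElseExpr:
      result.add it
    of nnkOfBranch:
      for j in 0 .. it.len - 2:
        let cond = newCall("==", selector, it[j])
        result.add newTree(nnkElifBranch, cond, it[^1])
    else:
      error "custom 'case' for tuple cannot handle this node", it

var picked = ""
case ("foo", 78)
of ("foo", 78): picked = "yes"
of ("bar", 88): picked = "no"
else: discard

# An int selector never reaches the macro: the builtin case handles it.
var sizeLabel = ""
case 3
of 1, 2: sizeLabel = "small"
else: sizeLabel = "other"
```

A separately named `match foo:` macro would avoid this restriction, since it is an ordinary macro call and not tied to the builtin `case` dispatch.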

awr1 commented 8 months ago

Why should I use fusion? It's a bucket of random stuff, and I don't even know how well it's supported, maintained and tested against the Nim versions I'm using.

the discontent here seems to be less around pattern matching itself and more around fusion. where do you want fusion to be? removed and folded back into the std monorepo? its packages split apart onto nimble?

there have been successful "auxiliary standard libraries" in other languages (libboost w/r/t C++ most famously, for better or for worse), but i think fusion as an experiment was unloved, given nim's smaller user base. i think there was some hope that centralizing it under the "official nim" banner would provoke contributors to nurture it the same way they would std (unlike most libraries, whose fate usually hinges on one developer), but disinterest clearly seeped in very fast - the last PR was merged almost a year ago.

if you ask me, fusion probably deserves another chance, but reorganized. for it to work, its importance has to be elevated closer to that of std, while still allowing the intended freedom to make edits without std's worries about API stability.

as for pattern matching itself, something that is understated here is the number of pattern matching macros out there:

gara
patty
AlgebraicDataTypes
matsuri
and, of course, fusion / matching itself

while we can't stop anyone from making new libraries, there is clearly a need here to reduce confusion, which raises the question of what exactly the best base to work from is.

(side note: i rarely use any of these macros myself (am fine with normal case of) so forgive me if i don't really have a horse in this race, i just feel this conversation is in dire need of clarification.)

arnetheduck commented 8 months ago

Even if not perfect as it is, if it's marked as experimental we can improve it as people use it

This is the point where the theory breaks down: the std lib is a collection of mostly out-of-date, unmaintained modules with significant issues ("experimental"), and the proposal here is to add yet another one, even though there's pretty convincing evidence in the std lib that the follow-up doesn't actually happen. Most of the std lib doesn't use new language features as they come out, meaning the features themselves are also broken on release and need 2-3 more releases before they're usable, because by and large the std lib isn't co-developed with the language. Take sink for example: released as a language feature in ... 1.2 I think, unused in the std lib, made usable in 1.6.16 / 2.0 because that's the first time a small subsection of the std lib started using it productively.

Combine this with a 1-2 year release cycle and the end effect is that we end up with 2 categories of code: unmaintained std lib code and forked libraries for the things that are still relevant and useful - this is the case of almost every "feature space" so far that isn't "core language" relevant (ie json, async etc) - though even core language features are being forked (looking at you, basic integer support for building bigint libraries).

The std lib does not have the right environment for conducting experiments: it has a slow release cycle and a commitment to non-breaking that is unsuitable for this kind of code. An experimental library on the other hand needs a fast release cycle disconnected from the language cycle.

Any serious use of any library requires that bugfixes can be applied independently of all other things being used: if I'm using json and find a bug, I don't want to have to upgrade async in order to get the bugfix.

Pattern matching is no different - as long as the feature is not actively codeveloped with enabling language features (ie new keywords etc), there's no point having it live in the std lib, but if I want to use it, I want to be sure that I can fix things without waiting on a nim release.

The fate of pattern matching, if we look at the "normal" trajectory of things in std, is thus that in 1-2 years from now, we'll have an unmaintained version that everyone complains about and that gives the language poor vibes (aka "json") and for anyone that actually, seriously, wants to use pattern matching, a side library that does the work well ("jsony", "json-serialisation" etc) but that isn't linked to in documentation because it's not "official".

The social coordination mechanism you're looking for is a documentation entry away - it's that easy to convey social information and it's much easier to change with the changing tides of maintainership (without breaking existing code that can continue to use the old code until the time is appropriate to upgrade).

the discontent here seems to be less so around pattern matching itself and more fusion itself.

fusion suffers from the same basic misalignment as the std lib due to the constraints put on it: an unmaintained, ad-hoc mixed bag of stuff has no way to stay relevant and evolve - the things nobody uses stay that way, and the things that are useful are superseded by forks that do that one thing well.

jmgomez commented 8 months ago

The std lib does not have the right environment for conducting experiments:

I agree, but the "experiment" here is caseStmtMacros, not the library itself. By adding a package that uses it to the std, the message we send is, on one hand, that Nim ships with pattern matching and, on the other, that "issues will be fixed". Even if there are remaining issues in other parts of the std, I think everyone could agree that the chances of getting a fix in the std are higher than in some forgotten package that nobody knows about.

I'm not sure I follow the json point. Isn't that a different feature set? Also, it's a good thing that people can choose between different libraries. But I think a closer example of what we are discussing here is the async situation, because it's closer to a "language primitive". Would it be better to have Nim's async as an external library, even if chronos were superior? I personally don't think so.

konsumlamm commented 8 months ago

Also cc @haxscramper, who actually wrote fusion/matching (and https://github.com/haxscramper/hmatching).

haxscramper commented 8 months ago

Fusion matching should not be added as it is, and imo it needs to be redesigned syntax-wise. There is a new RFC with a better design overall, and fusion/matching's syntax is a bit too alien compared to Nim after all (the :=, ?= and @ operators). Some bits are over-engineered as well and can probably be dropped.

Otherwise, I think it is a good idea ... hopefully it will be a successful attempt. At least I really wish it would be.

haxscramper commented 8 months ago

there have been successful "auxiliary standard libraries" in other languages (libboost w/r/t C++ most famously

Yes, and they have a process of moving things from this library to the c++ standard.

Also note how most languages do ship some form of standard library ... people are not surprised to see sequences, hash tables, dictionaries, filesystem utilities and so on on the stdlib.

Why then does pattern matching become such an issue, I wonder? We have sugar, we have genAst or whatever for macro construction, peg DSLs, and other macro solutions in the stdlib. But since I made a fuckup and trusted the fusion advertisement several years ago, every discussion about adding a feature to the language or standard library now turns into a discussion about packaging, distribution, external libraries or whatever.

And by the way, most languages have pattern matching integrated into the core

jmgomez commented 8 months ago

There is a new RFC with better design overall and for fusion/matching syntax is a bit too alien after all

The issue, which I raised there and nobody cared about, is: who is going to implement it? As I see it, matching works today and, beyond the cosmetic issues, doesn't require the amount of resources that the RFC mentioned implies.

There are two parts to this: what caseStmtMacros supports, which, like it or not, is tied to the language, on one side, and how the "macro" is implemented on the other. I don't care much about the latter; what I care about is the disconnection between the two if they are split.

haxscramper commented 8 months ago

but the "experiment" here is caseStmtMacros not the library itself

I think case statement macros are about as important here as discussion about style sensitivity in identifiers -- it amounts to 0% of the features, 1% of the syntax and 0% of semantic. For fusion matching this certainly wasn't an "experiment", it was just a misguided decision to use this abysmal implicit clobbering of the standard language syntax. For this particular discussion, I also don't really see how case statement macros are an experiment here.

Pattern matching is more about experimenting with proper syntax and semantics for matching tables, sequences, tuples, optional and error types, custom predicates, exhaustiveness checking, AST data extraction, nested structure checking, value unpacking, user extensibility, integration with existing language features, performance, readability. I think that is what the "experiment" here is about.

Whether it has case typed on top or match is pretty far down on this list.

jmgomez commented 8 months ago

Well, I mentioned it because the compiler actually needed a fix to use it inside generics, hence the experimental flag, which I think you also have to enable (or at least used to).

I think we are diverging a bit and this won't be productive. I will change the title to remove fusion/matching from it and keep it more general for pattern matching. If everyone agrees that it should live in the std in the future, or in an "easy to access" solution, we can continue this conversation in another RFC to lay out the design (or in another repo). I don't have strong opinions about it, so if someone has something in mind, please open it.

If the general feeling is that it should neither live in the std nor ship with Nim, I'm not sure what the next steps are.

ZoomRmc commented 8 months ago

Well, if you've changed the name and do not insist on a specific implementation, what's the main difference now compared to #525?

jmgomez commented 8 months ago

Well, if you've changed the name and do not insist on a specific implementation, what's the main difference now compared to #525?

No sum types
Doesn't aim to implement it at the language level.
Promotes reuse of what already exists, so it takes few resources. That one was created in May and nobody has worked on it; there has only been talk.

And more importantly, to see if we can all agree on a path forward for how it can be shipped with the compiler.

haxscramper commented 8 months ago

Just to clarify "Promotes reuse of what already exists" refers to taking fusion/matching or any other library that is out there (such as gara or patty) and working on it to make it possible to add it to the standard library. Is that correct?

jmgomez commented 8 months ago

Yes, or hmatching or starting one from scratch if anyone feels like it.

When I open an RFC, I do it because I know I have the knowledge and the time to implement it. I can help tweak an existing library if approved, but I can't commit to writing one from scratch.

arnetheduck commented 8 months ago

Would it be better to have Nim's async as an external library

yes, in every single aspect I can think of - if I believed async was better off in std, we wouldn't have chronos ;) but also: if async had been a library, we wouldn't have chronos either, because it would have been easier for us to just contribute to a single library - its inclusion in the std lib prevents that from being a viable option for the above-mentioned reasons (and a few others).

json

I used json as an example of a module that in the std lib is a) mostly unusable beyond toy examples, b) fundamentally flawed (slow, memory-hungry, occasionally buggy, not fully standard-compliant etc.), and c) forked because of its flaws but unfixable without compromises / breakage - it's one of those early experiments that is difficult to remove, because removing it would needlessly break a lot of stuff, which is worse than keeping it. Lose-lose.

because I know I have the knowledge and the time to implement it

this is fantastic, but it doesn't have to live in the std lib for that ;) it's even easier to improve things without waiting for nim and without nim waiting for those improvements. No coordination needed really.

macros

...allow us to write libraries like chronos or pattern matching without having to resort to language changes (or at least they let us get pretty far, except for the occasional core primitive) - it's a killer feature that enables developing powerful libraries without encumbering the main distribution.

Im not sure what the next steps are.

write the better library and trust users to be smart enough to recognize it - they are, truly.

jmgomez commented 8 months ago

I think you misread what I said, you missed the RFC bit :P

When I open a RFC, I do it because I know I have the knowledge and the time to implement it.

Meaning, I wouldn't open an RFC to implement the pattern matching everyone wants to see in Nim.

haxscramper commented 8 months ago

For the sake of seeing this discussion moving forward, let's assume that some library was picked to be the baseline for implementing the std module -- what would be the next proposed steps for seeing this to completion? E.g., picking the set of features to be added, some cosmetic edits to the syntax, some things that need to be fixed or didn't quite work out (get the feedback on the implementation and UX, it is a good chance to iterate).

Also, if we imagine that pattern matching is actually added, what other things must be fixed for it to be fully usable (some bugs in the language that would need to be addressed, things with generics, etc.).

These RFCs usually get no follow-up or roadmap, so I'm trying to address this common shortcoming.

Araq commented 8 months ago

We need to special-case as in the language so that we know, without having to type check, that f(x) as y introduces a new identifier y.

Araq commented 8 months ago

We need a nim-lang/patterns package which the documentation refers to which makes case available for objects, tuples, seqs and arrays and their nestings.

syntax is:

 of MyObject(fieldX as x, fieldY as y)

 of MyObject(as x, as y) # fields are taken from the order within MyObject

likewise for tuples

for arrays: of [as x, as y, _] # extract first two elements
for seqs: of @[as x, as y, _]

To access the last element of an array/seq use of [_, as x]

well there are also literals, of MyObject("abc", as y)
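To make the proposed patterns concrete, here is a rough hand-written equivalent in current Nim. The field types of `MyObject`, the helper procs `describe` and `firstTwoAndLast`, and the reading of `_` as "any remaining elements" are all assumptions for illustration; the thread does not specify the exact semantics.

```nim
type
  MyObject = object   # hypothetical field types
    fieldX: string
    fieldY: int

proc describe(o: MyObject): string =
  # Roughly what this proposed code would do:
  #   case o
  #   of MyObject("abc", as y): ...            (literal check, then bind)
  #   of MyObject(fieldX as x, fieldY as y): ...
  if o.fieldX == "abc":
    let y = o.fieldY
    result = "abc/" & $y
  else:
    let x = o.fieldX
    let y = o.fieldY
    result = x & "/" & $y

proc firstTwoAndLast(s: seq[int]): (int, int, int) =
  # Roughly `of @[as x, as y, _]` plus `of [_, as z]`,
  # assuming `_` may stand for any number of remaining elements.
  doAssert s.len >= 2
  let x = s[0]
  let y = s[1]
  let z = s[^1]
  (x, y, z)
```

The pattern form would fold the length check and the bindings into the `of` branch itself, which is what makes it attractive for nested data.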

arnetheduck commented 8 months ago

fields are taken from the order within MyObject

this is breakage-generating because it introduces a backwards-compatibility requirement that fields in objects never change order and that no fields are added in between, which is unusual... same as here
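A small sketch of the hazard: two object types with the same fields declared in a different order yield different values under positional extraction, with no compile error when the field types match. The types `SizeA`/`SizeB` and the helper `byOrder` are hypothetical; `byOrder` uses the `fields` iterator from `system` to mimic what a by-order pattern such as `of T(as a, as b)` would rely on.

```nim
type
  SizeA = object
    width, height: int
  SizeB = object      # same fields, declared in the opposite order
    height, width: int

proc byOrder[T: object](o: T): (int, int) =
  # Collect field values in declaration order, like a positional pattern would.
  var vals: seq[int]
  for v in o.fields:
    vals.add v
  (vals[0], vals[1])
```

Reordering `width` and `height` in a library type would silently swap the bindings in every downstream positional match, which is why restricting by-order matching to the declaring module limits the blast radius.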

Araq commented 8 months ago

I'm aware and I don't particularly care, so only allow by-order matching in the same module where the object is declared.

konsumlamm commented 8 months ago

We need a nim-lang/patterns package which the documentation refers to which makes case available for objects, tuples, seqs and arrays and their nestings.

syntax is:
 of MyObject(fieldX as x, fieldY as y)

 of MyObject(as x, as y) # fields are taken from the order within MyObject
likewise for tuples

for arrays: of [as x, as y, _] # extract first two elements
for seqs: of @[as x, as y, _]

To access the last element of an array/seq use of [_, as x]

well there are also literals, of MyObject("abc", as y)

What's the reason for needing as everywhere? It looks like useless noise.

xigoi commented 8 months ago

What's the reason for needing as everywhere? It looks like useless noise.

So that you can distinguish binding to a variable from matching against a constant.

konsumlamm commented 8 months ago

What's the reason for needing as everywhere? It looks like useless noise.

So that you can distinguish binding to a variable from matching against a constant.

Hmm, I think I'd like let better for that:

 of MyObject(fieldX: let x, fieldY: let y)

 of MyObject(let x, let y) # fields are taken from the order within MyObject

It defines a new variable, just like let. as makes me think of conversions.

omentic commented 8 months ago

I support this (obviously). My only concerns are that I don't think any candidate out there (fusion/matching, patty, gara) is fit for inclusion in the stdlib, and I am worried about accidentally making the implementation incompatible with, or hard to fit to, a future implementation of sum types. I also don't think pattern matching is terribly useful on its own without sum types (but that's neither here nor there).

With regard to #525: I haven't had much time to work on it thus far and I don't know how much time I'll have in the future. It's certainly on my docket but also certainly a lower priority than school and work and the like.

FWIW, I went through the #525 RFC a little bit and updated the examples given to use the x as y syntax to not hit the mentioned backwards compatibility issue. I think that and the syntax sugar for structs & named tuples (https://github.com/nim-lang/RFCs/issues/525#issuecomment-1603173759) is the best approach, consistent with import x as y. (but that's mostly bikeshedding. also irrelevantly, i think any pattern matching should land with if myObject of MyObject(x, y): x+y in some form).

xigoi commented 8 months ago

Hmm, I think I'd like let better for that:

I like this because it would also allow of MyObject(fieldX: var x) for creating a mutable binding.
