Open milesfrain opened 3 years ago
What's a good strategy to keep everything up to date?
I think adding a PR template that has a checkbox saying, "If you are adding a new API in library X, you must also open issues to add the same API in libraries Y and Z.
I still don't have a good understanding of how using more typeclasses leads to trouble.
Here's my (potentially wrong) understanding. Type class constraints translate to type class dictionaries, which are just extra values passed into a given function. As a result, performance tends to suffer a bit because an extra closure is allocated.
Also, what determines whether a typeclass instance is defined in the type or the class?
I don't think there is a policy here. If a type class and a type appear in the same package, so that we could put the instance in either, I'm not sure which is preferable. The first issue is avoiding cyclical dependencies between modules. The second part of the issue is that we don't have an "instance search" that will find Type A's instance for type class B regardless of where it is defined. If I'm trying to find all instance of a type class that I'm not familiar with, I would want instances in the type class' module. However, if I'm looking to see what instances a type A has, I would want instances in the type's module.
The location of instances is not arbitrary, in fact we generally have no choice in where they go (unless we want to rearrange the dependency hierarchy). In particular Array instances always have to live in the same module as the class because the Array type is defined inside the compiler.
Here's my (potentially wrong) understanding. Type class constraints translate to type class dictionaries, which are just extra values passed into a given function. As a result, performance tends to suffer a bit because an extra closure is allocated.
This is correct (at least until we start optimising this code a bit more) but it’s not what I would consider the main reason. For me, the main reason is that type classes make your types more complicated, cause type inference to suffer, and are just generally more difficult to work with than regular functions. Because of this, I have argued that they should be avoided except in cases where they’re likely to provide the most benefit, in order to outweigh those issues. I don’t really know how I can make my case any clearer than I have in that sequence of blog posts.
For me, changing the public API of all of these libraries for the purpose of making maintenance easier would be wrong. The API should always be led by what we think constitutes the best designed interface from the perspective of someone using the library; it’s very rare that any other concern should have a major influence. So the question that needs to be answered is “would this improve the API?” This has come up once or twice before and you can make a decent argument arguing that it would, but I’m still hesitant for the reasons given above.
I don’t think we can use a type class to cut down on the amount of documentation work that needs doing, because these functions will have slight differences depending on the type they act on - most importantly in terms of performance, but also for behaviour in some cases. For example it doesn’t make sense to talk about whether duplicates are preserved in Set.union, but it’s quite important to with Array and List union.
What worries me most about a SetLike class is whether types other than Set would satisfy the expected algebraic identities. For example, set union and intersection are both associative and symmetric, and intersection distributes over union. I suspect most of these are not true for the Array/List variants. This can cause problems if you write code that relies on these properties, not realising you’ve done so because you’ve only used it with Set.
I don’t really know how I can make my case any clearer than I have in that sequence of blog posts.
:+1:
I believe Scala had a Length
type class at one point that they later removed. I don't know why, but I think it was removed for similar reasons.
The location of instances is not arbitrary, in fact we generally have no choice in where they go (unless we want to rearrange the dependency hierarchy).
As an example of this, see https://github.com/purescript/purescript-profunctor/issues/23.
Maybe it would be cool to have an interface definition language (for a separate tool, not as an extension to the PureScript language proper) for defining module interfaces—names, suitably parametrized types, maybe some quickcheck properties that the members should satisfy—and then be able to test that a module's exports satisfy one or more of those interfaces as part of a library's test suite. The tool could read the IDL and generate PureScript code for a specific target module; it wouldn't even need to consume anything like CoreFn, I don't think. I don't know exactly how a project would consume those IDL files, or what the process would be for publishing them, but maybe there's something to that idea?
I’d be very much in favor of adding support for such interface definition language to PureScript itself!
I’ve thought about that this summer and I’d like to allow modules to import abstract modules (written in the interface definition language) whose implementation can be provided by other packages. I think this would be very useful for configuration, to provide specific implementations for browsers, Node.js or alternative backends and, as you suggested, to guarantee that multiple modules satisfy the same interface.
If that involves changing the language I’m unlikely to be on board, to me that feels like a lot of implementation complexity and language complexity for relatively little gain.
Found this really nice table for F# https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/fsharp-collection-types#table-of-functions I think we should strive to build the same for PureScript.
Noticed that Cons
is missing from that table though. So an opportunity for us to make our version even better!
There's a lot in common among containers such as
List
,Array
,Set
, but their APIs drift out of sync. What's a good strategy to keep everything up to date?We could continue to apply techniques such as #183 across these libraries, but that requires constant manual intervention.
Would it be appropriate to use more type classes to enforce common interfaces? Or should we avoid those, as suggested in https://harry.garrood.me/blog/down-with-show-part-1/ ? I still don't have a good understanding of how using more typeclasses leads to trouble.
For example we could have a
SetLike
type class that providesintersection
,union
,difference
, etc. thatList
,Array
,Set
would all have an instance of. A similar strategy could have also helped us avoid https://github.com/purescript/purescript-lists/issues/178 and needing to remember to address https://github.com/purescript/purescript-arrays/issues/192. Interfaces could also cut down on the burden of maintaining duplicated documentation for all of these duplicated functions.Another minor bit of messiness is that we currently have both
intersect
andintersection
, depending on the container. Of course this can be fixed without resorting to typeclasses, but could also have been prevented by embracing more typeclasses.Also, what determines whether a typeclass instance is defined in the type or the class? It seems somewhat arbitrary. For example:
Semigroup
defines the instances forArray
andVoid
Void
defines the instance forShow
Show
defines the instance forArray
Things would be a bit more organized if the policy in core was to always define instances in the type, which would look like the following:
Void
defines the instance forShow
andSemigroup
Array
defines instances forShow
andSemigroup