haskell / core-libraries-committee

95 stars 15 forks source link

Add Foldable / Traversable instances for tuples #206

Open Bodigrim opened 11 months ago

Bodigrim commented 11 months ago

(This was originially submitted by @Topsii as a part of #93 / !9489, but I think it's worth its own discussion)

At the moment we have instance Functor (x,y,), instance Bifunctor (x,,), instance Bifoldable (x,,) and instance Bitraversable (x,,), but neither instance Foldable (x,y,) nor instance Traversable (x,y,). This is extremely inconsistent. Thus the proposal is to add them, and similar for all tuples up to 7 elements.

(7 is not an arbitrary number, Haskell Report suggests that standard functions should be implemented for tuples up to 7 elements)

tomjaguarpaw commented 11 months ago

Although I appreciate the push for greater consistency I am very skeptical about this. The proposed Foldable instances mean that we enlarge the set of circumstances where people get confused about length _.

mixphix commented 11 months ago

I wonder what the potential breakage from removing tuple instances of Functor/Foldable/Traversable would be. Given that it is widely agreed that length (x, y) == 1 is meaningless, perhaps we ought to remove the footgun instead of adding more?

phadej commented 11 months ago

There is nothing wrong with Functor ((,) a) nor Traversable ((,) a. These are not footguns (except if latter is used to define Foldable-like combinators).

Similarly one can argue that Foldable (Map k) and Foldable Set are footguns as elem @Set @a and elem @(Map k) @v are O(n), not O(log n). However removing Traversable (Map k) will break a lot of code.


Removing Traversable ((,) a breaks GHC itself already.

--- a/libraries/base/Data/Traversable.hs
+++ b/libraries/base/Data/Traversable.hs
@@ -314,8 +314,8 @@ instance Traversable (Either a) where
 deriving instance Traversable Solo

 -- | @since 4.7.0.0
-instance Traversable ((,) a) where
-    traverse f (x, y) = (,) x <$> f y
+-- instance Traversable ((,) a) where
+--     traverse f (x, y) = (,) x <$> f y

 -- | @since 2.01
 instance Ix i => Traversable (Array i) where

results in failure to compile GHC:

compiler/GHC/Utils/Monad.hs:196:22: error: [GHC-39999]
    • Could not deduce ‘Traversable ((,) a)’
        arising from a use of ‘traverse’
      from the context: (Applicative m, Traversable f)
        bound by the type signature for:
                   mapSndM :: forall (m :: * -> *) (f :: * -> *) b c a.
                              (Applicative m, Traversable f) =>
                              (b -> m c) -> f (a, b) -> m (f (a, c))
        at compiler/GHC/Utils/Monad.hs:195:1-81

I actually was expecting the transformers to fail to compile already, but its writer and state are defined in a swapped way:

newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }

instance (Traversable f) => Traversable (WriterT w f) where
    traverse f = fmap WriterT . traverse f' . runWriterT where
       f' (a, b) = fmap (\ c -> (c, b)) (f a) -- strict first @(,) f

so they inline first (and don't use traverse).

mpickering commented 10 months ago

Foldable ((,) a) was already a mistake, I am strongly opposed to adding any more instances for the other tuple sizes.

Topsii commented 10 months ago

(This was originially submitted by @Topsii as a part of #93 / !9489, but I think it's worth its own discussion)

I have moved the tuple instance definitions to a separate commit for the case that this is approved. The merge request is !11538.

s-and-witch commented 5 months ago

Hello there.

I'm working on codebase with high usage of high order functions that do something with monad M. These helpers change some environment for inner computation, catch errors and perform other complicated actions. A lot of them use tuples to return some additional stuff, so they may have such types:

M a -> M a
M a -> M (A, a)
M a -> M (A, B, a)

I didn't see more than 3 return values, however I can't say they can't be found in the future.

I had to write a wrapper monad under M, let's say SM, that can be defined as StateT S M. Furthermore, I need high-order helpers to work with my monad, i.e. I need to lift all these helpers into my monad:

(forall a. M a -> M a) -> SM a -> SM a
(forall a. M a -> M (A, a)) -> SM a -> SM (A, a)
(forall a. M a -> M (A, B, a)) -> SM a -> SM (A, B, a)

I need to use RankNTypes here because I'm passing M (a, S) inside.

This helper should treat state carefully and perform some steps to fill M's environment to work properly, so I wish to define it only once. After a lot of work I come with generalization of all the functions that I may need:

Traversal t => (forall a. M a -> M (t a)) -> SM a -> SM (t a)

edit: Traversable t here

I started to use my helper function in the codebase, and, after hard work, I updated all the subsystem of my interest to my new monad. Updated and found that

Could not deduce ‘Traversable ((,,) A B)’

Thanks to everyone who participated in the discussion 🙏.

tomjaguarpaw commented 5 months ago

I guess you mean

Traversable t => (forall a. M a -> M (t a)) -> SM a -> SM (t a)

I'm not sure if you're saying you're stuck, but if so I see two possible ways to proceed:

  1. Define newtype TupleWrapper a b c = TupleWrapper (a, b, c). Then you can give it a Traversable instance.
  2. Instead of taking a Traversable as a constraint, take the traverse operation as an argument:

    (forall a b f. Applicative f => (a -> f b) -> t a -> f (t b)) -> (forall a. M a -> M (t a)) -> SM a -> SM (t a)
s-and-witch commented 5 months ago

I understand that there are low-tech workarounds, that doesn't allow me to use one function in all cases, however I want to give you a motivational example about what may happen in the wild.

I've never had no desire to use length or minimum on a tuple and don't feel a problem here, but I want to be able to write a polymorphic function that can work with widely used types.

mixphix commented 5 months ago

low-tech workarounds

Okay.

Once again, I think that if users need Foldable or Traversable tuples, that they're Doing It Wrong, but that custom types and writing their own instances gives them the power to do so if they choose. I am against adding these instances and would much prefer we remove the existing ones for pairs instead.

Bodigrim commented 3 months ago

The proposed Foldable instances mean that we enlarge the set of circumstances where people get confused about length _.

@tomjaguarpaw could you possibly unpack this argument? What's exactly confusing?

As we discussed at https://discourse.haskell.org/t/in-retrospect-was-burning-bridges-a-mistake/8188/46, it's not like Foldable in general is confusing for users. And certainly length of tuple is no worse than length of Identity, Maybe and Either.

tomjaguarpaw commented 3 months ago

I mean that length (1, 2) == 1 is confusing. This proposal suggests allowing length (1, 2, 3) (which would also equal 1), which is also confusing. I don't think that's a good idea. I would say that we should never have had all of these conditions pertain at the same time

But sadly all three do pertain. Let's not make things worse by allowing length (1, 2, 3) == 1 to hold from the Prelude.

I disagree that length (Identity 1) == 1, length Nothing == 0 or length (Just 1) == 1 are confusing. I do find length (Right 1) == 1 combined with length (Left 2) == 0 confusing.

(I agree that Foldable length is in principle a meaningful operation, but it shouldn't be the length we get from Prelude. Oh well. Too late now.)

Bodigrim commented 3 months ago

I disagree that length (Identity 1) == 1, length Nothing == 0 or length (Just 1) == 1 are confusing.

What about length (Identity [1,2,3]) == 1 and same for Just? Is it not confusing?

tomjaguarpaw commented 3 months ago

There's a difference of degree. length (1, 2, 3) == 1 is much more confusing than length (Identity [1, 2, 3]) == 1 (or length [[1, 2, 3]] == 1, for that matter).

lierdakil commented 2 months ago

I don't see what's confusing about length for tuples retuning 1. It may be surprising for someone coming, say, from Python, but it's a question of adjusting the mental model to properly account for Foldable semantics. And using length without understanding Foldable is a footgun in itself to begin with (arguably, the length function is itself a footgun, you don't know its complexity unless you know for a fact how a type in question implements it, and it's generally a bad idea to use it polymorphically for that exact reason).

I see nothing wrong with the proposal.

hasufell commented 2 months ago

My impression is that if we reject this proposal on the grounds of "Traversable for Tuples is confusing", then we are obliged to do something about it, as in: deprecate and remove the instance.

If we don't agree on removal, we should accept this proposal.

"Let's do nothing" seems like the worst of all options.

lierdakil commented 2 months ago

If you start removing things on the grounds that they "are confusing" (which is an extremely subjective metric!) despite being well-defined and internally consistent, I think that would be an indicator that something went horribly wrong somewhere. Case in point, monads are generally quite confusing to a lot of people. Hopefully you see my point.

tomjaguarpaw commented 2 months ago

"Monads" and "length for tuples returning 1" are different kinds of "confusing". The former sort of confusion imposes a barrier before the feature can be used. The latter sort of confusion leads to very subtle run time bugs.

And to be clear: there's nothing wrong with Foldable having a method of type f a -> Int that returns the number of elements. What causes problems is that it is both called "length" and exposed from Prelude.

lierdakil commented 2 months ago

imposes a barrier before the feature can be used

Not true at all. I've taught Haskell for a few years to undergrads, and the amount of code I've seen that misuses and abuses monads due to that exact confusion would surprise you. At least, it surprised me. That included a decent few cases of "subtle runtime bugs" as well. So I don't see a substantial difference here.

What causes problems is that it is both called "length" and exposed from Prelude.

You could call it size or count, but that's bikeshedding and wouldn't substantially change anything. The name is descriptive of what it does: it returns the length/size/number of elements of a foldable functor.

Now, one needs to understand what a foldable functor is to understand what it does, which is not particularly helpful to complete beginners -- that much is obvious. However, if you're concerned about that, the only option that would actually help with that IMO is to export a monomorphised version from Prelude: length :: [a] -> Int. This seems like a step backwards, but it'll alleviate the confusion you're concerned about. In any case, anything of the sort should be a separate proposal and has no bearing on the proposal being discussed.

hasufell commented 2 months ago

I'm personally not buying into the confusion argument. I use fmap on 2-Tuples a lot and yes, there have been experienced professional Haskellers that got thrown off by it.

But once you look at it, there's no other way to define it.

What would be confusing is if there were a lot of other definitions of length that made sense.

So the confusion is more about "I didn't think about this before". That is an indication that we need better documentation.

If we had better mechanisms to selectively import instances, then I won't mind making it harder to import these more exotic instances.

tomjaguarpaw commented 2 months ago

imposes a barrier before the feature can be used

Not true at all. I've taught Haskell for a few years to undergrads, and the amount of code I've seen that misuses and abuses monads due to that exact confusion would surprise you.

Without examples it's hard to perform a "differential diagnosis" and determine whether "monads" really are confusing in the "same way" as "length in Foldable exposed from Prelude". If you have some evidence to share that you think weighs on this discussion then please do share it!

However, I do know from long experience that Prelude.length working on tuples, and giving the result 1, has caused confusion, surprise and bugs for many Haskellers including myself. For example

Today is the first day that the Foldable instance for tuple bite me. Foldable allows me to call length, which always returns 1. Even worse, none of my tests caught that. I guess I got lucky there this time

https://x.com/l7r7_/status/1534992482966159388

The problem isn’t that tuple has a Foldable instance, it’s that the length exported by Prelude is the one from Foldable instead of the one from Data.List #FoldableOutOfPrelude

https://x.com/ChristiaanBaaij/status/1301769050872315904

In Haskell, I would rank some tuple typeclass instances, especially Foldable, as being a wart. Completely consistent / logical / emergent, but also a bad surprise for most people upon first encounter.

https://x.com/g_lebec/status/1607881099308568587

I just fixed a bug where Foldable.length tup was applied to a tuple, when the author meant the list in snd tup. It typechecked, of course. I assume it was only a list at first, the tuple was added later and the code continued to typecheck.

https://x.com/Profpatsch/status/1450470441810812934

The fact that people are confused about the result of taking the length of a tuple in Haskell confirms to me that this is a bad idea in general.

https://x.com/Profpatsch/status/1450470441810812934

I had length applied to a tuple, and had to debug the code for a while 😐

https://x.com/dfordivam/status/951609307237249024

when I try this length function with various Tuples, what I see confused me

https://stackoverflow.com/questions/36460833/why-does-length-return-1-for-a-tuple-with-2-elements-and-gives-an-error-for-a-t

What would be confusing is if there were a lot of other definitions of length that made sense.

But there are other definitions that make sense! None of them have type Foldable f => f a -> Int, but that's not the point. length (1, 2, 3) == 3 "makes sense". length ("重叠字" :: Text) == 3 "makes sense". Neither of them fit into the Foldable type class, but they "make sense". Why is Foldable the privileged thing that has "length" in Haskell? I don't see a good reason.

If Foldable.length were called Foldable.numberOfElementsOfTheTypeOfTheLastParameter then I would not have a problem. That name communicates well and can't be confused for doing something it doesn't. If Foldable.length were not exposed from Prelude then I would not have a problem. People are allowed to use whatever names they like for anything, as long as they don't impose them on others!

The fact that there is only one value of type Foldable a => a -> Int does not absolve us from writing APIs that communicate well, and does not grant us the right to call that thing unqualified length. Now, that ship has sailed. The best we can do now is to not make anything worse.

s-and-witch commented 2 months ago

I actually don't care about Foldable instance for any tuple, because the main goal of Foldable is to give me a for_ over this structure, not much useful for a tuple.

But I really want to have Traversable for more tuple types. It's possible to use Bitraversable for that purpose, but it doesn't play well with other traversable things like Identity.

hasufell commented 2 months ago

But there are other definitions that make sense! None of them have type Foldable f => f a -> Int, but that's not the point.

I'm not sure what to make of this. It seems you're not really arguing against this proposal, but to rename a member of the Foldable class (which would be an insanely breaking change).

In the context of the Foldable class, there are no meaningful alternative definitions of length for (,) a.

"All instances follow naive intuition" is a pretty hard property to satisfy. We'll be redesigning lots of classes if we go for this.

IMO, this is what you get with ad-hoc polymorphism.

I believe this is a use case for linters. They may very well emit a fat warning for these exotic instances.

tomjaguarpaw commented 2 months ago

But I really want to have Traversable for more tuple types

I agree this is useful, but actually I find Traversals (from lens/optics/your favourite optics library) to be even more useful, because they're not artificially limited to polymorphic containers, and to the values whose type matches the final type variable.

It seems you're not really arguing against this proposal ...

What I am arguing is that we should never have had all of these conditions pertain at the same time

Sadly they do pertain, and indeed changing any one of them would be an "insanely breaking change", so I'm not suggesting changing them. I am arguing that we shouldn't make things worse by adding

"All instances follow naive intuition" is a pretty hard property to satisfy

I don't think I'm arguing for that. People are welcome to define whatever classes and instances they like in their own packages. I'm even flexible regarding what classes and instances are defined in base. However, I'm arguing to not export from Prelude identifiers that trivially break naive intuition. We can't, however, rectify this particular case (without insane breakage) so in the context of this proposal I'm suggesting: don't make things even worse.

IMO, this is what you get with ad-hoc polymorphism.

Yes, sadly. I believe we should be extremely judicious about what ad hoc polymorphism is exposed from the Prelude. Not all historical choices were judicious, but they're hard to change. It's much easier to prevent bad future bad choices.

I believe this is a use case for linters. They may very well emit a fat warning for these exotic instances.

I think that's a reasonable counterpoint, and indeed stan does warn when length is used on a tuple.

hasufell commented 2 months ago

However, I'm arguing to not export from Prelude identifiers that trivially break naive intuition.

I guess you're free to follow that principle and there's not much point in discussing that.

But I want to point out that under that definition of "confusing", I find Monad instance for [] confusing. Especially with do notation :)

tomjaguarpaw commented 2 months ago

OK, fine. Maybe being "confusing" is not sufficient justification. How about "prone to subtle, hard-to-diagnose bugs"?

Kleidukos commented 2 months ago

Quite interestingly, Elixir makes a difference according to the data structure:

When counting the elements in a data structure, Elixir also abides by a simple rule: the function is named size if the operation is in constant time (the value is pre-calculated) or length if the operation is linear (calculating the length gets slower as the input grows). As a mnemonic, both "length" and "linear" start with "l".

For example, we have used 4 counting functions so far: byte_size/1 (for the number of bytes in a string), tuple_size/1 (for tuple size), length/1 (for list length) and String.length/1 (for the number of graphemes in a string). We use byte_size to get the number of bytes in a string, which is a cheap operation. Retrieving the number of Unicode graphemes, on the other hand, uses String.length/1, and may be expensive as it relies on a traversal of the entire string.

https://hexdocs.pm/elixir/lists-and-tuples.html#size-or-length

If we are unwilling to give length an actually useful value for tuples (that does not lead to hard-to-diagnose runtime issues), then we can perhaps get a function that does the actual job and establish a distinction.

hasufell commented 2 months ago

OK, fine. Maybe being "confusing" is not sufficient justification. How about "prone to subtle, hard-to-diagnose bugs"?

Yeah. I think there are all sorts of things where at least a certain class of users want some kind of warnings for purposes of "questionable ergonomics".

I count partial functions and exotic instances both in that category, as well as readFile.

GHC warnings seem a bit radical (that's why I was opposed to head/tail warnings). Banishing them from base seems harsh, especially when they can serve advanced use cases. And then the only thing left is indeed linters or making it harder to import an identifier.

lierdakil commented 2 months ago

evidence to share

Not really evidence per se, but from memory, Monad instances for Either a, (,) a caused a lot of anguish to a some students. Mostly this came down to the naive expectation that monadic bind with Either a a and (a, a) should "visit" both branches/elements. Maybe that's partially on me, as I've used a binary tree as a running example in my lectures. Still, that's pretty similar to expecting Foldable.length to visit all elements of a tuple.

I just fixed a bug where Foldable.length tup was applied to a tuple, when the author meant the list in snd tup. It typechecked, of course. I assume it was only a list at first, the tuple was added later and the code continued to typecheck.

This isn't really evidence in favour of your position. It still would've been a bug if we had length (_, _) == 2, which is the naive expectation -- which I assume by your metric wouldn't be "confusing". Or would it?

we shouldn't make things worse by adding instance Foldable ((,) a b)

I don't see how adding instance Foldable ((,) a b) (etc up to 7 elements) "makes things worse", at least not substantially. There's already an instance for pairs, so as it is, things are already confusing, both in the sense you're arguing for, and in the sense it's inconsistent. If you're not up for removing Foldable and Traversable instances for pairs, then you're basically arguing for "let's leave things as is in the inane state they are and never touch them". That doesn't seem like a productive position.

I'll reiterate, adding Foldable and Traversable items to larger tuples is entirely orthogonal to your "confusion" argument. It certainly doesn't make it better, but it doesn't make it significantly worse either -- as in, if you're already confused by length for pairs, you won't become substantially more confused due to the existence of those instances. Adding those instances does improve the overall consistency, however.

So, aside from the unproductive "let's do nothing and never speak of this again", I see only two meaningful options in the context of this proposal:

I'm personally in favour of the latter, but I could live with the former.

tomjaguarpaw commented 2 months ago

I rescind my former claim that my objection to this proposal rests on it being "confusing". Thanks to dialogue with @lierdakil and @hasufell I realise that word suggests that the case is far weaker than it actually is. Instead I'd like to restate the same case but with "prone to subtle, hard-to-diagnose bugs" in place of "confusing". I apologise for the confusion. I'd be grateful if future replies address that objection, rather than the objection of being "confusing".

Mostly this came down to the naive expectation that monadic bind with Either a a and (a, a) should "visit" both branches/elements.

Thanks, I think these are indeed examples that are to some degree analogous to the instance under discussion here. My personal point of view is that instance Monoid a => Monad ((,) a) is not a particularly good one. (Writer communicates much better for those that want that). If proposed today I would vote against it. We do not have

instance (Monoid a, Monoid b, Monoid c, Monoid d) => Monad ((,,,,) a b c d)

nor do I think we should add it (this family of instances is only "consistent" until 4-tuples)! I am also doubtful about instance Monad (Either e) but it's perhaps far too practical in practice to avoid.

I just fixed a bug where Foldable.length tup was applied to a tuple, when the author meant the list in snd tup. It typechecked, of course. I assume it was only a list at first, the tuple was added later and the code continued to typecheck.

This isn't really evidence in favour of your position. It still would've been a bug if we had length (, ) == 2, which is the naive expectation -- which I assume by your metric wouldn't be "confusing". Or would it?

Let's look at this through the lens of my upgraded objection: by my metric would this be a source of subtle, hard-to-diagnose bugs? Yes, I think it would. In fact I think having something called length work on tuples of known length is more trouble than it's worth. The "length" of such values is known statically. Why muddy the water by having a method that purports to produce some useful information about them?

adding Foldable and Traversable items to larger tuples is entirely orthogonal to your "confusion" argument. It certainly doesn't make it better, but it doesn't make it significantly worse either -- . Adding those instances does improve the overall consistency, however.

Let's look at this through the lens of my upgraded objection: does adding more potential sources of subtle, hard-to-diagnose bugs make things significantly worse? It seems to me that the answer is obviously yes, unless there is some associated benefit that outweighs the increase in bugs. I have not yet seen that benefit.

if you're already confused by length for pairs, you won't become substantially more confused due to the existence of those instances

I agree. This is one of the observations that made it clear to me that I should upgrade my objection from "confusing", which was a poor choice of word on my part, to "source of subtle, hard-to-diagnose bugs", so thank you for that.

aside from the unproductive "let's do nothing and never speak of this again", I see only two meaningful options in the context of this proposal:

I don't understand why that is "unproductive". It seems to me perfectly productive to keep out a source of subtle, hard-to-diagnose bugs from the standard library. Could you elaborate?

lierdakil commented 2 months ago

I don't understand why that is "unproductive".

Taking this position results in not producing anything :shrug:

There are issues. The "productive" position is to do something about them. Keeping the status quo is the inverse of that.

associated benefit

I would propose that the case for accidental length (_, _, _) and larger-sized tuples is not as common as for similarly accidental length (_, _) by a considerable margin. I don't have evidence to back this up, though. As for intentional, that comes around back to the "confusion" thing -- which I don't personally buy, but the solution is changing the length exported from Prelude, not keeping the status quo. Which is orthogonal to the proposal at hand.

There isn't an associated benefit to Foldable itself, however, Foldable is a prerequisite for Traversable. And traverseing (or sequenceing) over tuples is a thing that comes in handy quite often in the code bases I worked on. @s-and-witch gives a more or less real-world example above, but that's not the only case where it's useful. Granted, the pair case is by far the most common here, too.