Open kindaro opened 8 months ago
Specifically to the choice between vectorToMaybe
and headMaybe
— even though the former has a precedent in Data.Maybe.listToMaybe
, the latter is more precise. There are any number of ways to make a Maybe
out of a Vector
that are all parametrically polymorphous. For example, returning the 3rd element only if the first 5 elements are defined is a natural transformation from Vector
to Maybe
.
maybeHead
is better from the point of view of the English language, by the way — perhaps it should be preferred. It can be extended to a principle maybe …
, so that we get maybeMaximumBy
, maybeIndexM
and so on. However, perhaps we can do better and find more descriptive short names instead of making every name longer?
There is maybeHead
mentioned here: https://github.com/haskell/core-libraries-committee/issues/87#issuecomment-1250537133. Not in actual code but in freehand kind of speech.
There is already readMaybe
. So, I'd vote for consistency and use Maybe
as a suffix, rather than the prefix: indexMaybe
Despite that with the prefix it reads better in English.
Same could be applied to maximumByMaybe
.
Also it is easier to read and match with the partial counter part in the doc:
maximumByMaybe
maximumBy
last
lastMaybe
head
headMaybe
index
(would be good to add as well for consistency)indexMaybe
Similar to how we have M
suffix, eg. M
is a suffix in mapM
, despite that monadic map reads more naturally in English.
I agree that consistency is a possible way to refine the ordering of candidate names. But thumbs (either up or down) only distract — arguments feed the soul. And ideas on sections marked with the code «TODO» would be even better.
I @kindaro am accountable for this overview and I shall keep this message up to date. Give me your arguments and I shall write them in. Then we shall see.
Consistency with base
matters more than consistency with the English language.
Predictable names are easy to recall, and truly consistent names are predictable.
base
.listToMaybe
and readMaybe
are also known under different names and that in practice there is no consistency to begin with. The evidence is in Hoogle. This is about as painful as remembering whether your code base has pack ∘ show
called tshow
or tShow
— another spectacular blunder in Haskell's library design.base
would be a better consistency than consistency with the English language.Easier to read.
No arguments so far.
headMaybe
, maybeHead
…maybe_head
or head_maybe
would have been easier to read, but we have our own æsthetics in Haskell. Also, proportional fonts are easier to read than monospaced. So, Haskell programmers seem to be terrible judges of ease of reading.Easier to match with the partial counterpart in documentation.
maybe
from names that do not start with maybe
, than to tell names that end with Maybe
from names that do not end with Maybe
. If you want to find either a function or a partial function given its counterpart, then it is better if names end with Maybe
, but if you want to avoid exceptions in pure code overall, then it is better if names start with maybe
. Both are our design goals and we need to find ways to solve them that do not go against one another.We are adding not one, not two, but 35 functions. We can blaze a trail of our own. Being consistent with one function in base
is good, but being consistent with 34 other functions in the same module is good 34 times.
base
that return Maybe
and whose names end on Maybe
.This looks like Hungarian notation. But we are not doing Hungarian notation. base
has print
, not showIO
. Our types are complicated enough that encoding them in names would be futile. We want names that are easy for people to remember and to spell. If they happen to be instances of the same principle, this is good. If not, this is also alright.
The words «maximum by maybe» mean that you take a «maybe» and use it to get a maximum, like in «death by firing squad». This is not merely awkward like «Read, maybe?» — this is outright wrong. The English language does not work like this.
Another possible principle of naming is «… or nothing», like maximumByOrNothing
for example. Since «by» cannot be followed by «or», it is clear that something should be put in between.
Generating documentation from property checks seems to be technically infeasible. Right now the code we have is packaged as it is — there are no instruments to release generated code, much less to carefully patch automatically generated haddocks into handwritten code.
However, we can try to generate checks from documentation.
There are already instruments for this — the one I know is the doctest
project made by Simon Hengel. It supports property checks — I suspect the syntax for property checks was added to haddock
specifically for doctest
. However, doctest
will never be aware of the correspondence between completed functions and their partial counterparts — this means that properties will have to be written twice, rather than once for every pair of correspondents. doctest
has innumerable other issues as well, see for example the comment in vector/tests/doctests.hs
.
We can however invent some kind of markup that is both human and machine friendly, put it into haddocks and then write a program that will parse our markup out of haddocks and write for us a test suite. This is conceptually simpler than what doctest
tries to do because our design does not need to talk to ghci
— this is the fatal weakness of doctest
that we shall dodge.
The idea of the markup is like so:
Example:
-- | /O(1)/ First element.
--
-- This function is not defined on @fromList [ ]@. Its completion is 'vectorToMaybe'.
head :: Vector v a => v a -> a
{-# INLINE_FUSED head #-}
head v = v ! 0
-- | /O(1)/ Just the first element — or nothing when there are none.
--
-- This function is the completion of 'head'.
vectorToMaybe :: Vector v a => v a -> Maybe a
{-# INLINE_FUSED vectorToMaybe #-}
vectorToMaybe v = v !? 0
Here, the patterns This function is not defined on …. Its completion is ….
and This function is the completion of ….
are special syntax. This syntax is regular and so easy to parse out of haddocks already parsed out of the code by an off-the-shelf haddock parser.
Even better, maybe we can work with the «metadata» syntax already in haddock
to denote partial functions. This will help solve a lot of problems, like https://github.com/haskell/core-libraries-committee/issues/87.
I still need to think through a way of denoting sets more generally that is still human friendly. For some partial functions, like maybe backpermute
, there are infinitely many inputs for which they are not defined. And for most functions there are infinitely many inputs for which they are defined — we want to check these also. When inputs for which the given partial function is defined are almost all of its inputs, we can wield QuickCheck's ==>
operator, but when the partial function is only sparsely defined we cannot rely on this instrument.
There will also be a static check that all names in a given hard-coded list have checks generated for them. This is so that haddocks could not be broken by a future unsuspecting editor.
Precedent of using Maybe
suffix is wider than just readMaybe
: bitSizeMaybe
and oddities like naturalToWordMaybe
, minusNaturalMaybe
. It's on the long side but not unbearably so. Overall I think it's reasonable choice. (OrNothing
is 9 letters which is IMHO too long).
vector
is de-facto part of standard library so consistent naming is important. Multiple naming conventions are bad for novices: they won't be able to notice pattern and they're bad for experienced user since they have to remember all these conventions. Are there any other conventions in use?
Beside identifier names are not English words. They come from some weird agglutinative language which borrows extremely liberally from English.
Haddock first and foremost is a documentation tool. They are meant to be read by humans. doctest
is documentation tool as well. Its goal is to check that examples in documentation actually work. Attempts to shoehorn machine-readable specification into docstrings I think are misguided. We don't have good tested solution so we'll end up with pile of hack which will be later discarded since it's difficult to maintain without much gain
Metadata idea is interesting but does anything except @since
works? Do we need coordinate changes with haddock?
@Shimuuar As to naming, I added your arguments to the summary. But of course it is possible that I do not understand your point of view well enough, so let me know if I misinterpret you.
That some people like their names mangled beyond recognition is sadly true but I myself loathe to do do that and I do not see an argument for it being a best practice. Can you make an argument that consistently mangled names are better for novices and moderately experienced people than less consistent names that are each clear by themselves? I accept that narrow experts prefer shorthand, the strongest example I know of being Mathematical Components for Coq, which are impossible to understand for anyone but an expert. But I for example work with maybe 5 different languages and a good dozen of frameworks, and I have to learn a few new ones every year, so for me mangled names are a hardship. I do not see the need to learn that weird agglutinative language you speak about — I think this is the biggest hoax of software engineering.
We do not have to decide anything about names just yet. We can do some more research.
Haddock first and foremost is a documentation tool. They are meant to be read by humans.
doctest
is documentation tool as well. Its goal is to check that examples in documentation actually work. Attempts to shoehorn machine-readable specification into docstrings I think are misguided. We don't have good tested solution so we'll end up with pile of hack which will be later discarded since it's difficult to maintain without much gain
I am not sure I follow your logic. Examples and properties are exactly «machine-readable specification shoehorned into docstrings». Are you saying doctest
and the prop>
haddock notation are misguided? It may be so. But, as I write, we can offer a better, more sustainable design. Whether we can offer high quality design and code is a technical challenge. Maybe it will turn out to be great and everyone will start using it. How can we know the outcome before trying?
Metadata idea is interesting but does anything except
@since
works? Do we need coordinate changes with haddock?
Nothing except @since
is documented. But it seems the intention behind the design is that anything that starts with @
is a metadata field, so I think we can offer our own fields, like say @partial
. Let me offer an example.
-- | /O(1)/ First element.
--
-- @partial
-- @undefined @fromList [ ]@
-- @completion 'vectorToMaybe'
head :: Vector v a => v a -> a
{-# INLINE_FUSED head #-}
head v = v ! 0
-- | /O(1)/ Just the first element — or nothing when there are none.
--
-- @completes 'head'
vectorToMaybe :: Vector v a => v a -> Maybe a
{-# INLINE_FUSED vectorToMaybe #-}
vectorToMaybe v = v !? 0
What do you think?
Patching haddock
will be easy. I worked with them previously at https://github.com/haskell/haddock/issues/920. If we have an actual working case on our hands they will have to merge. The hard part is to design a general enough notation for sets of inputs and parse it into QuickCheck properties.
…Turns out the haddock
people already talked about @partial
in https://github.com/haskell/haddock/issues/1139.
IMO "completion" is not good terminology for documentation. I've never seen it being used in the context of Haskell before and it's not self-explanatory, so I doubt most people will know what it means.
I would just say "total version"/"partial version" or "total counterpart"/"partial counterpart", that way it's clear what you're talking about.
I have never seen it either. I was thinking that giving a name to this correspondence itself, and not only to its outcome, would add conceptual clarity. So, I picked «completion». But I am not on the seventh heaven of happiness about it.
How would you call the process of making a total function out of a partial function?
Ah, you should have said that you made it up! I'd just say "making the total version/counterpart". I don't think it helps to invent a word for it.
This is getting nowhere. But I think it would be useful to decide on naming conventions for total counterparts. From cursory haddock search it seems there are 3 existing conventions:
Maybe
suffix: headMaybe
, readMaybe
, maximumMaybe
... Used by base
and rio
May
suffix headMay
, readMay
, maximumMay
... Used by safe
safe
prefix. safeHead
, safeRead
, safeMaximum
... Used by universum
custom prelude. If there're other convention which I missed please bring them up.
I vote +1 on May
on account of shortness, +0.5 on Maybe
which fine too and -1 on safe
since it's little used and subjectively ugly. @lehins @Bodigrim what is your opinion?
@Shimuuar My vote is:
rio
is very valuable.May
is not worth it IMHO. It say sounds like it "may" fail and if it will that will only happen in May. :grin:index
and head
are also safe. They are just partial, but they are safe.+1 for Maybe
, 0 for safe
, -1 for May
.
So there's an unanimous support for the Maybe
suffix. Which we should use
So there's an unanimous support for the Maybe suffix. Which we should use
I love it! It took us only 4 short messages to decide on the naming scheme. That is what I call efficiency! :grin:
I grepped Hackage:
maybeFoo
pattern.fooMaybe
pattern.So I'd say that headMaybe
is better than maybeHead
.
@Bodigrim How does one grep Hackage?
I believe «maybe …» is the best choice. So say what. I am going to write whichever names I want and then before merging code into vector
I shall rename them whichever way you want. Maybe by then you will change your mind. Agreeable?
@konsumlamm Actually, turns out that my use of the word «completion» coincides with common usage:
The category of sets and partial functions is equivalent to but not isomorphic with the category of pointed sets and point-preserving maps. One textbook notes that "This formal completion of sets and partial maps by adding “improper,” “infinite” elements was reinvented many times, in particular, in topology (one-point compactification) and in theoretical computer science."
One of the design possibilities I am weighing is to wield haddock
's metadata notation, as I say in https://github.com/haskell/vector/issues/479#issuecomment-1910046183. I do not want to think about how to encode white space into metadata tag names, so, should this design possibility be picked for implementation, having a single word for any metadata field would be preferable. Writing something like @partial-counterpart
or @partial_counterpart
is not stylish.
If you have any specific suggestions surely do tell me. Otherwise I offer to wait until the design is clarified and a few examples are implemented. Then we shall better see what looks good and what does not. Agreeable?
haddock
's metadata notation will be helpful to us.I believe «maybe …» is the best choice. So say what. I am going to write whichever names I want and then before merging code into vector I shall rename them whichever way you want. Maybe by then you will change your mind. Agreeable?
It would save everyone's time and effort if you stick with agreed upon conventions. If you create PR with different naming and start arguing that functions should be named this or that I'll simply close PR.
Actually, turns out that my use of the word «completion» coincides with common usage:
But that makes it quite poor name. head
for example is well defined for empty vector. Yes it's bottom. But programmers care a lot about bottom and what sort of bottom it is. Also proposed total functions are merely "total". maximumByMaybe
only total if comparison function doesn't hang or throw exception etc. Bottoms are behind every corner
We aren't completing function which undefined on pats of its domain. We create functions which report errors differently.
I think simple See also 'headMaybe'
would be enough. No need to overengineer things
One of the design possibilities I am weighing is to wield haddock's metadata notation
We can't. It's not implemented. @partial
was proposed almost 4 years ago. It isn't implemented. We can't assume it will be implemented soon. So we have to do without.
If you want to go and implement such feature. Please do, it's very neat feature and would be nice to have.
We can't. It's not implemented.
@partial
was proposed almost 4 years ago. It isn't implemented. We can't assume it will be implemented soon. So we have to do without.If you want to go and implement such feature. Please do, it's very neat feature and would be nice to have.
haddock
development is in a sore state as its repo is to be merged into GHC source tree, but for various reasons has not been merged yet, so it's unclear where and how to send patches. This might get resolved soon in which case one can go and implement @partial
indeed, I'd be very much in favor of it.
How does one grep Hackage?
https://hackage-search.serokell.io/ supports regexps, I use it all the time. I linked the regexps I used above.
I believe «maybe …» is the best choice. So say what. I am going to write whichever names I want and then before merging code into vector I shall rename them whichever way you want. Maybe by then you will change your mind. Agreeable?
There is no particular hurry. If you wish to convince us for a different naming scheme, you can give it a try, but I think I've made up my mind already.
I tend to think that coming up with names is actually the biggest challenge here, the implementation per se is rather trivial. So I'd rather settle on them before writing any code. I personally would not mind though if you raise a draft PR with a disclaimer "All function names are tentative and are discussed in #479", but I do not quite recommend it.
I'm not sure what we want to do about partial functions like foldl1
. Surely we can define something like
foldl1Maybe :: (el -> el -> el) -> Vector el -> Maybe el
but is it worth its salt? If one is conscious to avoid partial functions, they can use the normal foldl :: (acc -> el -> el) -> acc -> Vector el -> acc
instantiating acc ~ Maybe el
.
After some thought I think foldl1Maybe
is actually valuable. It is possible to express it in terms of foldl
but it's not very compact. And what's worse it's easy to create strictness bugs. Maybe
is lousy accumulator, one have to force its field manually.
foldl1Maybe' :: (a -> a -> a) -> [a] -> Maybe a
foldl1Maybe' f = foldl' step Nothing where
step Nothing a = Just a
step (Just !a) b = Just $! f a b
Another approach is
foldl1Maybe f = fmap (uncurry (foldl f)) . V.uncons
but also foldl1
is typically used with monoids, where it is easy just to start reduction from mempty
.
Going further, are we looking to introduce fold1MaybeM'_
? scanr1Maybe'
? headMaybeM
?
uncons
based approach is not fusible. As aside vector
mixes up functions based on stream-fusion and relying on slice. I should create issue about this.
We should compile list of partial functions. It would be easier to decide what should we do then
We should compile list of partial functions. It would be easier to decide what should we do then
As noticed by @kindaro above, rio
already has a list: https://hackage.haskell.org/package/rio-0.1.22.0/docs/RIO-Vector-Boxed-Partial.html
This is the overview of the programme to add total counterparts to partial functions, such as
head
.I @kindaro am accountable for this overview and I shall keep this message up to date. If something is off, comment; if I am not answering, send me an electronic letter; if I am not answering again, take over.
We estimate that there is 50% likelihood to get this done in 70 hours of work or less. This number is computed by adding mean values of subjective probability distributions of time it would take to get each of the following tasks done:
I heartily welcome the kind reader to comment on the sections of this message marked with the code «TODO».
status: request for comments
Implementation will begin when consensus with the maintainers is reached and the design is clarified.
motivation
There are two aspects to this programme:
These two aspects translate into two goals that seem to be largely independent.
An optional goal is to offer a clear way of managing partial functions that can then be scaled to other core libraries.
mathematical ground
definitions
As an example of a naming issue called the «mathematical red herring principle», things that are called «partial functions» and «unsafe functions» in the Haskell community are not functions, and «total function» is needless tautology.
So, «partial function» is a convenient but misleading shortcut for «relation that reflects distinctions».
This is important to keep in mind because it means that any theory of functions is not going to be a true description of a definition that wields any partial functions. Many principles of reasoning about Haskell programs will be invalidated. See for example Fast and Loose Reasoning is Morally Correct. Unsafe functions cannot even be modelled as general relations — rather, a theory of automata will need to be wielded, — so they will not be looked at. It seems to be the consensus that the best solution for them is isolation, as in #361.
partial map classifier
From the classical perspective, any partial function f: A ⇸ B can be turned into a function f*: A → B + 1 with the target A + 1, which in Haskell can be modelled by
Maybe A
. This typeMaybe A
would be called «partial map classifier» in Mathematics.However, constructively we cannot really say if an arbitrary computation will terminate. For example, we do not know if a hailstone sequence starting from a big enough number will terminate. Technically speaking, each sound termination analyzer is incomplete.
Nevertheless, we hope to be able to say for any computation found in the core libraries and specifically
vector
either that:Then, we can turn those partial functions for which we are able to say so into functions by putting the inputs on which they will not terminate to
Nothing
. We shall call this process and its outcome «completion». Thanks to the classical theory, we know that every function has at most one completion, so this is not ambiguous. We can even say that completion is a partial function from partial functions to functions, however sadly not computable.example
This partial function from
Data.Vector.Generic
:— Is completed to this function:
design
A list of partial functions in
vector
is already compiled for us inrio
. There are 35 of them. None of them have a completion yet (TODO look again).For the sake of sustainability, automated checks should be added. Given a pair of a partial function and its completion:
Nothing
.Just
.In the ideal, these two sets should exhaust the source.
In the ideal, these checks will be defined in such a way that human friendly documentation can also be generated, then somehow such documentation should make its way to haddocks. This would ensure sustainability of documentation as well as code itself.
(TODO think of a way to get great automated checks and documentation)
naming
All new names shall be either:
Arbitrary abbreviations and shorthands will not be allowed. More specific names will be picked rather than more general ones.
(TODO draft specific suggestions)
scope
Only definitions exported from
Data.Vector
are to be completed at this time. (TODO what other definitions should we work on?) Unsafe definitions that cannot be trivially modelled as relations will not be looked at — their completion is not grounded in Mathematics; they are out of scope.future work
In the ideal, we should offer two modules: the one would export only functions and the other would export only partial functions. That the way we mark definitions as either lets us hope that in future work such modules can be generated automatically. This would complete the solution of the aspect № 2.
chunks of work
The work will be done in working, fully done chunks, split as follows:
head
inData.Vector.Generic
with basic checks and documentation.prior art
(TODO mention other conversations where similar programmes were talked about)
base
: https://github.com/haskell/core-libraries-committee/issues/87.