Open Kleidukos opened 2 years ago
Why? Cabal
doesn't care about category
field at all, it's a free form text. People can write whatever. If the type is changed to [ShortText]
it should still accept every possible input (at least for older cabal-version
s).
EDIT: By the same argument author
, maintainer
fields should be lists too. Yet there could be anything. I'm :-1: on this.
TBH, Any changes to cabal file description are not good first issues IMO. These changes have to be done carefully, you have to know what you are doing.
Why? Cabal doesn't care about category field at all, it's a free form text. People can write whatever. If the type is changed to [ShortText] it should still accept every possible input (at least for older cabal-versions).
And you don't think that [ShortText]
is able to represent the free-form text that ShortText
enables? Can you provide meaningful examples of regressions?
We literally have a behaviour that we encourage explicitly by a consumer of Cabal (Hackage) for end-user writers of Cabal files. Which means that we have created an observable, standard behaviour that is not reflected in the API but on which everyone is depending.
Categories are determined by whatever you put in the Category field. You should try to pick existing categories when possible. You can have more than one category, separated by commas. If no other versions of the package exist, the categories automatically become the package's tags.
https://hackage.haskell.org/upload
As such, if something changes internally in Hackage that changes the display, users will see a breakage of their reasonable expectations, without any kind of change in the Cabal format standard or in the library itself.
EDIT: By the same argument author, maintainer fields should be lists too. Yet there could be anything. I'm -1 on this.
Yes, this conclusion is valid and I support it, but it's not the purpose of this ticket, let's stay focused please.
If you want to talk about the grand scheme of things, I would say that Hackage has brought innovation that should be up-streamed in Cabal to better reflect the usages that have been encouraged.
TBH, Any changes to cabal file description are not good first issues IMO. These changes have to be done carefully, you have to know what you are doing.
Very well, I've removed the label.
I'm 👍 on this. I welcome quality of life improvements like this, and since @phadej is not a maintainer and should not be confused with one, take his disapproval with a grain of salt.
And you don't think that
[ShortText]
is able to represent the free-form text thatShortText
enables?
I think it's rather other way around. You can reproduce with [ShortText]
anything you can express as ShortText
, but not uniquely. Currently there is a single normal form for the field "foo, bar, baz" :: ShortText
, while with [ShortText]
you can have ["foo", "bar", "baz"]
, but also equally valid ["foo", "bar, baz"]
, ["foo ", "bar, ", "baz"]
and anything in between. Ideally we could have [ShortTextWithoutCommas]
, but that's probably too far fetched.
PackageDescription
is a public API, and any change to it is breaking for multitude of consumers (with Hackage being only one of many).
@Bodigrim we are planning on introducing breaking changes to the PackageDescription
in the 4.0 release, since there are many warts that exist for historical reasons that need correction. This proposal would be at home among those changes. Of these include homogenizing the way lists are handled in general, since it's entirely nondeterministic when comma separated vs. spaced/newline separation is required.
I think it's rather other way around
Ok I see what you mean. I don't deny the existence of such usages of the API (this would be ridiculous), but considering the ways that this API has been consumed by Hackage, I think the safest option is to broadcast the fact that we are formalising a behaviour that many people have relied on without previous guarantees, and that we are now putting guarantees on this behaviour.
PackageDescription is a public API, and any change to it is breaking for multitude of consumers (with Hackage being only one of many).
Yes, this is a breaking change in terms of API but not grammar, I clarified this with Gershom before submitting the ticket
(There is a typo in the title, which may confuse readers).
Currently the interpretation of category :: ShortText
is up to consumers. Some of them does not interpret it at all, some split by commas, but others (e. g., private hackage servers) are free to use any other separator, if it fits their needs better. Is there a tangible benefit to standartise it?
Is the field proposed to remain category
or be renamed to categories
?
@Bodigrim See, that's a transition path that I could see happening:
We keep category
, and add a new categories
field that is automatically filled with the output of the categorySplit
function from Hackage codebase. This way, the behaviour from Hackage is kept. This would be transparent for the users of cabal files when they look at the output on the Hackage website
Regarding Cabal API, we can document what is the expected format of the categories
field, based on the splitting algorithm. We leave the choice to use the new or the old one to direct consumers of the API, with a deprecation schedule.
but others (e. g., private hackage servers)
From my experience, private Hackage instances are run without editing the display of that kind of information (you're welcome to tell me you've seen the opposite, in which case I'd love to contact those people).
When it comes to non-Hackage consumers (like my own package index), the output of categories
is optimal in terms of categorisation and indexing. It's not really free-form in its consuming, it's expected to be a list of categories, which in turn index the packages that belong to them.
This feels like arguing about haddock
comment syntax on GHC issue tracker. Kind of relevant, but not really.
@phadej Considering how much GHC is involved in giving information to Haddock, I wouldn't have picked that example. You've made your point, and you've been heard.
I meant the syntax like using /for italics/
.
Which in turns has literally nothing to do with GHC but with Haddock's parser. Listen, you've made your point, and you've been heard, but you can't lead the dance for everything, even on projects in which you've been involved in the past.
Understood.
Currently the interpretation of
category :: ShortText
is up to consumers. Some of them does not interpret it at all, some split by commas, but others (e. g., private hackage servers) are free to use any other separator, if it fits their needs better. Is there a tangible benefit to standartise it?
I think that this is the entire nub of the question. Hackage's splitting of categories means that de-facto, this is the "standard" interpretation of the field. So putting this change to the datatype just codifies that. It seems a rather small convenience with no harm I can anticipate.
I absolutely would oppose changing "category" to "categories" because this would be a change to the cabal grammar itself, and have very little benefit.
But I would also, by the way, support changing author and maintainer fields to be lists, for the same reasons as this. It brings the datatype closer to the usage (thus providing minor uniformity and convenience), and has no particular cost. Certainly none of these changes will prevent any text at all from being entered -- it just means that if the text entered has commas, then it will be split on them in the datatype representation.
I don't believe there is a single package in the public that violates the proposed format. There are some that might cause parser confusion though - is there any survey of these?
(I'm +1, just think the legwork should be done)
Congratulations on our first 4.0 issue @Kleidukos! Can't wait to see this happen 😄
@davean I didn't plan on changing the parser (that accepts the syntax) but rather add a transformation pass akin to what Hackage does with categorySplit
right before creating the data structure. But maybe I am missing something. :)
@emilypi Thank you!
@gbaz
But I would also, by the way, support changing author and maintainer fields to be lists, for the same reasons as this
I want to point out that this is not as simple as with category
. While categories with commas in them probably don't exist at all and if they do it was a bad idea in the first place, names and emails can contain commas and are currently unquoted.
At present time, Cabal & Hackage encourage the end-user to include multiple categories in a Cabal file.
This practice, which is widely adopted, is currently not reflected in the API, as the type of the
categories
field for the PackageDescription tells us that its content is aShortText
.On its side, Hackage enables this with the
categorySplit
function in its web renderer.As a consequence, I would like to change the definition of
PackageDescription.copyright
fromShortText
to[ShortText]
.