haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal
Other
1.62k stars 691 forks source link

Changing the type of PackageDescription.category from ShortText to [ShortText] #7850

Open Kleidukos opened 2 years ago

Kleidukos commented 2 years ago

At present time, Cabal & Hackage encourage the end-user to include multiple categories in a Cabal file.

This practice, which is widely adopted, is currently not reflected in the API, as the type of the categories field for the PackageDescription tells us that its content is a ShortText.

On its side, Hackage enables this with the categorySplit function in its web renderer.

As a consequence, I would like to change the definition of PackageDescription.copyright from ShortText to [ShortText].

phadej commented 2 years ago

Why? Cabal doesn't care about category field at all, it's a free form text. People can write whatever. If the type is changed to [ShortText] it should still accept every possible input (at least for older cabal-versions).

EDIT: By the same argument author, maintainer fields should be lists too. Yet there could be anything. I'm :-1: on this.


TBH, Any changes to cabal file description are not good first issues IMO. These changes have to be done carefully, you have to know what you are doing.

Kleidukos commented 2 years ago

Why? Cabal doesn't care about category field at all, it's a free form text. People can write whatever. If the type is changed to [ShortText] it should still accept every possible input (at least for older cabal-versions).

And you don't think that [ShortText] is able to represent the free-form text that ShortText enables? Can you provide meaningful examples of regressions?

We literally have a behaviour that we encourage explicitly by a consumer of Cabal (Hackage) for end-user writers of Cabal files. Which means that we have created an observable, standard behaviour that is not reflected in the API but on which everyone is depending.

Categories are determined by whatever you put in the Category field. You should try to pick existing categories when possible. You can have more than one category, separated by commas. If no other versions of the package exist, the categories automatically become the package's tags.

https://hackage.haskell.org/upload

As such, if something changes internally in Hackage that changes the display, users will see a breakage of their reasonable expectations, without any kind of change in the Cabal format standard or in the library itself.


EDIT: By the same argument author, maintainer fields should be lists too. Yet there could be anything. I'm -1 on this.

Yes, this conclusion is valid and I support it, but it's not the purpose of this ticket, let's stay focused please.

If you want to talk about the grand scheme of things, I would say that Hackage has brought innovation that should be up-streamed in Cabal to better reflect the usages that have been encouraged.

TBH, Any changes to cabal file description are not good first issues IMO. These changes have to be done carefully, you have to know what you are doing.

Very well, I've removed the label.

emilypi commented 2 years ago

I'm 👍 on this. I welcome quality of life improvements like this, and since @phadej is not a maintainer and should not be confused with one, take his disapproval with a grain of salt.

Bodigrim commented 2 years ago

And you don't think that [ShortText] is able to represent the free-form text that ShortText enables?

I think it's rather other way around. You can reproduce with [ShortText] anything you can express as ShortText, but not uniquely. Currently there is a single normal form for the field "foo, bar, baz" :: ShortText, while with [ShortText] you can have ["foo", "bar", "baz"], but also equally valid ["foo", "bar, baz"], ["foo ", "bar, ", "baz"] and anything in between. Ideally we could have [ShortTextWithoutCommas], but that's probably too far fetched.

PackageDescription is a public API, and any change to it is breaking for multitude of consumers (with Hackage being only one of many).

emilypi commented 2 years ago

@Bodigrim we are planning on introducing breaking changes to the PackageDescription in the 4.0 release, since there are many warts that exist for historical reasons that need correction. This proposal would be at home among those changes. Of these include homogenizing the way lists are handled in general, since it's entirely nondeterministic when comma separated vs. spaced/newline separation is required.

Kleidukos commented 2 years ago

I think it's rather other way around

Ok I see what you mean. I don't deny the existence of such usages of the API (this would be ridiculous), but considering the ways that this API has been consumed by Hackage, I think the safest option is to broadcast the fact that we are formalising a behaviour that many people have relied on without previous guarantees, and that we are now putting guarantees on this behaviour.

PackageDescription is a public API, and any change to it is breaking for multitude of consumers (with Hackage being only one of many).

Yes, this is a breaking change in terms of API but not grammar, I clarified this with Gershom before submitting the ticket

Bodigrim commented 2 years ago

(There is a typo in the title, which may confuse readers).

Currently the interpretation of category :: ShortText is up to consumers. Some of them does not interpret it at all, some split by commas, but others (e. g., private hackage servers) are free to use any other separator, if it fits their needs better. Is there a tangible benefit to standartise it?

Is the field proposed to remain category or be renamed to categories?

Kleidukos commented 2 years ago

@Bodigrim See, that's a transition path that I could see happening:

but others (e. g., private hackage servers)

From my experience, private Hackage instances are run without editing the display of that kind of information (you're welcome to tell me you've seen the opposite, in which case I'd love to contact those people). When it comes to non-Hackage consumers (like my own package index), the output of categories is optimal in terms of categorisation and indexing. It's not really free-form in its consuming, it's expected to be a list of categories, which in turn index the packages that belong to them.

phadej commented 2 years ago

This feels like arguing about haddock comment syntax on GHC issue tracker. Kind of relevant, but not really.

Kleidukos commented 2 years ago

@phadej Considering how much GHC is involved in giving information to Haddock, I wouldn't have picked that example. You've made your point, and you've been heard.

phadej commented 2 years ago

I meant the syntax like using /for italics/.

Kleidukos commented 2 years ago

Which in turns has literally nothing to do with GHC but with Haddock's parser. Listen, you've made your point, and you've been heard, but you can't lead the dance for everything, even on projects in which you've been involved in the past.

phadej commented 2 years ago

Understood.

gbaz commented 2 years ago

Currently the interpretation of category :: ShortText is up to consumers. Some of them does not interpret it at all, some split by commas, but others (e. g., private hackage servers) are free to use any other separator, if it fits their needs better. Is there a tangible benefit to standartise it?

I think that this is the entire nub of the question. Hackage's splitting of categories means that de-facto, this is the "standard" interpretation of the field. So putting this change to the datatype just codifies that. It seems a rather small convenience with no harm I can anticipate.

I absolutely would oppose changing "category" to "categories" because this would be a change to the cabal grammar itself, and have very little benefit.

But I would also, by the way, support changing author and maintainer fields to be lists, for the same reasons as this. It brings the datatype closer to the usage (thus providing minor uniformity and convenience), and has no particular cost. Certainly none of these changes will prevent any text at all from being entered -- it just means that if the text entered has commas, then it will be split on them in the datatype representation.

davean commented 2 years ago

I don't believe there is a single package in the public that violates the proposed format. There are some that might cause parser confusion though - is there any survey of these?

(I'm +1, just think the legwork should be done)

emilypi commented 2 years ago

Congratulations on our first 4.0 issue @Kleidukos! Can't wait to see this happen 😄

Kleidukos commented 2 years ago

@davean I didn't plan on changing the parser (that accepts the syntax) but rather add a transformation pass akin to what Hackage does with categorySplit right before creating the data structure. But maybe I am missing something. :)

@emilypi Thank you!

fgaz commented 2 years ago

@gbaz

But I would also, by the way, support changing author and maintainer fields to be lists, for the same reasons as this

I want to point out that this is not as simple as with category. While categories with commas in them probably don't exist at all and if they do it was a bad idea in the first place, names and emails can contain commas and are currently unquoted.