Describe the bug

Sometimes it is hard to discover and set the appropriate license value, especially for the proprietary/default case (NONE), for corner cases, or when license names include compound parts or versions.

To Reproduce

Let's follow how a Haskell developer sets up a package and populates the license: field:

Because bootstrapping a project/package can take quite some time, or involves some customization, people frequently start from other projects and samples. Previously I copied particular files manually, or took real projects and deleted the code from them, or used the summoner tool. Now I have made a bootstrap for myself, so I copied that bootstrap/sample.

Then, while renaming things in the .cabal file, the license field needs to be changed, since the project/package is a private project, a test assignment, or a company project.

A Haskeller googles "Haskell cabal licenses" and, because the person is a Haskeller, gets: (Screenshot-2022-02-11-20:46:42)

(Looking ahead: an SPDX search would also return nothing particularly useful.) (Screenshot-2022-02-11-20:09:04)

The web search engine deems Hoogle relevant and surfaces it, particularly https://hackage.haskell.org/package/Cabal-

Which is great: it seems the License data type is what the license field wants.

And in a number of cases that assumption would be true: MIT and BSD* would indeed fit.

Let's look at UnspecifiedLicense or AllRightsReserved.

Nearly all countries have signed agreements under which a work with no license defaults to https://en.wikipedia.org/wiki/All_rights_reserved, so these two values are the natural ones to try for a closed-source project.

license: UnspecifiedLicense:

> cabal v2-build
Errors encountered when parsing cabal file ./project.cabal:

project.cabal:11:35: error:
unexpected Unknown SPDX license identifier: 'UnspecifiedLicense' 

   10 | maintainer:     user@email.com
   11 | license:        UnspecifiedLicense

This error is not helpful at all. The #5697 situation strongly applies here. SPDX happens to be an open-source license list, so it does not account for any default closed-source license scenario.

license: AllRightsReserved:

> cabal v2-build
Errors encountered when parsing cabal file ./project.cabal:

project.cabal:11:35: error:
unexpected Unknown SPDX license identifier: 'AllRightsReserved' You can use NONE as a value of the license field.

   10 | maintainer:     user@email.com
   11 | license:        AllRightsReserved

This error is a bit better - it mentions the ability to use NONE for a license field.

After that, I became curious:

  1. Why does Distribution.License seem to be a valid source of information (the closest possible source short of reading the source code directly), yet it does not work?
  2. What is the acronym SPDX that the tooling keeps referring to? My first idea: it is probably some Haskell type, maybe a Cabal-internal one, or maybe some conventional file type or spec.
  3. Hey, cabal asks for SPDX, and its source-of-truth module Distribution.License happens to have the function knownLicenses :: [License], which should probably give the index:
    λ> knownLicenses
    [GPL Nothing,GPL (Just (mkVersion [2])),GPL (Just (mkVersion [3])),LGPL Nothing,LGPL (Just (mkVersion [2,1])),LGPL (Just (mkVersion [3])),AGPL Nothing,AGPL (Just (mkVersion [3])),BSD2,BSD3,MIT,ISC,MPL (mkVersion [2,0]),Apache Nothing,Apache (Just (mkVersion [2,0])),PublicDomain,AllRightsReserved,OtherLicense]
    it :: [License]

(I experimented with entering these values literally; maybe that is a universal way of giving Cabal license values.)

> cabal v2-build
Errors encountered when parsing cabal file ./project.cabal:

project.cabal:11:35: error:

unexpected Unknown SPDX license identifier: 'AGPL' 

   10 | maintainer:     user@email.com
   11 | license:        AGPL (Just (mkVersion [3]))

Unknown SPDX license identifier: 'AGPL' is confusing: which values does it accept, then?
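
One way to see which strings the field accepts is to run candidates through the Cabal library's own SPDX parser in GHCi. The following is a minimal sketch, assuming a Cabal library version (3.0 or later) in which eitherParsec is exported from Distribution.Parsec. The helper name checkLicenseField is hypothetical, and the check only approximates what cabal-install does for files that declare cabal-version 2.2 or later.

```haskell
import Distribution.Parsec (eitherParsec)
import qualified Distribution.SPDX as SPDX

-- Hypothetical helper (not part of Cabal): try to parse a candidate
-- license: value as an SPDX license expression.
-- Left carries the parser's error message, Right the parsed value.
checkLicenseField :: String -> Either String SPDX.License
checkLicenseField = eitherParsec
```

With this loaded, checkLicenseField "AGPL-3.0-only" and checkLicenseField "NONE" should return a Right, while checkLicenseField "AllRightsReserved" should return a Left carrying an error message.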

  4. OK, the Cabal error message always talks about SPDX values.

In Distribution.License there is licenseToSPDX :: License -> License. Great, a converter to SPDX, which Cabal kept nagging me about:

λ> licenseToSPDX AllRightsReserved

It works!

λ> licenseToSPDX $ PublicDomain
License (ELicense (ELicenseRef (LicenseRef {_lrDocument = Nothing, _lrLicense = "PublicDomain"})) Nothing)
it :: Distribution.SPDX.License.License

I tried entering the License .. value, then the ELicense .. value, into the license: field, and received unhelpful messages.

I checked PublicDomain and was directed toward using anything more substantial than the PublicDomain value.

OK, let's check some default open-source license:

λ> licenseToSPDX $ AGPL (Just (mkVersion [3]))
License (ELicense (ELicenseId AGPL_3_0_only) Nothing)
it :: Distribution.SPDX.License.License

Again, now with this SPDX value, I tried entering the License .. value, then the ELicense .. value, into the license: field, and received unhelpful messages.

unexpected Unknown SPDX license identifier: 'License' 

   10 | maintainer:     user@email.com
   11 | license:        License (ELicense (ELicenseId AGPL_3_0_only) Nothing)
      |                        ^

unexpected Unknown SPDX license identifier: 'ELicense' 

   10 | maintainer:     user@email.com
   11 | license:        ELicense (ELicenseId AGPL_3_0_only) Nothing

So the licenseToSPDX function was not helpful to the user.

Distribution.SPDX.License suggests that the License ... expression is in SPDX format. But if Cabal wants SPDX format, and the value is in SPDX format, why does Cabal not accept the SPDX value generated by its own functions?
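
The missing step seems to be rendering: the Show output of the SPDX value is a Haskell expression, while the license: field wants the SPDX text form. A minimal sketch, assuming the Cabal library's prettyShow from Distribution.Pretty renders an SPDX License in the form the field parser reads; spdxFieldValue and agpl3 are hypothetical names for illustration, not Cabal functions.

```haskell
import Distribution.License (License (..), licenseToSPDX)
import Distribution.Pretty (prettyShow)
import Distribution.Version (mkVersion)

-- Hypothetical helper (not part of Cabal): convert a legacy
-- Distribution.License value to SPDX, then render it as text
-- instead of relying on its Show instance.
spdxFieldValue :: License -> String
spdxFieldValue = prettyShow . licenseToSPDX

-- Example legacy value taken from the GHCi session in this report.
agpl3 :: License
agpl3 = AGPL (Just (mkVersion [3]))
```

In GHCi, spdxFieldValue agpl3 should give "AGPL-3.0-only" and spdxFieldValue AllRightsReserved should give "NONE"; those strings, without the quotes, are what goes after license:.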

Expected behavior

System information

cabal-install version
compiled using version of the Cabal library
fgaz commented 2 years ago

First of all, as you found out, hackage/haddock is not a meaningful documentation source for executables and file format specifications, it's only for library apis. You wouldn't expect to find documentation about eg. pandoc flags, or the spec for pandoc templates there.

I don't know how/if we could make this clearer. The cabal website already links to readthedocs and not to hackage, and we can't really control what google says.

As per https://github.com/haskell/cabal/issues/5697#issuecomment-1036449858 I'd welcome a pr improving the license section of the docs, clarifying that the field is optional (though default: NONE already implies that). The spdx docs are already linked from that section.

Alternative: Link to the source code of the checker [...]

I think linking to the source of anything (or to hoogle) would be a step in the wrong direction.

The "unexpected Unknown SPDX license identifier" error could certainly link to https://cabal.readthedocs.io/en/latest/cabal-package.html#pkg-field-license though

ptkato commented 2 years ago

I'd welcome a pr improving the license section of the docs, clarifying that the field is optional (though default: NONE already implies that)

I have that already underway.

Anton-Latukha commented 2 years ago

I mainly wrote down the experience, even if only for myself for later action.

An error message is generally the best place to put the documentation in some of the enumerated cases.

A lot of things can be solved with the docs, but there is a limit to docs design. Programmers try to skim the web first. Docs desync from the source code all the time. Yes, the problem with the docs is that Google mostly directs to the main introductory pages of Cabal and almost never to the proper place in the docs. But I do not think the Google search engine can be blamed here: the doc articles are huge, which makes the issue worse for search engines and is probably the main cause of the undiscoverability. So the web search cannot direct the user to the proper part of the docs; the right subsection has to be actively inferred or found by the reader.

And in many cases the place where the content lives cannot be inferred. Licensing information, for example, could be in the introductory documentation, or in some advanced part, or in some legal section, or in the package description section for the field. The docs approach works only up to some volume. Beyond that limit the volume becomes unmaintainable (through classical source code process means), which makes the docs go out of date, which makes people distrust the docs, which makes people not read the docs.

For example, in the mentioned section, cabal.readthedocs.io/en/latest/cabal-package.html#pkg-field-license, it is hard to infer whether the Pre-SPDX Legacy Identifiers are available in new versions. On first read I thought those were simply the accepted old way of providing a license (I used to use Distribution.License.License as a reference, and in the past it worked in the majority of cases). Trying shows that a project responds differently to different Distribution.License.License entries, but the docs are unclear on whether they can be used in new releases. Also, the docs list the main license entries for the deprecated way, but provide no set of frequently used licenses for the SPDX case; they essentially send the reader to the SPDX manual and to trial runs of the tool to find the proper way to set LGPL 3.1 through SPDX. So the docs effectively direct and encourage people to use the old way, which is said to be deprecated. This lack of clarity links directly to the situation I started the report with: I went to that Pre-SPDX Legacy Identifiers datatype and started using it. Had I found and read pkg-field-license, I would probably have ended up on much the same path, just a bit less confused.

Docs are a great tool, and they frequently seem a great way of solving everything, but thunking things into docs is not always great design. For errors in parsing license:, linking people back to the docs is indeed the main part of the solution; linking from the tool's error messages into the docs is a pretty good way to guide them. But error messages are also docs in themselves.

Also, CI needs to run checks that links into docs are valid.

So, I have annotated the situation and will try to submit fixes and close this in the future, solving the particular details, since these details are nice tasks for a newcomer.

Anton-Latukha commented 2 years ago

Well, I was too quick to promise to solve it, because the war is approaching and everything here constantly hangs in the air. So let's say "maybe I will not be able to".

Mikolaj commented 2 years ago

Take care, @Anton-Latukha. No pressure, obviously.

gbaz commented 2 years ago

@Anton-Latukha I have tried to message you on irc but have not received a reply. Please ping me at sclv on liberachat.
