proycon / codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
GNU General Public License v3.0

Licenses x ways #8

Open broeder-j opened 2 years ago

broeder-j commented 2 years ago

Currently codemeta harvester finds and parses:

LICENSE LICENSE.md LICENSE.txt COPYING COPYRIGHT LICENCE LICENCE.md LICENCE.txt

Other naming variants I have come across, among others:

LICENCE.ASAD
LICENCE.GPL
LICENSE2
LICENSE-MIT
LICENSE-APACHE
LICENSE.rst
LICENSE.info
LICENSE-BSD.txt
LICENSE-GPLv3.txt
LICENSE-LGPLv3.txt
LICENSE.OpenSSL
LICENSE.GPLv3
LICENSE.TERMS
LICENSE.BSD-2-Clause
LICENSE.BSD-3-Clause
LICENSE.CC-BY-SA-4.0
LICENSE.Freescale
LICENSE.GPL-2.0
LICENSE.JFFS2
LICENSE.NET
LICENSE.RPCXDR
LICENSE.TXT
LICENSE.MD
LICENSE.html
LICENSE-3RD-PARTY.txt
license.txt
license
license.md
licence.pl
copyright

A further variant, used when several licenses apply, is a LICENSES folder containing files that are not named LICENSE:

LICENSES/licenses: (folder)

CC0-1.0.txt
CC-BY-4.0.txt
MIT.txt

APACHE-LICENSE-2.0-header.txt
APACHE-LICENSE-2.0.txt
ELASTIC-LICENSE-2.0-header.txt
ELASTIC-LICENSE-2.0.txt
ELASTIC-LICENSE-header.txt
ELASTIC-LICENSE.txt
license.go
license_header.go

I suggest using * for matching, like LICENSE*, ... Should we also somehow handle the directory case? Does codemeta.json allow for a list of licenses anyway?
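
To make the wildcard suggestion concrete, here is a minimal Python sketch of what such matching could look like. It is not how codemeta-harvester currently works; the pattern list, the `LICENSE_DIRS` set, and both function names are assumptions made up for illustration. Names are lowercased before matching so that `license.txt` and `LICENSE.TXT` are treated alike.

```python
import fnmatch
from pathlib import Path

# Hypothetical glob patterns, applied to lowercased file names.
LICENSE_PATTERNS = ["licen[sc]e*", "copying*", "copyright*"]
# Hypothetical directory names (lowercased) whose files are all collected.
LICENSE_DIRS = {"licenses"}


def is_license_name(name: str) -> bool:
    """Return True if a file name looks like a license file."""
    lowered = name.lower()
    return any(fnmatch.fnmatchcase(lowered, p) for p in LICENSE_PATTERNS)


def find_license_files(repo_root: str) -> list[Path]:
    """Collect candidate license files from the repository root only."""
    found = []
    for entry in sorted(Path(repo_root).iterdir()):
        if entry.is_file() and is_license_name(entry.name):
            found.append(entry)
        elif entry.is_dir() and entry.name.lower() in LICENSE_DIRS:
            # Directory case: take every file inside, whatever its name.
            found.extend(p for p in sorted(entry.iterdir()) if p.is_file())
    return found
```

Note that taking every file in a LICENSES folder would also pick up entries such as license.go or the *-header.txt files listed above, so some further filtering would probably be needed.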

proycon commented 2 years ago

Does codemeta.json allow for a list of licenses

Yes, there should be no problem in specifying multiple.
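
As a rough illustration of what specifying multiple licenses could look like, the sketch below builds a small codemeta.json fragment in Python. The project name, the `license_entry` helper, and the choice to express each license as an SPDX URL are assumptions for the example, not something the harvester is confirmed to do.

```python
import json


def license_entry(spdx_ids):
    """Return a single URL for one license, or a list of URLs for several."""
    urls = [f"https://spdx.org/licenses/{i}" for i in spdx_ids]
    return urls[0] if len(urls) == 1 else urls


# Hypothetical metadata for a dual-licensed project.
metadata = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-project",
    "license": license_entry(["MIT", "Apache-2.0"]),
}

print(json.dumps(metadata, indent=2))
```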

Additional ways I have found among others how people name stuff: I suggest to use * for matching like LICENSE*, ...

One concern I have is that we might accidentally match something that is meant to apply only to a sub-part (such as an included dependency) and misjudge it as the license for the whole project. But perhaps a wildcard match may work.
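
One possible way to limit that risk, sketched in Python under assumptions: only accept files directly in the repository root, and skip names that hint at a bundled component. The `SUBPART_HINT` expression and the function name are hypothetical, and the hint list is a guess rather than a tested heuristic.

```python
import re

# Hypothetical hints that a file covers only a bundled component or is a header template.
SUBPART_HINT = re.compile(
    r"3rd[-_]?party|third[-_]?party|vendor|header", re.IGNORECASE
)


def applies_to_whole_project(relative_path: str) -> bool:
    """Guess whether a matched license file covers the whole repository."""
    parts = relative_path.replace("\\", "/").split("/")
    if len(parts) != 1:
        # Nested files (e.g. vendor/libfoo/LICENSE) are treated as sub-part licenses.
        return False
    return SUBPART_HINT.search(parts[0]) is None
```

With this sketch, LICENSE-MIT would be accepted while LICENSE-3RD-PARTY.txt and vendor/libfoo/LICENSE would not. A top-level LICENSES folder would need an explicit exception, since its files are nested by definition.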