nasa / Transform-to-Open-Science

Transformation to Open Science

Licensing for open science #116

hmjbarbosa closed this issue 2 years ago

hmjbarbosa commented 2 years ago

Dear all,

@admercs recently posted about Definitions and Goal and mentioned that "licensing is critical in reducing barriers to IP use and promoting collaboration".

I am a scientist and I try to share the various pieces of code I write on GitHub. But I'm always confused by the many "free" licenses out there, and I guess many scientists around the world are in the same situation.

Has the OpenScience community defined a standard license that we should use? Does TOPS/NASA have a recommendation about that?

Henrique

admercs commented 2 years ago

That is a great question. There is another resource on this topic that I would like to share here.

ha0ye commented 2 years ago

The "standard" recommendations for broad use and reuse, if you don't have concerns about IP, sensitive data, copyleft, etc., are:

Data: CC0 (public domain release)
Code: MIT License
Creative works (writing, images, etc.): CC BY

jordanpadams commented 2 years ago

@nasacrawford I know CC0 for data is expected, but is there an SMD guideline on the preferred open source software license?

penyuan commented 2 years ago

Thank you @hmjbarbosa for opening this issue, and @admercs and @ha0ye for your responses so far. I'd like to make a few additions and clarifications.

I am also aware of the discourse that the "standard" (who is "setting" these "standards"?) for those who "don't want to think about it" is to go with the licenses that @ha0ye suggested, e.g. CC0, MIT, and CC BY. However, I contend that "not wanting to think about it" is not good practice for open science. CC0, MIT, and CC BY fall under the category of non-reciprocal (also commonly known as "permissive") licenses, which means other people can take works under these licenses and create closed-source derivative works. In many cases this is highly undesirable, even harmful, and can come back to bite you. Unfortunately, many scientists (and non-scientists!) choose these licenses by default because they did not want to think about it, and come to regret it later. They are often completely unaware that there are licenses which preserve the freedoms that come with open data and other open source outputs.

I recognise that the legal details of open source licensing can be exhausting to fully understand, and I will not go into them here. That said, I think it is critical to at least point out that there are two families of open source licenses, not just the MIT-like ones. I propose the following:

Non-reciprocal ("If you are happy with others taking what you made and creating closed-source derivatives"):

Reciprocal ("If you want others to respect the freedoms of open source by also releasing their derivatives of your work as open source"):

As you can see, I've included open source hardware licenses because there are hardware-specific legal implications that non-hardware licenses don't address.

And back to @hmjbarbosa's original question:

Has the OpenScience community defined a standard license that we should use?

No.

Unlike open source software and free software (where "free" means freedom, not free of charge) or open source hardware, each of which has a body that defines what qualifies, I don't think there is a standards-setting body that prescribes exactly which licenses a work must use to qualify as open science.

Similarly, I take issue with colloquially referring to MIT, CC BY, etc. as the default or "standard" licenses for open source outputs, because (to my knowledge) there was never a conscious effort to establish such standards, and calling them that obscures the need to think about open source licensing more deeply. Again, I refer to my paragraph above on the consequences of not thinking about this.

selgebali commented 2 years ago

I would like to add to that some advice on "dual licensing" models for OSS, e.g. https://en.wikipedia.org/wiki/Multi-licensing. More on that in this Twitter thread.

cgentemann commented 2 years ago

Thanks @hmjbarbosa for the question. NASA's current Information Policy for the Science Mission Directorate (SMD):

This is a nice article about permissive versus copyleft licenses. Common permissive licenses are MIT, Apache 2.0, BSD 3-Clause, and BSL.

Policy people will probably pick up on two key words there, but as a scientist I didn't really catch the difference until @nasacrawford pointed it out. The data policy says "shall": this is a requirement. The software policy says "should": this is a recommendation.

The proposed update to the SMD Information Policy was released for comment in November 2021 and changes the language for software to: "SMD-funded software shall be released under a permissive license that has broad acceptance in the community".

The comments are being addressed, and a new SMD information policy is expected to be released this summer.