torgo closed this issue 1 year ago.
@torgo Thanks for the suggestion. Would you like to do a PR for this?
Happy to give it a go if we think this is the right approach.
Ok - just having another think about this and polling the fediverse. Is this repo where the data license should live? I don't think it's appropriate or reasonable to put a general stipulation in here that anyone who generates data sets using Scorecard must release the data under an open license. What we're talking about here is the specific data set that is being placed in the BigQuery engine. So it feels to me like the license should be closer to the data rather than in this repo? However, @naveensrinivasan, if you think the thing I've suggested above would do the trick, I'm happy to do a PR.
cc @david-a-wheeler This was relevant to one of the topics in the Scorecard sync today, and seems more like something the OpenSSF should be making the decision on.
Updating this issue, as it seems more appropriate to assign the data license (my proposal is CC0) to the data available via the API.
I formally filed a legal review request with LF Legal as internal legal review issue LR-1558. I suspect that this is fine, but since I'm not a lawyer, I think it's important to bring this question to actual lawyers. I will definitely let you know once I know something. Thanks for asking!
Making our licensing clear is a good thing. Our legal team has some concerns about the CC0 license, especially outside the US.
One possibility they raised was to state that contributions of data would be under CDLA Permissive 2.0 and made available under that license. The license is here: https://cdla.dev/permissive-2-0. The OpenSSF Charter already authorizes this at: https://openssf.org/about/charter/.
However, after looking at their response (I just got back from vacation), I realized that they may be thinking we're only accepting and distributing contributed data. Let me circle back to them for confirmation. I've learned that legal answers can be very specific to the circumstances, and I want to make sure they understand ours so they can give us good answers.
Does "contributed data" in this sense refer to the GitHub API data we consume? Or is the distinction between API data that comes from our weekly cron job and data submitted by individual repos via the Scorecard Action?
They used the term "contribution", so I guess I'm not sure. They probably knew what they meant; I'm just reporting back. I filed more info & talked briefly with one of our lawyers.
I propose that we wait until July 19 to see if there are additional clarifications. They asked for time through the end of this week, but giving them a little extra time seems wise. My understanding is that in the US you can't really have a copyright on facts, but there are many asterisks to that statement, so having a clear license statement seems prudent.
If we don't hear otherwise, then after July 19 we should just attach the CDLA Permissive 2.0 and make it clear that generated data is available under that license. The license is here: https://cdla.dev/permissive-2-0. It basically lets receivers do whatever they want with the data, but makes it abundantly clear that there is "No Warranty; Limitation of Liability" (which, from a risk point of view, makes it better than CC0 for releasing data). This is also the easy path, because the OpenSSF Charter already authorizes this license at: https://openssf.org/about/charter/. They had previously recommended using this license for data, so their recommendations and the charter are consistent.
In short, that's what our legal folks recommend & I think it makes sense. Does this seem like a reasonable plan?
The OpenSSF charter and our lawyers recommend CDLA for this case. So as long as we clearly say the generated data is released under the CDLA then all is well.
This issue is stale because it has been open for 60 days with no activity.
Completed via #3404
Is your feature request related to a problem? Please describe.
Potential users of the batch-generated data will want to know the license under which this data is released, so they can be sure they can use it for their use case.
Describe the solution you'd like
Clearly signpost the license under which the data is released (ideally the Creative Commons CC0 license).
Additional context
One solution could be to add a second license file explicitly marked as applying to the BigQuery data (something like LICENSE_bigquerydata.md), which would contain the CC0 license text.