nlesc-sigs / data-sig

Linked data, data & modeling SIG

Data license for dataset with data from sources without data license #43

Closed jvdzwaan closed 5 years ago

jvdzwaan commented 5 years ago

In the bridging the gap project, we made a dataset consisting of books taken from online sources. The problem is that these sources do not specify a data license. We would (of course) like to add a data license to our dataset.

The dataset is on GitHub (https://github.com/arabic-digital-humanities/fiqh). In the README we state:

The BookSource field in the metadata specifies the source the book was taken from. Most of the books were collected from al-Maktaba al-Shamila website. The books in the website are generated using the al-Maktaba al-Shamila software program developed by al-Makab al-Taʿāwnī lil-Daʿwā bil-Rawdā. In the program itself the developers state that the books are free and they encourage others to distribute them. They also stress that the books are not to be used to distribute what is deviant to the Sunni doctrine. The Jaʿfarī books were collected from Maktabat Yuʿsūb and they say nothing regarding their license.

What should we do? @LourensVeen Maybe you have some advice for us?

arnikz commented 5 years ago

Are you considering publishing both the metadata about the dataset and the dataset itself? You could use the CC-BY-4.0 license for the metadata and probably also for the actual data (though I'm not 100% sure about the additional requirements for the latter).

LourensVeen commented 5 years ago

Hey, sorry, this got caught in a mail filter and I didn't see it!

There are various things to consider here: 1) copyright of the individual books, 2) copyright on the collection of books, and 3) database rights on the collection of books.

For 1), are these recent books or historic works? You say they have been created by a computer program? That sounds like I missed an advance in AI; what does that mean exactly? (That's relevant because creativity is an important aspect of whether something is covered by copyright.) I guess "not to be used to distribute what is deviant to the Sunni doctrine" could constitute a copyright licence, but a dangerously vague one. Does that exclude distributing them together with Shia or Sufi works in a single data set? After all, that could convince someone to consider other schools of thought. Do you have a link to the source and that note?

2) is only an issue if this is a curated dataset, i.e. some expert selected and arranged these books to convey something; I guess that this is more of the "everything we have on topic X" kind, right?

For 3), a statement that the books are free to distribute could constitute a database rights license to the extent that their collection constitutes a data set. The answer to 1) above is also important here, because for database rights the party that paid for creating the database owns them, and that may be an organisation that paid to digitise something rather than the owner of the originals.

jvdzwaan commented 5 years ago

This issue was discussed during the data SIG meeting on 25-07-2019.

Suggestions that stood out: