Closed by peterdesmet 2 years ago
@ben-norton @kbubnicki others, thoughts?
@peterdesmet @kbubnicki
The Data Package documentation allows for an array called licenses that must contain a name and/or URL with an optional title. These are insufficient for a dual-licensing model, where one license applies to the metadata and the other applies to media. Fortunately, the licenses property is an array, which can accommodate multiple licenses. The only missing piece is the scope with a controlled vocabulary (metadata, media). If the scope field can be added to objects in the licensing array, then the problem is solved. It might look like the following:
"licenses": [{
"scope": "media",
"name": "ODC-PDDL-1.0",
"path": "http://opendatacommons.org/licenses/pddl/",
"title": "Open Data Commons Public Domain Dedication and License v1.0"
},
{
"scope": "metadata",
"name": "CC0-1.0",
"path": "https://creativecommons.org/publicdomain/zero/1.0/",
"title": "CC0 1.0 Universal Public Domain Dedication"
}
]
In terms of business rules, I suggest the following two options. Option 1:
"licenses": [{
"scope": "media",
"name": "ODC-PDDL-1.0",
"path": "http://opendatacommons.org/licenses/pddl/",
"title": "Open Data Commons Public Domain Dedication and License v1.0"
}],
Since a second license object was not provided for the metadata, the object above applies to both, even though the scope is listed as media.
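The fallback rule in Option 1 could be sketched roughly as follows. This is only an illustration, not part of any spec: the function name `license_for_scope` is hypothetical, and the behavior when several licenses are listed but none matches the requested scope (returning `None`) is an assumption.

```python
def license_for_scope(licenses, scope):
    """Return the license object that applies to `scope`, or None.

    Assumed rule (Option 1): when only one license is listed, it applies
    to everything, regardless of the scope it declares.
    """
    if not licenses:
        return None
    if len(licenses) == 1:
        # A single license covers both metadata and media.
        return licenses[0]
    for lic in licenses:
        if lic.get("scope") == scope:
            return lic
    # Assumption: no match among several licenses means "unknown".
    return None


licenses = [
    {
        "scope": "media",
        "name": "ODC-PDDL-1.0",
        "path": "http://opendatacommons.org/licenses/pddl/",
    }
]

# The only license is scoped to media, but it is also used for metadata.
print(license_for_scope(licenses, "metadata")["name"])
```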
I would like to weigh in: mixed-license datasets are an annoying reality that seriously hinders reuse, and they should be avoided as much as possible. However, while this is currently not the case for our camera trap images (CC-0, data is CC-BY-SA), I can certainly imagine deployments where a partner would like their images included in our dataset (and identified in our pipeline) but would not be able to give a license waiver like CC-0.
So, while I think it's unlikely to be a big problem for single deployment datasets, I think it will be when it comes to exchanging multi year/deployment datasets that had multiple collaborators, and perhaps different iterations of data management planning.
So, I would suggest a solution similar to DwC-A, having a license per record, and allowing (but discouraging) mixed licenses in a comment. Aligning with the Multimedia extension in this aspect is also user friendly, and might perhaps avoid some confusion.
@ben-norton I like that suggestion! I'd make some minor changes to the business rules you list: I'll see how that can be implemented.
@PietrH I understand your concern, which is why I originally suggested a license per image. However:
when it comes to exchanging multi year/deployment datasets that had multiple collaborators
I think that is something they could reasonably agree upon. The scope of a Camtrap DP dataset is a study, so there is generally a single person (e.g. PI) that can make final decisions.
Aligning with the Multimedia extension in this aspect is also user friendly, and might perhaps avoid some confusion.
A translation to Darwin Core or the Multimedia extension will have to take the properties of the datapackage.json profile into account anyway (e.g. for dataset name), so it's quite easy to assign the single media license to every image when transforming to DwC.
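As a rough sketch of that transformation step: the snippet below copies the package-level media license onto every media record. It assumes the scoped `licenses` array discussed in this thread; the helper names, the `mediaID` and `filePath` columns, the example URLs, and the output column name `license` are illustrative only.

```python
import csv
import io


def media_license_value(package):
    """Return the path (or, failing that, the name) of the media-scoped license."""
    for lic in package.get("licenses", []):
        if lic.get("scope") == "media":
            return lic.get("path") or lic.get("name")
    return None


def add_license_column(media_csv_text, package):
    """Return media rows as dicts, each carrying the package-level media license."""
    license_value = media_license_value(package) or ""
    rows = []
    for row in csv.DictReader(io.StringIO(media_csv_text)):
        row["license"] = license_value
        rows.append(row)
    return rows


# Hypothetical package descriptor and media table, for illustration only.
package = {
    "licenses": [
        {"name": "CC0-1.0", "scope": "data"},
        {
            "name": "CC-BY-4.0",
            "path": "https://creativecommons.org/licenses/by/4.0/",
            "scope": "media",
        },
    ]
}
media_csv = (
    "mediaID,filePath\n"
    "m1,https://example.org/m1.jpg\n"
    "m2,https://example.org/m2.jpg\n"
)

rows = add_license_column(media_csv, package)
for row in rows:
    print(row["mediaID"], row["license"])
```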
I asked the Frictionless Community regarding this approach on their Discord. This use case hasn't been encountered before, but the approach suggested by @ben-norton sounds reasonable. Here's a copy/paste of that discussion:
Hi all, for our frictionless camera trap data, we want data publishers to be able to indicate the license of the CSV data and the license of the image files referenced in media.csv. A datapackage license allows multiple licenses https://specs.frictionlessdata.io/data-package/#licenses We would like to build upon that to indicate scope:
"licenses": [{
"name": "CC0-1.0",
"scope": "data" <- License applies to the data in the package
},
{
"name": "CC-BY-4.0",
"scope": "media" <- License applies to the referenced media files
}]
Is that a good approach? Has anyone else encountered a similar use case? Suggestions? Note that a resource can have its own license property, but that still applies to the CSV data itself, not the referenced images.
Hi @peterdesmet. Interesting question. If I understand correctly, the referenced media files are just included as links to another URL, and the media files themselves are not included in the data package and are hosted elsewhere, right? In that case, wouldn't it be the responsibility of the server which serves the media files to declare the license, instead of the data package that merely links to it?
That is correct, although the image files could be included as part of a data package, but Camtrap DP doesn't make any assumptions regarding that.
It could indeed be seen as the responsibility of the server, but a) the URLs hotlink the images themselves (easier to consume), so it would have to be embedded in the exif metadata, b) having to assess the license per image is a burden to the user, and c) many servers might not provide that functionality. It would therefore still be useful if the data producer can indicate that at package level.
Well, in that case, each link to each image could also be included as a resource in the data package, and their specific licenses could be indicated at the resource level. Would that be a good solution for your use case?
Thanks for the suggestion, but that would seriously bloat the package: some contain over 1 million images. It would also be difficult to consume, since every resource would need its own unique name, which a user doesn't necessarily know.
A more straightforward solution would be to indicate the license for every record in the media.csv, which was my initial suggestion in an issue discussing this https://github.com/tdwg/camtrap-dp/issues/189, but in reality, all images within a package are very likely to have the same license.
Someone suggested using the license array, which looks like an elegant approach, but I wanted to check here if that solution makes sense. 🙂
Well, it does make sense, except that
I like the license array approach, but I don't know of other use cases. I agree with you that it is better to include the license info for the images in the DP as opposed to leaving it to the server's responsibility. ("having to assess the license per image is a burden to the user" --> this is a real problem for research data, so I totally agree with you here)
Thanks, I think we will use that approach then, and add a property scope to the license. Although no software would currently be able to derive meaning from that, users reasonably might. And software would still be aware that two licenses are at play.
Good discussion, @peterdesmet!
If you're going to use the non-standard "scope" attribute, I would suggest that you document what it means, what the allowed values are, how one determines to which resources they apply, etc.
@augusto-herrmann exactly! 👌
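One way to make such documented rules checkable is a small validator. The sketch below is an assumption-laden illustration: the allowed values `data` and `media` are taken from the example posted to Discord above, while the function name `validate_licenses` and the error messages are hypothetical.

```python
# Assumed controlled vocabulary, taken from the example earlier in this thread.
ALLOWED_SCOPES = {"data", "media"}


def validate_licenses(licenses):
    """Return a list of human-readable problems found in a licenses array."""
    errors = []
    for i, lic in enumerate(licenses):
        # Data Package requires each license to have a name and/or a path.
        if not (lic.get("name") or lic.get("path")):
            errors.append(f"licenses[{i}]: needs a name and/or path")
        scope = lic.get("scope")
        if not isinstance(scope, str) or scope not in ALLOWED_SCOPES:
            errors.append(
                f"licenses[{i}]: scope {scope!r} is not one of {sorted(ALLOWED_SCOPES)}"
            )
    return errors


good = [
    {"name": "CC0-1.0", "scope": "data"},
    {"name": "CC-BY-4.0", "scope": "media"},
]
bad = [{"scope": "images"}]

print(validate_licenses(good))
for problem in validate_licenses(bad):
    print(problem)
```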
This may solve the problem for camtrap-dp, but if you are going to make the scope attribute able to generalize to any data package, you realize that this is probably going to need to change, right? And that these changes might not be compatible with the current solution. I.e., specify which media types are selected by the scope when the "media" value is used.
I might even suggest using "linked_media", because the media is not even included in the data package, just linked there.
attribute able to generalize to any data package, you realize that this is probably going to need to change, right?
Yes, I might do a PR to the frictionless specs to support that.
Regarding linked_media: that is a good suggestion, might write it as linked media to be consistent with other vocabs we have.
I think Camtrap DP should allow publishers to indicate the license of the images. It can be different from the license of the data. In general, Camtrap DP doesn't make many assumptions about the images (size, whether they are accessible, etc.), but most of those properties can be derived (by machines) by following the path or URL. That is not the case with the license, which is why I think it would be good to have it as a term in media.csv.
This issue is not about what license(s) should be applied to media files, only that it should be possible to indicate it.
Field properties
https://creativecommons.org/publicdomain/zero/1.0/. Note that the license property in Data Package requires a name and/or path property. The name must be the name of one of the 100 licenses at https://opendefinition.org/licenses/api/#all-licenses. For our field, we could also opt to use license name rather than license URL. That would offer more control that the provided value is actually a license, but it offers less freedom (could be a good thing) and it's less consumable by users/machines than a URL.
rightsHolder (see further).
Url to URL in https://github.com/tdwg/camtrap-dp/blob/293ec5dcbf27f7605138a1585272229c06b3bd68/media-table-schema.json#L67
Proposal
Size increase
This proposal would add a license value to every record in the media.csv, which increases file size. However:
media.csv resource is not a good alternative, as that one applies to the csv data, not the linked media files
Credit
Ideally, we should also indicate how the image should be credited (if required by the license). Adding that information for every image might be overkill, so I suggest that the definition refer to the rightsHolder. That definition could be extended to
... owning or managing rights over this data package and associated media files.
https://github.com/tdwg/camtrap-dp/blob/293ec5dcbf27f7605138a1585272229c06b3bd68/camtrap-dp-profile.json#L80-L84