samtools / hts-specs

Specifications of SAM/BAM and related high-throughput sequencing file formats
http://samtools.github.io/hts-specs/

Using example from vcf specification as test file may raise copyright concerns #518

Open rhpvorderman opened 4 years ago

rhpvorderman commented 4 years ago

Dear hts-specs maintainers,

First, thank you for all the great work! Second, I ran into a problem when using one of the VCF examples as a test file. I am currently working on the biowdl scatter tool, which chops up big genomes into smaller regions. It outputs BED files so that tools (such as GATK) can each handle a smaller part of the genome.

I want to add VCF support to this tool. To test that, I need a correct VCF example, and the example from the specification is perfect for this case: it contains both SNPs and indels.

Unfortunately, there is no copyright notice anywhere, which means the default copyright applies. I believe this is 70 years of copyright.

Might I propose that a copyright notice be placed with the document? Preferably one that releases all the examples into the public domain. That would be really helpful when simple test files are needed.

d-cameron commented 4 years ago

The question of the most appropriate license will need to be resolved.

MIT since it covers both software and documentation? MIT+CC-BY-SA? Other options?

d-cameron commented 4 years ago

This issue should be resolved before we start pushing SAM and VCF files to the official suite of test case/examples.

rhpvorderman commented 4 years ago

The WDL spec is licensed under a BSD-3-clause license. Since the spec files are written in .tex, I think a similar license such as MIT is appropriate.

For the examples, CC0 (a public domain dedication, similar in spirit to the Unlicense) would be great. It allows people to use the work freely without crediting anyone, which is really useful: any application, regardless of its license, will be able to use the examples as test files.

jmarshall commented 4 years ago

This repository contains numerous specification texts with (historically semi-intentionally) no clear copyright status, currently no software (other than infrastructure scripts), and example files in the various formats either embedded and lightly-formatted in the texts or (soon) as separate verbatim files.

For the example files which the OP asks about, to the extent that they are simple examples not containing meaningful PII/genome-derived data, IMHO the simplest thing possible would be appropriate. That would be a public domain dedication or CC0, or perhaps a case could be made for CC-BY-SA or similar.

jmarshall commented 4 years ago

(Appropriate licensing conditions for the specification texts themselves are already under consideration within GA4GH. So let's keep this issue about only the examples and example files.)