trustoverip / concepts-and-terminology-wg

Content and code repository for the Concepts & Terminology Working Group
https://trustoverip.github.io/concepts-and-terminology-wg/
Creative Commons Attribution 4.0 International

need a convention for attribution #54

Open dhh1128 opened 3 years ago

dhh1128 commented 3 years ago

To satisfy the licensing provisions on content we get from external sources (see #47), we need a convention for attributing content in our terminology corpus to other authors whose work may be licensed or copyrighted.

Terms themselves cannot be copyrighted (that's what trademarks are for), so we don't need to attribute the terms. Nor do we need to attribute full wiki pages: outside of another term wiki, we won't find a full page of content that exactly matches our term template.

What can be copyrighted is definitions, example paragraphs that illustrate a term being used in context, and so forth.

I suggest the following convention to address this need:

Anywhere we want to attribute content to an outside source, we embed a block of text in this form: `[from [source name](source_uri), [license name](license_uri)]`. A human reader would see the rendered attribution like this:

Magnetic resonance imaging (MRI) is a medical imaging technique used in radiology to form pictures of the anatomy and the physiological processes of the body. MRI scanners use strong magnetic fields, magnetic field gradients, and radio waves to generate images of the organs in the body. MRI does not involve X-rays or the use of ionizing radiation, which distinguishes it from CT and PET scans. MRI is a medical application of nuclear magnetic resonance (NMR) which can also be used for imaging in other NMR applications, such as NMR spectroscopy. [from Wikipedia, CC BY-SA 3.0]

And the underlying markdown would look like this (kept on one line, since the `.` in the detection regex does not match a line break):

...such as NMR spectroscopy. [from [Wikipedia](https://en.wikipedia.org/wiki/Magnetic_resonance_imaging), [CC BY-SA 3.0](https://j.mp/3zMXCTi)]
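Rather than have authors hand-type the block, tooling could generate it. Below is a minimal Python sketch of such a helper; `format_attribution` and `ATTRIBUTION_RE` are hypothetical names, not existing project code. It builds the block from its four parts and refuses to emit anything the proposed detection regex would miss, such as a URI containing a line break.

```python
import re

# Detection pattern quoted in the proposal. Note that "." does not match a
# newline, so an attribution split across lines would not be found.
ATTRIBUTION_RE = re.compile(r"\[from \[.*?\]\(.*?\), \[.*?\]\(.*?\)")


def format_attribution(source_name: str, source_uri: str,
                       license_name: str, license_uri: str) -> str:
    """Build '[from [source name](source_uri), [license name](license_uri)]'."""
    block = f"[from [{source_name}]({source_uri}), [{license_name}]({license_uri})]"
    # Refuse to emit anything the detection regex would not find.
    if not ATTRIBUTION_RE.search(block):
        raise ValueError(f"attribution would not be detected: {block!r}")
    return block


if __name__ == "__main__":
    print(format_attribution(
        "Wikipedia",
        "https://en.wikipedia.org/wiki/Magnetic_resonance_imaging",
        "CC BY-SA 3.0",
        "https://j.mp/3zMXCTi",
    ))
```

Generating the block this way keeps every attribution in the exact shape the scanner expects, so a typo in the punctuation cannot silently drop an attribution.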

We can then write tooling that looks for the regex `\[from \[.*?\]\(.*?\), \[.*?\]\(.*?\)`. Wherever that pattern matches, I suggest we treat as attributed all the text that precedes the match, back to the beginning of the enclosing paragraph or list item or to the end of a previous attribution, whichever is closer. This roughly matches the precision of citations in academic papers, is easy for humans to intuit correctly, and lets us attribute more than one source within a paragraph if needed. It does not, however, allow attribution at a finer grain than that.
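A minimal Python sketch of that scanning rule follows. It rests on assumptions that are not part of the proposal: blocks are split on blank lines and on lines that start with a list marker (a simplification of CommonMark, not a full parser), wrapped lines are joined with a space before matching, and the proposal's regex is extended with named capture groups and the closing `\]`. `find_attributions`, `split_blocks`, and `Attribution` are hypothetical names.

```python
import re
from dataclasses import dataclass

# The proposal's pattern, extended (an assumption) with named groups and the
# closing "\]" so that each field can be extracted.
ATTRIBUTION_RE = re.compile(
    r"\[from \[(?P<source>.*?)\]\((?P<source_uri>.*?)\), "
    r"\[(?P<license>.*?)\]\((?P<license_uri>.*?)\)\]"
)
# Simplified list-item marker: "-", "*", "+", "1." or "1)" followed by space.
LIST_ITEM_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


@dataclass
class Attribution:
    text: str  # the content being attributed
    source_name: str
    source_uri: str
    license_name: str
    license_uri: str


def split_blocks(markdown: str) -> list[str]:
    """Split text into paragraphs and list items (not full CommonMark)."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if not line.strip():  # a blank line ends the current block
            if current:
                blocks.append(current)
                current = []
            continue
        if LIST_ITEM_RE.match(line) and current:  # a new list item starts a block
            blocks.append(current)
            current = []
        current.append(line.strip())
    if current:
        blocks.append(current)
    # Join wrapped lines so an attribution split across lines still matches.
    return [" ".join(block) for block in blocks]


def find_attributions(markdown: str) -> list[Attribution]:
    results: list[Attribution] = []
    for block in split_blocks(markdown):
        start = 0  # beginning of the paragraph or list item
        for m in ATTRIBUTION_RE.finditer(block):
            text = block[start:m.start()].strip()
            if start == 0:
                text = LIST_ITEM_RE.sub("", text, count=1)  # drop "- " etc.
            results.append(Attribution(
                text=text,
                source_name=m["source"],
                source_uri=m["source_uri"].strip(),
                license_name=m["license"],
                license_uri=m["license_uri"].strip(),
            ))
            start = m.end()  # a later attribution stops here
    return results
```

For the MRI example, the whole paragraph up to the attribution would be reported as coming from Wikipedia. Because lines are joined before matching, the sketch also finds an attribution that is wrapped across several lines, which the bare regex (without `re.DOTALL`) would miss.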

What do you think?