programminghistorian / jekyll

Jekyll-based static site for The Programming Historian
http://programminghistorian.org

Look into getting DOI for Programming Historian releases #64

Closed by wcaleb 7 years ago

wcaleb commented 9 years ago

https://guides.github.com/activities/citable-code/

acrymble commented 9 years ago

We haven't quite sorted out this notion of 'releases' yet.

@fredgibbs has urged us to go with the 'journal article' citation format. Should we adopt a 'Volume' style of publishing, but in a hybrid fashion that lets people publish in an ongoing capacity until a date threshold pushes us into a new volume? Like: September 1 = new volume year?

fredgibbs commented 9 years ago

Generally I'm not in favor of trying to look like a traditional journal, because I don't think that increases how or why people would take our work seriously. More importantly, I really like that we're forging a new way of sustaining an ongoing digital publication rather than trying to emulate the editorial or publication conventions of physical printing. My vote would be to eschew the volume/issue information and provide a publication date in the metadata. To me, that's the most honest approach as well, and it captures the key information for citations.


wcaleb commented 9 years ago

By hosting on GitHub we can provide a fine-grained history of every individual article, which is far more information-rich than volumes and issues. Maybe one way to make that clearer to the user would be to rename the "Track Changes" link in the footer of each page to "Previous Versions" (which is essentially what that link leads to if you try it). Then, if someone wants to capture the version used in their citation, they can include an "accessed on" such-and-such date, and anyone who wants to look at that version can presumably reach the dated version by following the "Previous Versions" link.

wcaleb commented 9 years ago

I should add that what I suggest above is not incompatible with also occasionally doing a "release" (like on a certain day every year) that gets archived in Zenodo and receives a DOI. I don't see a downside to doing that, though the upside is also unclear to me.

acrymble commented 9 years ago

I like the term 'version' better than 'volume' for this type of work. Versioning is helpful for legacy footnotes. It lets people seek out the lessons as they were during version X (on the Internet Archive, for example) without having to worry that there's no snapshot for 4 June 2014.

wcaleb commented 9 years ago

It may be possible, though it's not immediately obvious to me how, to put a line at the bottom of each lesson saying "Last revised on TIMESTAMP."
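One possible way to get such a timestamp, assuming the site is built from a full git checkout (an assumption; the default GitHub Pages build may not expose the history), is to ask git for the last commit that touched a lesson and write that date into the page at build time. A minimal sketch, with a hypothetical lesson path:

```python
import subprocess

def last_revised(path):
    """Return the date of the most recent commit that touched `path`.

    Hypothetical helper: assumes the build runs inside a full git clone,
    so the commit history is available to query.
    """
    result = subprocess.run(
        ["git", "log", "-1", "--format=%ad", "--date=short", "--", path],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Example (hypothetical lesson filename):
# last_revised("lessons/OCR-with-Tesseract-and-ScanTailor.md")  ->  "2015-06-24"
```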

But the main decision to make is whether we want to "release" the entire site on a particular date and put a version number on it, or just rely on Git's built-in version control to expose the revision history of each individual lesson.

I think I favor the latter, for a couple of reasons:

If we end up creating an eBook or PDF of the site periodically as per #65, that might give us some of the virtues that releasing a version of the entire site would, but without a full-blown version release routine or numbering system.

For the time being, I can certainly change the footer to say "Version history" instead of "Track changes."

acrymble commented 9 years ago

We may not use GitHub forever, so our decisions should be independent of this one particular service's options.

wcaleb commented 9 years ago

The git history is not dependent on GitHub, but I see your point.

For the sake of clarity, could you spell out a proposal for what you'd like to see? Your comment about allowing "people [to] seek out the lessons as they were during version X" helped me begin to see what you have in mind, but I'm not clear on what that would actually look like on the site. Are you mainly thinking it would be good to periodically "release" the repo to Zenodo and attach a DOI, as described in the link I originally posted? Or do you have something else in view?

acrymble commented 9 years ago

My concern is long-term citability and, related to that, sustainability. I think we should be planning for the possibility that someone will want to cite lesson X twenty years from now, and we should put in place a way for them to cite exactly what they read. The lessons are updated from time to time, so a single DOI per lesson does not really work. The GitHub version history does work, assuming it will remain accessible long term (as in 20+ years).

I also think we need to give people the ability to cite the project (The Programming Historian) as a whole, which is distinct from citing the individual lessons. Versions would work for this, I think, because the project goes through fairly major iterations (new people join, some people leave, new technologies and designs are applied). And the big problem with 'accessed on 1 June 2015' is that no one can check what the site looked like on that date unless the Internet Archive happened to take a snapshot at that moment.

So maybe this feeds into a bigger discussion about sustainability. How can we combine the flexibility of the changeable web with the permanence of the 'version of record' that is so important for scholarship, whether we like it or not?

drjwbaker commented 9 years ago

For info, PLOS handle versioning through CrossMark https://www.plos.org/version-tracking-plos-participates-in-the-crossmark-program/

Reading what you say above, Adam, Zenodo-minted DOIs for each Git version should provide the 'cite what you saw' functionality scholars need, whilst not denying PH 'the flexibility of the changeable web' to change lessons as and when required.

acrymble commented 9 years ago

Is that difficult to implement? Should we start our own DOIs?

drjwbaker commented 9 years ago

CrossMark, no idea.

Not sure what you mean by 'start our own DOIs' but might be worth having a chat with @MartinPaulEve re how OLH are handling iteration, identifiers et al.

MartinPaulEve commented 9 years ago

Hi all,

Brief response:

1.) You can mint DOIs for each version that is released, which should get around the problem of not seeing what they read, so long as Zenodo's preservation strategy holds.

2.) I'm not sure how compatible CrossMark will be with Zenodo (DataCite) DOIs.

3.) I'd recommend dropping an email to Geoff Bilder at CrossRef with a request for advice on the right approach and whether CrossMark is appropriate.

Best wishes,

Martin

acrymble commented 9 years ago

I had a look at CrossMark and I think that's more for a site that's likely to find its content all over the internet with possibly outdated versions appearing. That's not really a problem for us.

I think our easiest solution here is to deposit a full set of the files, as they stand on 1 September each year, with Zenodo, along with a note in a README file that it's not a 'canonical version' as such, but a snapshot we made for anyone who needs it.

The 1 September date should capture most of the summer activity, and give our writers impetus to get their summer projects in. I'm happy to take that on. It's not a perfect solution, but it does give us a little extra security and sustainability. This needn't be a PDF as in #65; it could just be the raw files zipped or tarred together.
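For what it's worth, the annual deposit could even be scripted against Zenodo's REST deposit API rather than done through the web form. A rough sketch, assuming a personal access token and a snapshot archive produced beforehand (e.g. with `git archive`); the filename and metadata below are placeholders, not a settled decision:

```python
import requests

ZENODO = "https://zenodo.org/api"
TOKEN = "..."                            # personal access token (placeholder)
ARCHIVE = "ph-snapshot-2015-09-01.zip"   # e.g. output of `git archive` (placeholder)

# 1. Create an empty deposition
dep = requests.post(f"{ZENODO}/deposit/depositions",
                    params={"access_token": TOKEN}, json={}).json()

# 2. Upload the snapshot archive to the new deposition
with open(ARCHIVE, "rb") as fh:
    requests.post(f"{ZENODO}/deposit/depositions/{dep['id']}/files",
                  params={"access_token": TOKEN},
                  data={"name": ARCHIVE}, files={"file": fh})

# 3. Add minimal metadata (placeholder values) and publish to mint the DOI
metadata = {"metadata": {
    "title": "The Programming Historian: annual snapshot",
    "upload_type": "publication",
    "publication_type": "other",
    "description": "Snapshot of the raw site files; not a canonical version.",
    "creators": [{"name": "The Programming Historian"}],
}}
requests.put(f"{ZENODO}/deposit/depositions/{dep['id']}",
             params={"access_token": TOKEN}, json=metadata)
requests.post(f"{ZENODO}/deposit/depositions/{dep['id']}/actions/publish",
              params={"access_token": TOKEN})
```

Alternatively, the GitHub–Zenodo integration described in the citable-code guide at the top of this thread can archive a tagged release and mint the DOI automatically, with no script at all.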

Anyone unhappy with that?

MartinPaulEve commented 9 years ago

Just to add that CrossMark is not just for distributed content - it's also for versioning. 

M


drjwbaker commented 9 years ago

Zenodo seems sensible to me if the purpose is to make citation of PH as a whole (rather than each article) sustainable. That'll be some co-author list!

wcaleb commented 9 years ago

I like the once-a-year Zenodo idea.

acrymble commented 9 years ago

I have archived a copy of the website files with Zenodo at the following DOI: https://zenodo.org/record/30935

This archive was taken on 12 September 2015.

acrymble commented 7 years ago

I attended an event on Persistent Identifiers yesterday at the British Library put on by the THOR project (https://project-thor.eu/). I spoke to Adam Farquhar (head of BL digital scholarship) about our needs, and he's provided some advice on DOIs for our lessons.

In particular he suggested we use an existing service (Figshare, Zenodo) to create and maintain the DOIs, and that each lesson and each subsequent version (after any updates at all) should have its own unique DOI.

Mimi Keshani from Figshare was at the event, so I've approached her about whether or not Figshare can provide us with a zero-maintenance solution that would automatically mint a new DOI each time any file in our https://github.com/programminghistorian/jekyll/tree/gh-pages/lessons directory is updated, and automatically hook that DOI back to our site. I'll report back when I hear from her.
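For reference, a zero-maintenance trigger of the kind described above would essentially need to detect which lesson files have changed since the last deposit and mint a DOI for each. A minimal sketch of just the detection step, assuming a git checkout and a recorded marker commit (both hypothetical):

```python
import subprocess

def changed_lessons(since_commit):
    """List lesson files modified since `since_commit` (a hypothetical
    marker recording the last DOI deposit)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", since_commit, "HEAD", "--", "lessons/"],
        capture_output=True, text=True, check=True)
    return [p for p in out.stdout.splitlines() if p.endswith(".md")]

# Each path returned would then need a new DOI minted and linked back to the
# lesson page, whichever service ends up doing the minting.
```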

acrymble commented 7 years ago

If I can get this sorted out I'm going to revisit the XML upload option for the Open Library of the Humanities (#180). This would require piping our new releases into their XML format. But it requires DOIs, so that's step one.

acrymble commented 7 years ago

Just spoke to the Figshare team. They don't have a way for the files in /lessons to be minted with DOIs automatically every time a file is updated. But they did have a suggestion that I think will work.

1) When a new publication is accepted and uploaded, the editor creates a new entry in Figshare using the 'Linked File' option (details of how to do this would need to be added to the editorial workflow).

2) Rather than mint new DOIs for new versions (minor updates to a lesson), we just make the history of the lesson clearer via a link in the lesson metadata (e.g. https://github.com/programminghistorian/jekyll/commits/gh-pages/lessons/OCR-with-Tesseract-and-ScanTailor.md).

The DOI admittedly isn't important for everyone, but it does serve some current needs in UK higher education. In particular:

I can add these retroactively for the current lessons. Can anyone let me know of any issues they see? I suspect the metadata we'd put into Figshare/the repository could largely be cut and pasted from a template, so it shouldn't be much extra editorial effort.
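To make the per-lesson deposit as close to cut-and-paste as possible, that template could be little more than a small record the editor fills in per lesson. A sketch with entirely hypothetical field names (whatever Figshare or another repository actually requires would replace these):

```python
# Hypothetical per-lesson metadata template; field names are illustrative,
# not Figshare's actual schema.
lesson_metadata = {
    "title": "OCR with Tesseract and ScanTailor",        # lesson title
    "authors": ["Author Name"],                           # placeholder
    "date_published": "2015-06-24",                       # placeholder
    "url": "http://programminghistorian.org/lessons/"
           "OCR-with-Tesseract-and-ScanTailor",
    "version_history": "https://github.com/programminghistorian/jekyll/"
                       "commits/gh-pages/lessons/"
                       "OCR-with-Tesseract-and-ScanTailor.md",
    "license": "CC-BY",                                   # placeholder
}
```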

rgieseke commented 7 years ago

Just had this post by @mfenner of DataCite in my feed:

And starting today all blog posts on this blog will have a DOI, metadata and use a persistent storage mechanism. [...] The DOIs for this blog are generated automatically, using a modified base32 encoding algorithm that is provided by Cirneco, as discussed last week (Fenner, 2016). The DOI is generated and minted when a new post is pushed to https://blog.datacite.org. This avoids two problems: a) DOI-like strings in the wild before publication and b) the randomly generated DOI exists already (we can simply generate a new one). All DOIs are short, without semantic information that might change over time, and with a checksum to minimize transcription errors, for example https://doi.org/10.5438/XCBJ-G7ZY

https://blog.datacite.org/eating-your-own-dog-food/

I don't know whether Programming Historian could become a DataCite member and/or apply something similar, but the approach seems quite interesting (also, having short DOI links seems useful).
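For anyone curious what a "short, random, checksummed" suffix amounts to in practice, here is a simplified stand-in (not Cirneco's actual algorithm): pick random characters from a Crockford-style base32 alphabet and append one check character, so that most transcription errors are caught.

```python
import secrets

# Crockford-style base32 alphabet (no i, l, o, u, to avoid confusion)
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"

def doi_suffix(length=7):
    """Random base32 suffix plus a mod-32 check character.

    A simplified illustration of the approach quoted above, not the exact
    algorithm used by the DataCite blog or Cirneco.
    """
    body = [secrets.choice(ALPHABET) for _ in range(length)]
    check = ALPHABET[sum(ALPHABET.index(c) for c in body) % 32]
    return "".join(body) + check

# e.g. "10.NNNN/" + doi_suffix()  ->  "10.NNNN/7q3mv9tkx" (prefix is a placeholder)
```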

mfenner commented 7 years ago

@rgieseke The solution for minting DOIs that the DataCite blog has implemented should also work for Jekyll, as the workflow is pretty generic (and the DataCite blog used Jekyll until earlier this week). For smaller organizations it might be easier to work with a DataCite member if you want to mint DOIs. The general approach should also work with Crossref DOIs, but you would have to adapt the metadata to Crossref's format.

acrymble commented 7 years ago

Thanks @mfenner and @rgieseke. That looks like just what we're after.

We aren't a DataCite member though, which poses a username/password problem for us. More generally, the description of what you're doing is clear enough, but I'm afraid how to actually use it is over my head. I'm not a Ruby person. Do you have plans to flesh out the instructions?

gvwilson commented 7 years ago

This post by Matt Turk has some interesting ideas as well.

mfenner commented 7 years ago

@acrymble I am certainly planning to polish the DataCite implementation and better document the process. This will take at least a few weeks.

I would work with one of the institutions you are associated with to be able to mint DataCite DOIs. A good number of institutions in the UK are, for example, working with the British Library for this.

arojascastro commented 7 years ago

I don't know if this issue is still alive, but have a look at what sx-archipelagos has achieved. They are publishing articles with DOIs and using GitHub as a publication platform as well. Maybe we can ask Alex? https://github.com/sx-archipelagos/sxa

mdlincoln commented 7 years ago

This is now superseded by #595