nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License

Automation for Zenodo DOI #365

Status: Open. ewels opened this issue 5 years ago.

ewels commented 5 years ago

Zenodo DOIs are an excellent way to cite nf-core pipelines, especially as they give a specific DOI per version of the pipeline. However, there are two points with the current setup which are quite annoying:

  1. One of the nf-core admins has to manually set up the automated GitHub link for each new pipeline
  2. DOIs are given after a release. This means that the master branch then has to be updated to show the badge for the new DOI after the release is pushed. This changes the commit hash on master so that it no longer matches the release.
    • This is very slightly bad practice, as master is no longer exactly the same as the release. But worse, it messes up functionality in nf-core list and elsewhere, which checks commit hashes of local clones to see if the latest release is being run.
    • Also bad - if people properly run the release (with the -r nextflow flag or by manually downloading), the bundled code cannot include any information about the proper DOI for citation. This will become more of an issue as we try to improve the ease of access to this information (see #361)
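To make the commit-hash problem concrete, here is a minimal Python sketch of a check that maps a local `HEAD` commit to a release tag. It is not the actual `nf-core list` code: the function name, tag names and short hashes are all made up for illustration.

```python
from typing import Dict, Optional


def matching_release(local_head: str, release_shas: Dict[str, str]) -> Optional[str]:
    """Return the release tag whose commit hash equals the local HEAD, else None."""
    for tag, sha in release_shas.items():
        if sha == local_head:
            return tag
    return None


# Made-up history: 1.2.0 was tagged at "a1b2c3", then a badge-only
# commit "d4e5f6" landed on master once Zenodo had minted the DOI.
releases = {"1.1.0": "9f8e7d", "1.2.0": "a1b2c3"}
print(matching_release("a1b2c3", releases))  # -> 1.2.0
print(matching_release("d4e5f6", releases))  # -> None
```

A clone of master taken after the badge commit no longer matches any release hash, which is the mismatch described in the second point.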

After a very, very quick skim read of the docs, I think that we should be able to solve both of these problems with what seems to be an excellent Zenodo API. I see two approaches:

Approach 1: Fully automate releases

The downside is that this has to be done before the release. This means that we can't use the GitHub release web interface, but instead have to trigger the release programmatically somehow. This probably needs a little thought as to how to do it nicely. Also, whether it's worth it!

Approach 2: More manual DOI fetching, with lint checks

An alternative to this is that we can go fully the other way, and instead of using the automated linkage, manually pre-reserve the DOI on the Zenodo website before release. This would have to be done by the pipeline authors. We could potentially then get the lint tests to check for this when running with the --release flag to ensure that it happens properly.
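A rough sketch of what such a lint check could look like, assuming the README cites Zenodo DOIs in the usual `10.5281/zenodo.<record id>` form and that the lint code knows the record ID of the pipeline's concept (top-level) DOI. `check_release_doi` and its `concept_recid` parameter are hypothetical, not part of nf-core lint.

```python
import re
from typing import List

# Zenodo DOIs use the 10.5281 prefix followed by "zenodo.<record id>"
ZENODO_DOI = re.compile(r"10\.5281/zenodo\.(\d+)")


def check_release_doi(readme_text: str, concept_recid: str) -> List[str]:
    """Hypothetical lint check: return failure messages (an empty list means pass)."""
    record_ids = ZENODO_DOI.findall(readme_text)
    if not record_ids:
        return ["README does not contain a Zenodo DOI"]
    if all(rid == concept_recid for rid in record_ids):
        return ["README only cites the concept DOI; reserve a version DOI before releasing"]
    return []


# Record IDs below are made up.
readme = "Cite: https://doi.org/10.5281/zenodo.1111111 (v1.0: 10.5281/zenodo.2222222)"
print(check_release_doi(readme, concept_recid="1111111"))  # -> []
```

Such a check can only confirm that a version-specific DOI is present; whether it was actually reserved on Zenodo would need an API lookup.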

Thoughts and feedback welcome!

Phil

sven1103 commented 5 years ago

Just thinking wildly here: why not just provide the top-level DOI that automatically routes to the latest Zenodo DOI version of a project, and avoid the hassle?

People can get the correct version DOI from the DOI authority easily, which is Zenodo in our case.

Hit me :D

maxulysse commented 5 years ago

You mean one like that: https://zenodo.org/badge/latestdoi/54024046 That's the one you can get with the first release with Zenodo. But there's a way to reserve a DOI, so it should be usable before the release and released with it; cf. the docs: https://help.zenodo.org/

Yes you can! On the upload page under Basic Information and Digital Object Identifier click the Reserve DOI button. The text field above will display the DOI that your record will have once it is published. This will not register the DOI yet, nor will it publish your record (so you can still update the files). This DOI can be safely used in the record's own content as well as any other separate datasets or papers you might be planning to publish.
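The same reservation is available through Zenodo's deposit REST API: creating a draft deposition with `prereserve_doi` set to true in its metadata returns the reserved DOI under `metadata.prereserve_doi.doi`. A minimal Python sketch, assuming a Zenodo personal access token with the deposit scopes; the helper names are hypothetical, and `https://sandbox.zenodo.org/api` can be swapped in for testing.

```python
import json
import urllib.request

ZENODO_API = "https://zenodo.org/api"


def reservation_payload(title: str, version: str) -> dict:
    """Metadata for a new draft deposition that asks Zenodo to pre-reserve a DOI."""
    return {"metadata": {
        "title": title,
        "upload_type": "software",
        "version": version,
        "prereserve_doi": True,  # reserve a DOI without publishing the record
    }}


def reserved_doi(deposition: dict) -> str:
    """Pull the reserved DOI out of the deposition JSON returned by Zenodo."""
    return deposition["metadata"]["prereserve_doi"]["doi"]


def create_draft(token: str, payload: dict) -> dict:
    """POST a new, unpublished deposition and return Zenodo's JSON response."""
    req = urllib.request.Request(
        f"{ZENODO_API}/deposit/depositions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (needs a real token and network access):
# doi = reserved_doi(create_draft(token, reservation_payload("nf-core/example", "1.0.0")))
```

As the quoted docs say, the DOI is not registered until the deposition is published, so it can be written into the release files first.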

sven1103 commented 5 years ago

You mean one like that: https://zenodo.org/badge/latestdoi/54024046 That's the one you can get with the first release with Zenodo.

Exactly.

But there's a way to reserve a DOI, so it should be usable before the release and released with it; cf. the docs: https://help.zenodo.org/

Sure, just raising up the question if adding another level of complexity is really necessary :)

ewels commented 5 years ago

Yes, this would definitely be easier, but we just made a big song and dance in the manuscript about how every release gets its own DOI 😅 I guess with the general one, each release would still get its own release-specific DOI, but it's just a little trickier for people to find it. If it's explicitly in the repo, then it can be saved with the results in the upcoming citation file, which I like.

sven1103 commented 5 years ago

but we just made a big song and dance in the manuscript about how every release gets its own DOI

And this will be conserved. Zenodo will always create DOIs for every release. So we are still authentic ;)

but it's just a little trickier for people to find it

Ah well, you click on the link and choose the DOI from the version you used from the right panel in the webpage. The benefit is very little compared to the implementation hassle imho.

If it's explicitly in the repo then it can be saved with the results in the upcoming citation file, which I like.

ok, this is a point for which I don't have a solution yet.

Or we just reserve a DOI every time we merge to master, but don't publish it (so the DOI does not go live). When the real GitHub release comes, we use this DOI, update the record content via the Zenodo API, and finally trigger the publishing via the API as well. This might work.
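A minimal sketch of the HTTP calls this flow would involve, assuming Zenodo's deposit REST API: on merge, a draft with `prereserve_doi` is created; on release, the archive is uploaded to the draft's file bucket (the `links.bucket` URL in the deposition JSON), the metadata is updated, and the draft is published, at which point the reserved DOI is registered. The function only lists the calls; the event names and the helper itself are hypothetical.

```python
from typing import List, Tuple

ZENODO_API = "https://zenodo.org/api"


def zenodo_calls(event: str, deposition_id: int = 0, bucket_url: str = "",
                 archive: str = "") -> List[Tuple[str, str]]:
    """List the (HTTP method, URL) pairs for each step of the proposed flow."""
    if event == "merge":
        # New draft deposition with metadata.prereserve_doi = true (not published)
        return [("POST", f"{ZENODO_API}/deposit/depositions")]
    if event == "release":
        deposition = f"{ZENODO_API}/deposit/depositions/{deposition_id}"
        return [
            ("PUT", f"{bucket_url}/{archive}"),         # upload the release archive to the draft's bucket
            ("PUT", deposition),                        # update metadata, e.g. version = release tag
            ("POST", f"{deposition}/actions/publish"),  # publish: the reserved DOI goes live
        ]
    raise ValueError(f"unknown event: {event}")
```

Keeping the plan separate from the HTTP client means the merge step and the release step could be run from two different CI triggers.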

apeltzer commented 5 years ago

We really need this - it feels wrong to have to do this manually after doing a release and then manually adding it to the README when doing the very first release 😓

sven1103 commented 5 years ago

@apeltzer I agree, let's push this first please: https://github.com/nf-core/tools/issues/319 and agree on a common formal description of the release process. Then let's translate it into GitHub Actions. I am happy to write the script to do the Zenodo interaction, I love such stuff.

apeltzer commented 5 years ago

Agree that we should do this together with #319, although that enforces properly formatted Git commits everywhere too (though there are plugins for Atom / VS Code / IntelliJ to do that, e.g.: https://github.com/KnisterPeter/vscode-commitizen)

ewels commented 5 years ago

although that enforces proper Git Commits everywhere too

I don't follow... how come?

ewels commented 5 years ago

Re-reading this now, I wonder if we are causing ourselves trouble and overcomplicating things massively here... Maybe we should just have the general DOI for the pipeline? Then if we develop the separate nf-core cite command, that can always pull the pipeline-specific DOI.
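The core of such an `nf-core cite` command (hypothetical at this point) would be mapping the running pipeline version onto the right version record. A minimal sketch, assuming version records shaped like Zenodo's record JSON, with a top-level `doi` and a `metadata.version` field holding the release tag; fetching the records from Zenodo is left out, and the record data below is made up.

```python
from typing import Iterable, Optional


def doi_for_version(records: Iterable[dict], version: str) -> Optional[str]:
    """Return the DOI of the record whose metadata.version matches, else None."""
    for record in records:
        if record.get("metadata", {}).get("version") == version:
            return record.get("doi")
    return None


# Made-up records standing in for the versions listed under a concept DOI.
records = [
    {"doi": "10.5281/zenodo.2222222", "metadata": {"version": "1.0.0"}},
    {"doi": "10.5281/zenodo.3333333", "metadata": {"version": "1.1.0"}},
]
print(doi_for_version(records, "1.1.0"))  # -> 10.5281/zenodo.3333333
```

If no record matches (for example a dev version), the command could fall back to the general DOI.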

It certainly would be a hell of a lot easier... 😰 (and less likely to cause problems)

apeltzer commented 5 years ago

Given how many projects are hitting me at the moment, I tend to agree. Maybe start small first and then make it bigger afterward?

ewels commented 5 years ago

Ok, so let's shelve this and #319 for now then if everyone is happy for that. And let's just start using the base Zenodo DOI everywhere. I guess we should document that somewhere...

@sven1103 are you happy with this? I know that you were getting excited about the automation 😅

sven1103 commented 5 years ago

I suggested the base DOI in the first place (https://github.com/nf-core/tools/issues/365#issuecomment-522602862), so of course I am happy with it 😂

sven1103 commented 5 years ago

But thank you @ewels for appreciating my excitement about the implementation :P

mribeirodantas commented 2 years ago

Maybe this?

https://github.com/gbif/gbif-doi https://github.com/gbif/datacite-rest-client

Check "Create an identifier in Draft state" in https://support.datacite.org/docs/api-create-dois

"To reserve an identifier in Draft state, you will need to ..."

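A minimal sketch of that DataCite call, assuming DataCite repository credentials (which are separate from Zenodo) and using the DataCite test API with its `10.5438` test prefix. As described in the linked DataCite docs, sending only a `prefix` and no `event` attribute leaves the new DOI in Draft state, with DataCite generating the suffix. The helper names are hypothetical, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request

DATACITE_TEST_API = "https://api.test.datacite.org/dois"


def draft_doi_payload(prefix: str) -> dict:
    """JSON:API body for a draft DOI: prefix only, no "event", so it stays a draft."""
    return {"data": {"type": "dois", "attributes": {"prefix": prefix}}}


def draft_doi_request(repo_id: str, password: str, prefix: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a draft DOI."""
    token = base64.b64encode(f"{repo_id}:{password}".encode()).decode()
    return urllib.request.Request(
        DATACITE_TEST_API,
        data=json.dumps(draft_doi_payload(prefix)).encode(),
        headers={"Content-Type": "application/vnd.api+json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


# Usage (needs real credentials): urllib.request.urlopen(draft_doi_request(repo, pw, "10.5438"))
```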
jfy133 commented 2 years ago

GitHub Actions: https://github.com/ivotron/zenodo/

ewels commented 2 years ago

GitHub Actions: https://github.com/ivotron/zenodo/

Looks nice but hasn't been updated in 3 years, which is forever with GitHub Actions. I don't recognise the syntax of the example at all... 👀 Also it doesn't show up in the GitHub Marketplace for actions, so I'm pretty sure it won't work.

There are a few that do though: https://github.com/marketplace?type=actions&query=zenodo

apeltzer commented 2 years ago

I will check a bit more to figure out what might be most suitable for what we actually want to have...

FriederikeHanssen commented 1 month ago

Just talking with @maxulysse about this, after the Zenodo ID issue (again) from this morning. We used the Zenodo API here directly: https://github.com/nf-core/sarek/blob/master/.github/workflows/upload.py which is working pretty well so far.

We are not trying to reserve a pipeline ID, just publishing files. If someone has time, it may be yet another angle worth investigating.