ropensci / software-review

rOpenSci Software Peer Review.
292 stars 104 forks source link

ROriginStamp - Interface to OriginStamp API to Obtain Trusted Time Stamps #433

Closed rkrug closed 2 years ago

rkrug commented 3 years ago

Submitting Author Name: Rainer M. Krug Submitting Author Github Handle: !--author1-->@rkrug<!--end-author1-- Repository: https://github.com/rkrug/ROriginStamp Version submitted: 0.8.0 Editor: !--editor-->@noamross<!--end-editor-- Reviewers: @cboettig, @cgries

Due date for @cboettig: 2021-04-07 Due date for @cgries: 2021-05-19

Archive: TBD Version accepted: TBD


Package: ROriginStamp
Type: Package
Title: Interface to OriginStamp API to Obtain Trusted Time Stamps
Version: 0.8.0
Authors@R: c(
    person(given = "Rainer M.",
             family = "Krug",
             role = c("aut", "cre"),
             email = "Rainer.Krug@uzh.ch",
             comment = c(ORCID = "0000-0002-7490-0066"))
    )
Description: Functions to get and download trusted timestamps and seed files from OriginStamp <https://originstamp.com>.
URL: https://github.com/rkrug/ROriginStamp
BugReports: https://github.com/rkrug/ROriginStamp/issues
License: GPL (>= 2)
Imports:
        curl,
        jsonlite,
        openssl
Suggests: 
    testthat,
    covr,
    knitr,
    rmarkdown
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

As recommended in the Presubmission inquiry https://github.com/ropensci/software-review/issues/397#issuecomment-703420600

- [ ] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [X] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis

The package automates the process of obtaining trusted timestamps for R objects as well as files.

The package is aimed at a general audience. It can be used to obtain trusted timestamps too prove that any object (e.g. dataset, graphs, documents of any type, etc) were created at or before the time of the time stamping and not modified afterwards. The hashing of the object and querying this hash at the timestamp provider returns the proof / certificate.

Not to my knowledge.

NA

https://github.com/ropensci/software-review/issues/397#issuecomment-703420600 @maelle

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

melvidoni commented 3 years ago

Hello! @noamross will be your Handling Editor.

noamross commented 3 years ago

Greetings @rkrug!

Editor checks:


Editor comments

I am unable to run tests successfully locally. Even when I have set NOT_CRAN=true and add ROriginStamp_api_key to my .Renviron. I think this is likely due to the fact that some of the tests are tied to your own OriginStamp account. In general we want reviewers/users/contributors to be able to run the test suite. There are a few options for this, the best of which is generally refactoring the test suite so that any objects that need to be tested against are created as part of the tests using the users' account. Sometimes testing requires some more elaborate set up like setting up a dummy developers' account. In such cases where manual steps are required we generally request they be described in a CONTRIBUTING.md file.

I also note that the package vignette isn't visisble at https://rkrug.github.io/ROriginStamp/articles/introduction.html, and I would suggest being explicit in the README that the API key environment variable should be ROriginStamp_api_key. I had to find this in the code and we will want this to be simple for the reviewers to get going as easily as possible.

Please make these changes and I can proceed with finding reviewers. Let me know if you have any questions.

Other than that, your package passes goodpractice::gp() tests with flying colors, except for a few long code lines, which reviewers can consider when evaluating code readability but requires no immediate action.

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/create_timestamp.R:7:1
    R/create_timestamp.R:8:1
    R/create_timestamp.R:10:1
    R/create_timestamp.R:13:1
    R/create_timestamp.R:14:1
    ... and 58 more lines
rkrug commented 3 years ago

Thanks for the comments, @noamross.

I found one test, in which the API key was influencing the results. I tried to run all the tests with a different API key, and they succeeded. If they fail for you, could you please let me know which ones fail?

Thanks.

noamross commented 3 years ago

Thanks for the updates. Under the current version (https://github.com/rkrug/ROriginStamp/commit/eebfbbd53db06a4bde16c44c618d880f485d459b), I'm still getting this one error (previously I had more):

> devtools::test()
Loading ROriginStamp
Testing ROriginStamp
✓ |  OK F W S | Context
✓ |   1       | 000_new_OriginTimestampResponse [0.2 s]                                                      
✓ |   2       | 001_get_currencies [0.9 s]                                                                   
✓ |   2       | 002_get_hash_status [1.3 s]                                                                  
✓ |   2       | 003_get_key_usage [0.9 s]                                                                    
✓ |   3       | 004_get_proof [1.2 s]                                                                        
✓ |   2       | 005_options [0.1 s]                                                                          
x |   1 1     | 006_hash [0.2 s]                                                                             
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Failure (test_006_hash.R:8:5): hash
Snapshot of code has changed:
old[1:8] vs new[1:6]
  "Code"
  "  tf <- tempfile()"
- "Code"
  "  write.csv(data.frame(letters, LETTERS, 1:26), tf)"
- "Code"
  "  x <- list(hash = hash(as.hash("
  "    \"2c5d36be542f8f0e7345d77753a5d7ea61a443ba6a9a86bb060332ad56dba38e\")),"
  "  r_object = hash(LETTERS), r_object_def = hash.default(LETTERS), file = suppressMessages("

old[10:27] vs new[8:21]
  "Message <simpleMessage>"
  "  "
  "  x is already a hash - returning x unprocessed"
- "Message <simpleMessage>"
  "  "
  "  Create sha356 hash from R object x"
- "Message <simpleMessage>"
  "  "
  "  Create sha356 hash from R object x"
- "Message <simpleMessage>"
and 8 more ...

Run `snapshot_accept('006_hash')` if this is a deliberate change
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
✓ |   2       | 100_create_timestamp [0.5 s]                                                                 

══ Results ══════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 5.4 s

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 15 ]
rkrug commented 3 years ago

Sorry about that - I somehow missed the new snapshot for the tests. Commit https://github.com/rkrug/ROriginStamp/commit/17d231005a26120b50d293a79ff76791d6bf3fb6 should have fixed it.

noamross commented 3 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 3 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/433_status.svg)](https://github.com/ropensci/software-review/issues/433)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

rkrug commented 3 years ago

Fixed with https://github.com/rkrug/ROriginStamp/commit/b23fa8732d9fcd51553d79e2c2fa100ce5176c8f

noamross commented 3 years ago

@ropensci-review-bot assign @noamross as editor

ropensci-review-bot commented 3 years ago

Assigned! @noamross is now the editor

noamross commented 3 years ago

@ropensci-review-bot add @cboettig to reviewers

ropensci-review-bot commented 3 years ago

@cboettig added to the reviewers list. Review due date is 2021-04-07. Thanks @cboettig for accepting to review! Please refer to our reviewer guide.

cboettig commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

No CONTRIBUTING file or guidelines in README

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 2 h


Review Comments

This is a very nicely implemented package, with a clear coding style, light dependencies, and detailed documentation.

cboettig commented 3 years ago

Just an additional followup comment on the last bullet, suggested to me by a colleague @jhpoelen : why not simply compute the hash and post it to some public record? For example, I could merely tweet my hash, or post it in a GitHub gist (both events have publicly verifiable timelines, and are used as 'proofs' by systems like Keybase.io).

noamross commented 3 years ago

Thanks @cboettig for a review so rapid I haven't found a second reviewer yet! @rkrug, you can want to wait to make changes until we get the other review in.

rkrug commented 3 years ago

@cboettig Thanks for the review.

I will follow Noam's suggestion and get back to it in detail after the report from the second reviewer is in.

Just a quick remark: I agree essentially with most of your points, and especially concerning the TTS for R objects, I would only be too happy to remove it - but we will discuss these later.

noamross commented 3 years ago

@ropensci-review-bot add @cgries to reviewers

ropensci-review-bot commented 3 years ago

@cgries added to the reviewers list. Review due date is 2021-05-19. Thanks @cgries for accepting to review! Please refer to our reviewer guide.

cgries commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 3h


Review Comments

I agree with everything @cboettig already commented on:

devtools::check() returns a warning: checking for unstated dependencies in examples ... OK WARNING 'qpdf' is needed for checks on size reduction of PDFs

devtools::test() mostly fails, I assume, because of local installation path and not accessing my api key?

rkrug commented 3 years ago

Thanks @cgries for your review. Your and @cboettig comment's will definitely improve the package.

@noamross What is the procedure now - will you provide some additional feedback after which I should address the comments, or should I address them right away?

rkrug commented 3 years ago

Just an additional followup comment on the last bullet, suggested to me by a colleague @jhpoelen : why not simply compute the hash and post it to some public record? For example, I could merely tweet my hash, or post it in a GitHub gist (both events have publicly verifiable timelines, and are used as 'proofs' by systems like Keybase.io).

Yes - this could be done. But there are two major differences in using twitter or GitHub:

  1. They both are dependant on a single company and do not use decentralised storage. If twitter shuts down (depending on your view a dream or a nightmare) the timestamps are gone. OK - one can download the tweets, but I do not think that these downloads are tamperproof. The same for GitHub. A blockchain in comparison is decentralised and tamper proof - one can even download the complete blockchain.
  2. Tweets and gists can be deleted, so the timestamp can be removed. One can see this as an advantage, but I personally see a timestamp as existing after creation and nobody can delete it.
cboettig commented 3 years ago

Thanks @rkrug. I do think this is really neat, and the decentralized strategy is really nice. One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?

(and minor note just for completeness -- if GitHub closed it's doors, it's not necessarily true that it's previous timestamps would be forever lost, e.g. since GitHub is archiving much of it's public content in the Arctic Code Vault (pretty unclear how re-trivial would ever work there) but is also archived by independent operations like Software Heritage. For example: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/rkrug/ROriginStamp

rkrug commented 3 years ago

Thanks @rkrug. I do think this is really neat, and the decentralized strategy is really nice. One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?

This is addressed in a new function, which verifies timestamps from OriginStamp without using the origin stamp API - it is in the review-in-progress branch. This works for newer timestamps (beginning of this year) and older ones can be verified as well, but this is a much more involved process. I do not think this is useful too include, as the package ROriginStamp is not yet on CRAN.

(and minor note just for completeness -- if GitHub closed it's doors, it's not necessarily true that it's previous timestamps would be forever lost, e.g. since GitHub is archiving much of it's public content in the Arctic Code Vault (pretty unclear how re-trivial would ever work there) but is also archived by independent operations like Software Heritage. For example: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/rkrug/ROriginStamp

Didn't know that. Thanks. But at least the Software Heritage is an Opt In if I am not mistaken? but good to know.

rkrug commented 3 years ago

@cboettig Just in addition to the standard blockchain, OriginStamp also offers a Media Blockchain:

What is a media blockchain? Newspapers are distributed in the physical world like blockchains are in the digital. OriginStamp publishes the root hash of the Merkle Tree in different newspapers so that all potential newspaper readers become witnesses of the timestamp. Moreover, the root hash of today is a leaf hash for tomorrow, which means that the two Merkle Trees are linked and subsequent newspaper editions form a chain. The security and reliability of a media blockchain relies solely on the number of archived copies of each edition.

Intriguing.

cboettig commented 3 years ago

This is addressed in a new function, which verifies timestamps from OriginStamp without using the origin stamp API - it is in the review-in-progress branch.

:tada: wow, that's really cool! Yeah, I think even just pointing out that this can be done in principle (even better with example code) is a really nice addition. Not at all surprised that it's pretty involved! Did you point to your example, say, in the vignettes?

Software Heritage is an Opt In ...

They took a comprehensive snapshot of public GitHub a few years ago I think, and also snapshot the major software archives regularly. But yeah, getting a timestamp from OriginStamp or anyone is 'opt in'; what I meant to say is that I think for most academic/researcher types, the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc); it's somewhat incidental that the fact that the easiest way to archive some content in SoftwareHeritage is by putting it on GitHub and then triggering the archive API. (Where here "content" could just mean "the hash of the content whose origin timestamp I wish to prove").

Incidentally your work came up in a recent discussion @cgries and I had with folks interested in timestamps proving that a 'forecast' was really made before a given date. Neat stuff!

thhepp commented 3 years ago

Hi, To clarify, I am researcher and co-founder at OriginStamp.

One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?

Feel free to study our paper.

tl;dr OriginStamp uses an algorithmic approach to embed hashes into public blockchains. Approach can be locally reproduced by downloading a proof that can be algorithmically verified.

What is the expected time frame of a valid return from this third-party provider OriginStamp? Are we talking weeks, months, years for a user to get a verification certificate for a certain file?

Bitcoin: every 24hours Ethereum: every hour OAN (open application network, formerly aion): every 10 minutes

... the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc); it's somewhat incidental that the fact that the easiest way to archive some content in SoftwareHeritage is by putting it on GitHub and then triggering the archive API. (Where here "content" could just mean "the hash of the content whose origin timestamp I wish to prove").

I really appreciate your discussions, but how can you timestamp your scientific work without publishing it, e.g. work in progress? There are cases, when research was rejected by a peer-reviewer, who later published these results (idea theft & missing attribution). Moreover, you can document experiments using OriginStamp and when publishing the paper, you should publish this timestamped research log as well. Hence, it significantly increases reproducibility of academic work, because you simply prove its integrity.

... They took a comprehensive snapshot of public GitHub a few years ago I think, and also snapshot the major software archives regularly. But yeah, getting a timestamp from OriginStamp or anyone is 'opt in'; what I meant to say is that I think for most academic/researcher types, the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc);

Besides improved reproducibility, central services can be forced by authorities to censor critical research. I want to share the following link with you: wiki news. From my point of view, the bullet-proof solution for a system for software heritage must be based on a decentralized protocol maintained by the scientific community & institutions to guarantee freedom of research. (maybe my next research project)

But in practice, it seems like this would be very difficult to do, as all package functionality requires an OriginStamp API key. Am I missing something? It seems like in order to make it worthwhile to go through all this trouble to register timestamps in major digital currency blockchains, that it should be possible to verify those timestamps directly from the public block chains without an API key or third party of any kind (even if it is not feasible to submit new timestamps without a key). If so, would it be possible to support that? If not, it kind of feels like "what's the point?" e.g. why not just rely on a trusted authority to gpg sign the hash or something?

Sadly, the operation of such a simple timestamp service requires funding and depends heavily on fluctuating crypto costs. Therefore we had to develop a business that covers these costs. Good news OriginStamp was founded by researchers, and is free to use for researchers like GitHub!

If you have any question, feel free to ask me anything.

cboettig commented 3 years ago

@thhepp great to meet you and thanks for joining the discussion here!

From my point of view, the bullet-proof solution for a system for software heritage must be based on a decentralized protocol maintained by the scientific community & institutions to guarantee freedom of research. (maybe my next research project)

Sounds very cool, I'm right there with you. I'm currently quite interested in the use of content-hashes like in OriginStamp as a means to provide location-agnostic identifiers, inspired by the work of Ben Trask at https://hash-archive.org and others, as a decentralized way to resolve content (does not assume content has only one 'canonical' location and does not rely on an identifier controlled by a single central authority, like the DOI. Obviously the identifier is only a part of the puzzle though.

Sadly, the operation of such a simple timestamp service requires funding and depends heavily on fluctuating crypto costs

Yeah, this is the primary motivation for my reservations. It's great that you've developed a business model to cover these costs and made that free to researchers, but it is a weak point in the chain. If the business model fails, costs to verification independent of OriginStamp become prohibitive. What bugs me though is that this need not be the case, the cost is not inherent to the approach, only to the fact that it piggybacks on a crypto currency blockchain.

how can you timestamp your scientific work without publishing it, e.g. work in progress?

Data archives like Zenodo would allow you to archive ("publish") a list of SHA hashes, much as you do with OriginStamp. You could even "blockchain" such lists by having the subsequent list include the hash identifier of the previous list. While Zenodo (backed by CERN) may not last forever, scientific data archives are probably the most persistent research storage we have, are free and relatively frictionless to access. Just a thought.

Don't mean this to sound like I'm not supportive of the ROriginStamp and OriginStamp projects, I think these are great options to have available to researchers!

rkrug commented 2 years ago

Unfortunately, the verification outside OriginStamp does not work anymore, as the website which I used to get the transaction from does not exist any more (api.smartbit.com.au). As this topic moved out of the focus of our work (we decided that time stamping is not needed anymore) I haven't spend much time on this. My Apologies.

I see from my side two options: The first one is to trim the package down to the parts which are still working (which is essentially as submitted with removed hashing of R objects) and to leave it as a proof of concept and beta package, or to withdraw it from submission.

I will leave the decision up to the editors at rOpenSci, but would prefer the first solution as the time invested by the reviewers would not have been for nothing..

Thanks.

noamross commented 2 years ago

@rkrug. We're going back and picking up some reviews that got lost in ambiguous situations like this. My apologies that this issue has foundered so long! If you are still interested, I think the first solution is fine. I will ask reviewers are amenable to reviewing your changes, and if not, we'll do an editors' review to check that you have addressed them.

rkrug commented 2 years ago

@noamross Thanks and no problem about the delay - it suited me well.

OK - I will look at implementing suggested changes to follow solution one (trim the package to what is still working and to have it effectively as a working but "could be improved" proof of concept) today.

rkrug commented 2 years ago

I just merged my changes in the master branch and made a release v_0.9.0.

As suggested by @noamross, I have prepared the package as a "Proof of Concept". Therefore, I did not address most issues concerning the documentation.

I added stop() statements in the functions which do not work anymore due to a retired website and stated this in the man pages as well. If other sites can be found (and I am sure they are) which provide the same service for the lookup, the functions can be easily re-activated.

I also removed the possibility to hash R objects (including characters) as these are, as pointed out by the reviewers, dependant on the serialisation used by R, which can change over time.

I hope that these changes are sufficient, considering the "Proof of Concept" nature of the package.

Thanks,

Rainer

noamross commented 2 years ago

I think I parsed your response incorrectly @rkrug - I thought you were describing the two options as "trimming down the package" and "leaving it as a proof of concept." While I think a trimmed-down version with fewer functionalities is fine, it should still be fully documented per the reviewers' comments and functional within those trimmed-down bounds, with non-working functions removed or with full deprecation code (the latter not really needed for this pre-CRAN package). This is a bit beyond "proof of concept," because the rOpenSci approval repesents our endorsement as the best-in-class way to perform the task. I think with current limitations the package works well as "best way to use OriginStamp API from R", even if the documentation also covers how it fits into the rest of the TTS ecosystem that it may not directly interface with.

rkrug commented 2 years ago

Thanks a lot @noamross. As stated above, I can't afford to spend much more time on this package, which I will likely not use I. the future and consequently that I will not put much effort into maintaining it

I perfectly understand your reasoning, so please feel free to mark the package as withdrawn unless there is interest from your side in tidying it up and not too much work involved on my side.

Thanks and sorry for the work you, and @cboettig and @cgries have put into this.

@thhepp Even if the package is withdrawn here, I will keep it on GitHub and likely on Runiverse.

jhpoelen commented 2 years ago

I appreciate the notes of @rkrug related to effort required to keep packages up and running, and that another time might be better to consider this package for review. In my mind, this exchange shows that the review process works nicely. And . . . that reviewing, just like maintaining packages, takes time.

Thanks @cboettig and @cgries for reviewing.

emilyriederer commented 2 years ago

Thanks so all for this discussion. As I'm doing a round of clean-up on outstanding issues, I'm going to go ahead and close this issue.

rkrug commented 2 years ago

Thanks @emilyriederer