ROriginStamp - Interface to OriginStamp API to Obtain Trusted Time Stamps

rkrug commented 3 years ago

Submitting Author Name: Rainer M. Krug Submitting Author Github Handle: !--author1-->@rkrug@noamross<!--end-editor-- Reviewers: @cboettig, @cgries

Due date for @cboettig: 2021-04-07 Due date for @cgries: 2021-05-19

Archive: TBD Version accepted: TBD

Paste the full DESCRIPTION file inside a code block below:

Package: ROriginStamp
Type: Package
Title: Interface to OriginStamp API to Obtain Trusted Time Stamps
Version: 0.8.0
Authors@R: c(
    person(given = "Rainer M.",
             family = "Krug",
             role = c("aut", "cre"),
             email = "Rainer.Krug@uzh.ch",
             comment = c(ORCID = "0000-0002-7490-0066"))
    )
Description: Functions to get and download trusted timestamps and seed files from OriginStamp <https://originstamp.com>.
URL: https://github.com/rkrug/ROriginStamp
BugReports: https://github.com/rkrug/ROriginStamp/issues
License: GPL (>= 2)
Imports:
        curl,
        jsonlite,
        openssl
Suggests: 
    testthat,
    covr,
    knitr,
    rmarkdown
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

As recommended in the Presubmission inquiry https://github.com/ropensci/software-review/issues/397#issuecomment-703420600

- [ ] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [X] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis

Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The package automates the process of obtaining trusted timestamps for R objects as well as files.

Who is the target audience and what are scientific applications of this package?

The package is aimed at a general audience. It can be used to obtain trusted timestamps too prove that any object (e.g. dataset, graphs, documents of any type, etc) were created at or before the time of the time stamping and not modified afterwards. The hashing of the object and querying this hash at the timestamp provider returns the proof / certificate.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Not to my knowledge.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

NA

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

https://github.com/ropensci/software-review/issues/397#issuecomment-703420600 @maelle

Technical checks

Confirm each of the following by checking the box.

[X] I have read the guide for authors and rOpenSci packaging guide.

This package:

[X] does not violate the Terms of Service of any service it interacts with.
[X] has a CRAN and OSI accepted license.
[X] contains a README with instructions for installing the development version.
[X] includes documentation with examples for all functions, created with roxygen2.
[X] contains a vignette with examples of its essential functions and uses.
[X] has a test suite.
[X] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[X] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[X] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

melvidoni commented 3 years ago

Hello! @noamross will be your Handling Editor.

noamross commented 3 years ago

Greetings @rkrug!

Editor checks:

[x] Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package.
[x] Fit: The package meets criteria for fit and overlap
[x] Automated tests: Package has a testing suite and is tested via a CI service.
[x] License: The package has a CRAN or OSI accepted license
[x] Repository: The repository link resolves correctly

Editor comments

I am unable to run tests successfully locally. Even when I have set NOT_CRAN=true and add ROriginStamp_api_key to my .Renviron. I think this is likely due to the fact that some of the tests are tied to your own OriginStamp account. In general we want reviewers/users/contributors to be able to run the test suite. There are a few options for this, the best of which is generally refactoring the test suite so that any objects that need to be tested against are created as part of the tests using the users' account. Sometimes testing requires some more elaborate set up like setting up a dummy developers' account. In such cases where manual steps are required we generally request they be described in a CONTRIBUTING.md file.

I also note that the package vignette isn't visisble at https://rkrug.github.io/ROriginStamp/articles/introduction.html, and I would suggest being explicit in the README that the API key environment variable should be ROriginStamp_api_key. I had to find this in the code and we will want this to be simple for the reviewers to get going as easily as possible.

Please make these changes and I can proceed with finding reviewers. Let me know if you have any questions.

Other than that, your package passes goodpractice::gp() tests with flying colors, except for a few long code lines, which reviewers can consider when evaluating code readability but requires no immediate action.

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/create_timestamp.R:7:1
    R/create_timestamp.R:8:1
    R/create_timestamp.R:10:1
    R/create_timestamp.R:13:1
    R/create_timestamp.R:14:1
    ... and 58 more lines

rkrug commented 3 years ago

Thanks for the comments, @noamross.

[X] Vignette visibility https://github.com/rkrug/ROriginStamp/commit/f90fb81b0dc511997b866ec3e20975d9b19e19c7 and https://github.com/rkrug/ROriginStamp/commit/0378f5787bb6b7c13824ac4dd6ed5b4f6988275c
[X] Added ROpenSci_api_key to README https://github.com/rkrug/ROriginStamp/commit/f90fb81b0dc511997b866ec3e20975d9b19e19c7 and https://github.com/rkrug/ROriginStamp/commit/0378f5787bb6b7c13824ac4dd6ed5b4f6988275c
[X] Fix tests to work with different API key https://github.com/rkrug/ROriginStamp/commit/0799c53e1183ff074edccc8a2354150c52fd6a12

I found one test, in which the API key was influencing the results. I tried to run all the tests with a different API key, and they succeeded. If they fail for you, could you please let me know which ones fail?

Thanks.

noamross commented 3 years ago

Thanks for the updates. Under the current version (https://github.com/rkrug/ROriginStamp/commit/eebfbbd53db06a4bde16c44c618d880f485d459b), I'm still getting this one error (previously I had more):

> devtools::test()
Loading ROriginStamp
Testing ROriginStamp
✓ |  OK F W S | Context
✓ |   1       | 000_new_OriginTimestampResponse [0.2 s]                                                      
✓ |   2       | 001_get_currencies [0.9 s]                                                                   
✓ |   2       | 002_get_hash_status [1.3 s]                                                                  
✓ |   2       | 003_get_key_usage [0.9 s]                                                                    
✓ |   3       | 004_get_proof [1.2 s]                                                                        
✓ |   2       | 005_options [0.1 s]                                                                          
x |   1 1     | 006_hash [0.2 s]                                                                             
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Failure (test_006_hash.R:8:5): hash
Snapshot of code has changed:
old[1:8] vs new[1:6]
  "Code"
  "  tf <- tempfile()"
- "Code"
  "  write.csv(data.frame(letters, LETTERS, 1:26), tf)"
- "Code"
  "  x <- list(hash = hash(as.hash("
  "    \"2c5d36be542f8f0e7345d77753a5d7ea61a443ba6a9a86bb060332ad56dba38e\")),"
  "  r_object = hash(LETTERS), r_object_def = hash.default(LETTERS), file = suppressMessages("

old[10:27] vs new[8:21]
  "Message <simpleMessage>"
  "  "
  "  x is already a hash - returning x unprocessed"
- "Message <simpleMessage>"
  "  "
  "  Create sha356 hash from R object x"
- "Message <simpleMessage>"
  "  "
  "  Create sha356 hash from R object x"
- "Message <simpleMessage>"
and 8 more ...

Run `snapshot_accept('006_hash')` if this is a deliberate change
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
✓ |   2       | 100_create_timestamp [0.5 s]                                                                 

══ Results ══════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 5.4 s

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 15 ]

rkrug commented 3 years ago

Sorry about that - I somehow missed the new snapshot for the tests. Commit https://github.com/rkrug/ROriginStamp/commit/17d231005a26120b50d293a79ff76791d6bf3fb6 should have fixed it.

noamross commented 3 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 3 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/433_status.svg)](https://github.com/ropensci/software-review/issues/433)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

rkrug commented 3 years ago

Fixed with https://github.com/rkrug/ROriginStamp/commit/b23fa8732d9fcd51553d79e2c2fa100ce5176c8f

noamross commented 3 years ago

@ropensci-review-bot assign @noamross as editor

ropensci-review-bot commented 3 years ago

Assigned! @noamross is now the editor

noamross commented 3 years ago

@ropensci-review-bot add @cboettig to reviewers

ropensci-review-bot commented 3 years ago

@cboettig added to the reviewers list. Review due date is 2021-04-07. Thanks @cboettig for accepting to review! Please refer to our reviewer guide.

cboettig commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples (that run successfully locally) for all exported functions
[ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

No CONTRIBUTING file or guidelines in README

For packages co-submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software

[ ] Authors: A list of authors with their affiliations

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.

[ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: 2 h

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

This is a very nicely implemented package, with a clear coding style, light dependencies, and detailed documentation.

Function return display behavior could be cleaned up a bit. Functions return verbose output that is not easily parsed by most users. For example, perhaps create_timestamp could return a clear message that timestamp creation was successfully submitted, and indicate how long a user might want to wait before trying to check if the timestamp has registered. For example, the more detailed return information could be placed into an S3 object with a defined print method that displays only key details (hash, time submitted, etc) concisely.
Minor technical issue, but it's good practice to use put close() commands for R connections into on.exit() in functions, which allows R to close an open connection if the function terminates prematurely.
The use of a trademark name has been a concern in the past. From a skim of the legal documentation links provided, it doesn't look like an issue here, but may still be appropriate to check with the owners of OriginStamp before using the name in the package name?
I'm really hesitant about the practice of trying to timestamp R objects rather than files. Why does the method use version=2 here instead of the default? serialize is not suitable for archival purposes, as noted in the documentation, and having changed in the past and being subject to change in the future. In UNIX philosophy, text files are the universal interface, and it seems that users would be better to serialize files rather than objects? I think when storrr does this, it first trims off the R version bit of the serialization to be more stable? Anyway, just a thought, happy to be convinced otherwise!
I think the vignette would be more compelling if it provided more narrative about the author's envisioned use cases and corresponding workflow. For example, I was just discussing the issue of verifying timestamps with colleagues who are all interested in scientific and statistical methods for forecasting. They noted it would be desirable to verify that a forecast was produced at a particular time prior to when a measurement was actually made. I understand that it often makes sense to defer to OriginStamp's own material about the many possible use cases, limitations, etc, but at least a little more information could be provided here.
Likewise, I think it the vignette would be greatly helped by speaking to the issue of longevity, an area where I found the documentation from OriginStamp to be somewhat unclear. At first glance, it is easy to assume that this ability to confirm a timestamp is only as good as the lifetime of the company behind the API. In fact, it seems the designers have gone out of their way to make sure they are publishing timestamps into major commercial block chains, not merely maintaining there own. As such, it seems like it should be possible to verify a timestamp even if OriginStamp is defunct. But in practice, it seems like this would be very difficult to do, as all package functionality requires an OriginStamp API key. Am I missing something? It seems like in order to make it worthwhile to go through all this trouble to register timestamps in major digital currency blockchains, that it should be possible to verify those timestamps directly from the public block chains without an API key or third party of any kind (even if it is not feasible to submit new timestamps without a key). If so, would it be possible to support that? If not, it kind of feels like "what's the point?" e.g. why not just rely on a trusted authority to gpg sign the hash or something?
Lastly, it might be interesting to have some discussion of alternative approaches (or why things that seem like alternatives aren't). Maybe this is too far beyond the scope of a vignette, but I'm just curious.

cboettig commented 3 years ago

Just an additional followup comment on the last bullet, suggested to me by a colleague @jhpoelen : why not simply compute the hash and post it to some public record? For example, I could merely tweet my hash, or post it in a GitHub gist (both events have publicly verifiable timelines, and are used as 'proofs' by systems like Keybase.io).

noamross commented 3 years ago

Thanks @cboettig for a review so rapid I haven't found a second reviewer yet! @rkrug, you can want to wait to make changes until we get the other review in.

rkrug commented 3 years ago

@cboettig Thanks for the review.

I will follow Noam's suggestion and get back to it in detail after the report from the second reviewer is in.

Just a quick remark: I agree essentially with most of your points, and especially concerning the TTS for R objects, I would only be too happy to remove it - but we will discuss these later.

noamross commented 3 years ago

@ropensci-review-bot add @cgries to reviewers

ropensci-review-bot commented 3 years ago

@cgries added to the reviewers list. Review due date is 2021-05-19. Thanks @cgries for accepting to review! Please refer to our reviewer guide.

cgries commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples (that run successfully locally) for all exported functions
[ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: 3h

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

I agree with everything @cboettig already commented on:

I am not able to parse the messages returned by the functions.
The vignette only discusses basic functionality and does not provide any real-world use cases; hence, I am a bit confused on how I would incorporate this into a workflow.
The documentation and examples are minimal, and some are not very useful. E.g. api_url() Value: 'Either the old api url (when url is supplied) or the current api url (when url is not supplied)' did not help me to understand what URL I was supposed to provide.
What is the expected time frame of a valid return from this third-party provider OriginStamp? Are we talking weeks, months, years for a user to get a verification certificate for a certain file?
I agree that a discussion on why using this approach is preferable over any other way to assure authorship and timestamp of creation.

devtools::check() returns a warning: checking for unstated dependencies in examples ... OK WARNING 'qpdf' is needed for checks on size reduction of PDFs

devtools::test() mostly fails, I assume, because of local installation path and not accessing my api key?

rkrug commented 3 years ago

Thanks @cgries for your review. Your and @cboettig comment's will definitely improve the package.

@noamross What is the procedure now - will you provide some additional feedback after which I should address the comments, or should I address them right away?

rkrug commented 3 years ago

Just an additional followup comment on the last bullet, suggested to me by a colleague @jhpoelen : why not simply compute the hash and post it to some public record? For example, I could merely tweet my hash, or post it in a GitHub gist (both events have publicly verifiable timelines, and are used as 'proofs' by systems like Keybase.io).

Yes - this could be done. But there are two major differences in using twitter or GitHub:

They both are dependant on a single company and do not use decentralised storage. If twitter shuts down (depending on your view a dream or a nightmare) the timestamps are gone. OK - one can download the tweets, but I do not think that these downloads are tamperproof. The same for GitHub. A blockchain in comparison is decentralised and tamper proof - one can even download the complete blockchain.
Tweets and gists can be deleted, so the timestamp can be removed. One can see this as an advantage, but I personally see a timestamp as existing after creation and nobody can delete it.

cboettig commented 3 years ago

Thanks @rkrug. I do think this is really neat, and the decentralized strategy is really nice. One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?

(and minor note just for completeness -- if GitHub closed it's doors, it's not necessarily true that it's previous timestamps would be forever lost, e.g. since GitHub is archiving much of it's public content in the Arctic Code Vault (pretty unclear how re-trivial would ever work there) but is also archived by independent operations like Software Heritage. For example: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/rkrug/ROriginStamp

rkrug commented 3 years ago

Thanks @rkrug. I do think this is really neat, and the decentralized strategy is really nice. One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?

This is addressed in a new function, which verifies timestamps from OriginStamp without using the origin stamp API - it is in the review-in-progress branch. This works for newer timestamps (beginning of this year) and older ones can be verified as well, but this is a much more involved process. I do not think this is useful too include, as the package ROriginStamp is not yet on CRAN.

(and minor note just for completeness -- if GitHub closed it's doors, it's not necessarily true that it's previous timestamps would be forever lost, e.g. since GitHub is archiving much of it's public content in the Arctic Code Vault (pretty unclear how re-trivial would ever work there) but is also archived by independent operations like Software Heritage. For example: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/rkrug/ROriginStamp

Didn't know that. Thanks. But at least the Software Heritage is an Opt In if I am not mistaken? but good to know.

rkrug commented 3 years ago

@cboettig Just in addition to the standard blockchain, OriginStamp also offers a Media Blockchain:

What is a media blockchain? Newspapers are distributed in the physical world like blockchains are in the digital. OriginStamp publishes the root hash of the Merkle Tree in different newspapers so that all potential newspaper readers become witnesses of the timestamp. Moreover, the root hash of today is a leaf hash for tomorrow, which means that the two Merkle Trees are linked and subsequent newspaper editions form a chain. The security and reliability of a media blockchain relies solely on the number of archived copies of each edition.

Intriguing.

cboettig commented 3 years ago

This is addressed in a new function, which verifies timestamps from OriginStamp without using the origin stamp API - it is in the review-in-progress branch.

:tada: wow, that's really cool! Yeah, I think even just pointing out that this can be done in principle (even better with example code) is a really nice addition. Not at all surprised that it's pretty involved! Did you point to your example, say, in the vignettes?

Software Heritage is an Opt In ...

They took a comprehensive snapshot of public GitHub a few years ago I think, and also snapshot the major software archives regularly. But yeah, getting a timestamp from OriginStamp or anyone is 'opt in'; what I meant to say is that I think for most academic/researcher types, the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc); it's somewhat incidental that the fact that the easiest way to archive some content in SoftwareHeritage is by putting it on GitHub and then triggering the archive API. (Where here "content" could just mean "the hash of the content whose origin timestamp I wish to prove").

Incidentally your work came up in a recent discussion @cgries and I had with folks interested in timestamps proving that a 'forecast' was really made before a given date. Neat stuff!

thhepp commented 3 years ago

Hi, To clarify, I am researcher and co-founder at OriginStamp.

One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?

Feel free to study our paper.

tl;dr OriginStamp uses an algorithmic approach to embed hashes into public blockchains. Approach can be locally reproduced by downloading a proof that can be algorithmically verified.

What is the expected time frame of a valid return from this third-party provider OriginStamp? Are we talking weeks, months, years for a user to get a verification certificate for a certain file?

Bitcoin: every 24hours Ethereum: every hour OAN (open application network, formerly aion): every 10 minutes

... the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc); it's somewhat incidental that the fact that the easiest way to archive some content in SoftwareHeritage is by putting it on GitHub and then triggering the archive API. (Where here "content" could just mean "the hash of the content whose origin timestamp I wish to prove").

I really appreciate your discussions, but how can you timestamp your scientific work without publishing it, e.g. work in progress? There are cases, when research was rejected by a peer-reviewer, who later published these results (idea theft & missing attribution). Moreover, you can document experiments using OriginStamp and when publishing the paper, you should publish this timestamped research log as well. Hence, it significantly increases reproducibility of academic work, because you simply prove its integrity.

... They took a comprehensive snapshot of public GitHub a few years ago I think, and also snapshot the major software archives regularly. But yeah, getting a timestamp from OriginStamp or anyone is 'opt in'; what I meant to say is that I think for most academic/researcher types, the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc);

Besides improved reproducibility, central services can be forced by authorities to censor critical research. I want to share the following link with you: wiki news. From my point of view, the bullet-proof solution for a system for software heritage must be based on a decentralized protocol maintained by the scientific community & institutions to guarantee freedom of research. (maybe my next research project)

But in practice, it seems like this would be very difficult to do, as all package functionality requires an OriginStamp API key. Am I missing something? It seems like in order to make it worthwhile to go through all this trouble to register timestamps in major digital currency blockchains, that it should be possible to verify those timestamps directly from the public block chains without an API key or third party of any kind (even if it is not feasible to submit new timestamps without a key). If so, would it be possible to support that? If not, it kind of feels like "what's the point?" e.g. why not just rely on a trusted authority to gpg sign the hash or something?

Sadly, the operation of such a simple timestamp service requires funding and depends heavily on fluctuating crypto costs. Therefore we had to develop a business that covers these costs. Good news OriginStamp was founded by researchers, and is free to use for researchers like GitHub!

If you have any question, feel free to ask me anything.

cboettig commented 3 years ago

@thhepp great to meet you and thanks for joining the discussion here!

From my point of view, the bullet-proof solution for a system for software heritage must be based on a decentralized protocol maintained by the scientific community & institutions to guarantee freedom of research. (maybe my next research project)

Sounds very cool, I'm right there with you. I'm currently quite interested in the use of content-hashes like in OriginStamp as a means to provide location-agnostic identifiers, inspired by the work of Ben Trask at https://hash-archive.org and others, as a decentralized way to resolve content (does not assume content has only one 'canonical' location and does not rely on an identifier controlled by a single central authority, like the DOI. Obviously the identifier is only a part of the puzzle though.

Sadly, the operation of such a simple timestamp service requires funding and depends heavily on fluctuating crypto costs

Yeah, this is the primary motivation for my reservations. It's great that you've developed a business model to cover these costs and made that free to researchers, but it is a weak point in the chain. If the business model fails, costs to verification independent of OriginStamp become prohibitive. What bugs me though is that this need not be the case, the cost is not inherent to the approach, only to the fact that it piggybacks on a crypto currency blockchain.

how can you timestamp your scientific work without publishing it, e.g. work in progress?

Data archives like Zenodo would allow you to archive ("publish") a list of SHA hashes, much as you do with OriginStamp. You could even "blockchain" such lists by having the subsequent list include the hash identifier of the previous list. While Zenodo (backed by CERN) may not last forever, scientific data archives are probably the most persistent research storage we have, are free and relatively frictionless to access. Just a thought.

Don't mean this to sound like I'm not supportive of the ROriginStamp and OriginStamp projects, I think these are great options to have available to researchers!

rkrug commented 2 years ago

Unfortunately, the verification outside OriginStamp does not work anymore, as the website which I used to get the transaction from does not exist any more (api.smartbit.com.au). As this topic moved out of the focus of our work (we decided that time stamping is not needed anymore) I haven't spend much time on this. My Apologies.

I see from my side two options: The first one is to trim the package down to the parts which are still working (which is essentially as submitted with removed hashing of R objects) and to leave it as a proof of concept and beta package, or to withdraw it from submission.

I will leave the decision up to the editors at rOpenSci, but would prefer the first solution as the time invested by the reviewers would not have been for nothing..

Thanks.

noamross commented 2 years ago

@rkrug. We're going back and picking up some reviews that got lost in ambiguous situations like this. My apologies that this issue has foundered so long! If you are still interested, I think the first solution is fine. I will ask reviewers are amenable to reviewing your changes, and if not, we'll do an editors' review to check that you have addressed them.

rkrug commented 2 years ago

@noamross Thanks and no problem about the delay - it suited me well.

OK - I will look at implementing suggested changes to follow solution one (trim the package to what is still working and to have it effectively as a working but "could be improved" proof of concept) today.

rkrug commented 2 years ago

I just merged my changes in the master branch and made a release v_0.9.0.

As suggested by @noamross, I have prepared the package as a "Proof of Concept". Therefore, I did not address most issues concerning the documentation.

I added stop() statements in the functions which do not work anymore due to a retired website and stated this in the man pages as well. If other sites can be found (and I am sure they are) which provide the same service for the lookup, the functions can be easily re-activated.

I also removed the possibility to hash R objects (including characters) as these are, as pointed out by the reviewers, dependant on the serialisation used by R, which can change over time.

I hope that these changes are sufficient, considering the "Proof of Concept" nature of the package.

Thanks,

Rainer

noamross commented 2 years ago

I think I parsed your response incorrectly @rkrug - I thought you were describing the two options as "trimming down the package" and "leaving it as a proof of concept." While I think a trimmed-down version with fewer functionalities is fine, it should still be fully documented per the reviewers' comments and functional within those trimmed-down bounds, with non-working functions removed or with full deprecation code (the latter not really needed for this pre-CRAN package). This is a bit beyond "proof of concept," because the rOpenSci approval repesents our endorsement as the best-in-class way to perform the task. I think with current limitations the package works well as "best way to use OriginStamp API from R", even if the documentation also covers how it fits into the rest of the TTS ecosystem that it may not directly interface with.

rkrug commented 2 years ago

Thanks a lot @noamross. As stated above, I can't afford to spend much more time on this package, which I will likely not use I. the future and consequently that I will not put much effort into maintaining it

I perfectly understand your reasoning, so please feel free to mark the package as withdrawn unless there is interest from your side in tidying it up and not too much work involved on my side.

Thanks and sorry for the work you, and @cboettig and @cgries have put into this.

@thhepp Even if the package is withdrawn here, I will keep it on GitHub and likely on Runiverse.

jhpoelen commented 2 years ago

I appreciate the notes of @rkrug related to effort required to keep packages up and running, and that another time might be better to consider this package for review. In my mind, this exchange shows that the review process works nicely. And . . . that reviewing, just like maintaining packages, takes time.

Thanks @cboettig and @cgries for reviewing.

emilyriederer commented 2 years ago

Thanks so all for this discussion. As I'm doing a round of clean-up on outstanding issues, I'm going to go ahead and close this issue.

rkrug commented 2 years ago

Thanks @emilyriederer

ropensci / software-review