Closed rkrug closed 2 years ago
Hello! @noamross will be your Handling Editor.
Greetings @rkrug!
I am unable to run tests successfully locally. Even when I have set NOT_CRAN=true
and add ROriginStamp_api_key
to my .Renviron
. I think this is likely due to the fact that some of the tests are tied to your own OriginStamp account. In general we want reviewers/users/contributors to be able to run the test suite. There are a few options for this, the best of which is generally refactoring the test suite so that any objects that need to be tested against are created as part of the tests using the users' account. Sometimes testing requires some more elaborate set up like setting up a dummy developers' account. In such cases where manual steps are required we generally request they be described in a CONTRIBUTING.md file.
I also note that the package vignette isn't visisble at https://rkrug.github.io/ROriginStamp/articles/introduction.html, and I would suggest being explicit in the README that the API key environment variable should be ROriginStamp_api_key
. I had to find this in the code and we will want this to be simple for the reviewers to get going as easily as possible.
Please make these changes and I can proceed with finding reviewers. Let me know if you have any questions.
Other than that, your package passes goodpractice::gp()
tests with flying colors, except for a few long code lines, which reviewers can consider when evaluating code readability but requires no immediate action.
✖ avoid long code lines, it is bad for readability. Also,
many people prefer editor windows that are about 80 characters
wide. Try make your lines shorter than 80 characters
R/create_timestamp.R:7:1
R/create_timestamp.R:8:1
R/create_timestamp.R:10:1
R/create_timestamp.R:13:1
R/create_timestamp.R:14:1
... and 58 more lines
Thanks for the comments, @noamross.
ROpenSci_api_key
to README https://github.com/rkrug/ROriginStamp/commit/f90fb81b0dc511997b866ec3e20975d9b19e19c7 and https://github.com/rkrug/ROriginStamp/commit/0378f5787bb6b7c13824ac4dd6ed5b4f6988275cI found one test, in which the API key was influencing the results. I tried to run all the tests with a different API key, and they succeeded. If they fail for you, could you please let me know which ones fail?
Thanks.
Thanks for the updates. Under the current version (https://github.com/rkrug/ROriginStamp/commit/eebfbbd53db06a4bde16c44c618d880f485d459b), I'm still getting this one error (previously I had more):
> devtools::test()
Loading ROriginStamp
Testing ROriginStamp
✓ | OK F W S | Context
✓ | 1 | 000_new_OriginTimestampResponse [0.2 s]
✓ | 2 | 001_get_currencies [0.9 s]
✓ | 2 | 002_get_hash_status [1.3 s]
✓ | 2 | 003_get_key_usage [0.9 s]
✓ | 3 | 004_get_proof [1.2 s]
✓ | 2 | 005_options [0.1 s]
x | 1 1 | 006_hash [0.2 s]
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Failure (test_006_hash.R:8:5): hash
Snapshot of code has changed:
old[1:8] vs new[1:6]
"Code"
" tf <- tempfile()"
- "Code"
" write.csv(data.frame(letters, LETTERS, 1:26), tf)"
- "Code"
" x <- list(hash = hash(as.hash("
" \"2c5d36be542f8f0e7345d77753a5d7ea61a443ba6a9a86bb060332ad56dba38e\")),"
" r_object = hash(LETTERS), r_object_def = hash.default(LETTERS), file = suppressMessages("
old[10:27] vs new[8:21]
"Message <simpleMessage>"
" "
" x is already a hash - returning x unprocessed"
- "Message <simpleMessage>"
" "
" Create sha356 hash from R object x"
- "Message <simpleMessage>"
" "
" Create sha356 hash from R object x"
- "Message <simpleMessage>"
and 8 more ...
Run `snapshot_accept('006_hash')` if this is a deliberate change
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
✓ | 2 | 100_create_timestamp [0.5 s]
══ Results ══════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 5.4 s
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 15 ]
Sorry about that - I somehow missed the new snapshot for the tests. Commit https://github.com/rkrug/ROriginStamp/commit/17d231005a26120b50d293a79ff76791d6bf3fb6 should have fixed it.
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/433_status.svg)](https://github.com/ropensci/software-review/issues/433)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@ropensci-review-bot assign @noamross as editor
Assigned! @noamross is now the editor
@ropensci-review-bot add @cboettig to reviewers
@cboettig added to the reviewers list. Review due date is 2021-04-07. Thanks @cboettig for accepting to review! Please refer to our reviewer guide.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).No CONTRIBUTING file or guidelines in README
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a
paper.md
matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 2 h
This is a very nicely implemented package, with a clear coding style, light dependencies, and detailed documentation.
Function return display behavior could be cleaned up a bit. Functions return verbose output that is not easily parsed by most users. For example, perhaps create_timestamp
could return a clear message that timestamp creation was successfully submitted, and indicate how long a user might want to wait before trying to check if the timestamp has registered. For example, the more detailed return information could be placed into an S3 object with a defined print method that displays only key details (hash, time submitted, etc) concisely.
Minor technical issue, but it's good practice to use put close()
commands for R connections into on.exit()
in functions, which allows R to close an open connection if the function terminates prematurely.
The use of a trademark name has been a concern in the past. From a skim of the legal documentation links provided, it doesn't look like an issue here, but may still be appropriate to check with the owners of OriginStamp before using the name in the package name?
I'm really hesitant about the practice of trying to timestamp R objects rather than files. Why does the method use version=2
here instead of the default? serialize
is not suitable for archival purposes, as noted in the documentation, and having changed in the past and being subject to change in the future. In UNIX philosophy, text files are the universal interface, and it seems that users would be better to serialize files rather than objects? I think when storrr
does this, it first trims off the R version bit of the serialization to be more stable? Anyway, just a thought, happy to be convinced otherwise!
I think the vignette would be more compelling if it provided more narrative about the author's envisioned use cases and corresponding workflow. For example, I was just discussing the issue of verifying timestamps with colleagues who are all interested in scientific and statistical methods for forecasting. They noted it would be desirable to verify that a forecast was produced at a particular time prior to when a measurement was actually made. I understand that it often makes sense to defer to OriginStamp's own material about the many possible use cases, limitations, etc, but at least a little more information could be provided here.
Likewise, I think it the vignette would be greatly helped by speaking to the issue of longevity, an area where I found the documentation from OriginStamp to be somewhat unclear. At first glance, it is easy to assume that this ability to confirm a timestamp is only as good as the lifetime of the company behind the API. In fact, it seems the designers have gone out of their way to make sure they are publishing timestamps into major commercial block chains, not merely maintaining there own. As such, it seems like it should be possible to verify a timestamp even if OriginStamp is defunct. But in practice, it seems like this would be very difficult to do, as all package functionality requires an OriginStamp API key. Am I missing something? It seems like in order to make it worthwhile to go through all this trouble to register timestamps in major digital currency blockchains, that it should be possible to verify those timestamps directly from the public block chains without an API key or third party of any kind (even if it is not feasible to submit new timestamps without a key). If so, would it be possible to support that? If not, it kind of feels like "what's the point?" e.g. why not just rely on a trusted authority to gpg sign the hash or something?
Lastly, it might be interesting to have some discussion of alternative approaches (or why things that seem like alternatives aren't). Maybe this is too far beyond the scope of a vignette, but I'm just curious.
Just an additional followup comment on the last bullet, suggested to me by a colleague @jhpoelen : why not simply compute the hash and post it to some public record? For example, I could merely tweet my hash, or post it in a GitHub gist (both events have publicly verifiable timelines, and are used as 'proofs' by systems like Keybase.io).
Thanks @cboettig for a review so rapid I haven't found a second reviewer yet! @rkrug, you can want to wait to make changes until we get the other review in.
@cboettig Thanks for the review.
I will follow Noam's suggestion and get back to it in detail after the report from the second reviewer is in.
Just a quick remark: I agree essentially with most of your points, and especially concerning the TTS for R objects, I would only be too happy to remove it - but we will discuss these later.
@ropensci-review-bot add @cgries to reviewers
@cgries added to the reviewers list. Review due date is 2021-05-19. Thanks @cgries for accepting to review! Please refer to our reviewer guide.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing: 3h
I agree with everything @cboettig already commented on:
api_url()
Value: 'Either the old api url (when url is supplied) or the current api url (when url is not supplied)' did not help me to understand what URL I was supposed to provide.devtools::check()
returns a warning:
checking for unstated dependencies in examples ... OK
WARNING
'qpdf' is needed for checks on size reduction of PDFs
devtools::test()
mostly fails, I assume, because of local installation path and not accessing my api key?
Thanks @cgries for your review. Your and @cboettig comment's will definitely improve the package.
@noamross What is the procedure now - will you provide some additional feedback after which I should address the comments, or should I address them right away?
Just an additional followup comment on the last bullet, suggested to me by a colleague @jhpoelen : why not simply compute the hash and post it to some public record? For example, I could merely tweet my hash, or post it in a GitHub gist (both events have publicly verifiable timelines, and are used as 'proofs' by systems like Keybase.io).
Yes - this could be done. But there are two major differences in using twitter or GitHub:
Thanks @rkrug. I do think this is really neat, and the decentralized strategy is really nice. One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?
(and minor note just for completeness -- if GitHub closed it's doors, it's not necessarily true that it's previous timestamps would be forever lost, e.g. since GitHub is archiving much of it's public content in the Arctic Code Vault (pretty unclear how re-trivial would ever work there) but is also archived by independent operations like Software Heritage. For example: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/rkrug/ROriginStamp
Thanks @rkrug. I do think this is really neat, and the decentralized strategy is really nice. One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?
This is addressed in a new function, which verifies timestamps from OriginStamp without using the origin stamp API - it is in the review-in-progress branch. This works for newer timestamps (beginning of this year) and older ones can be verified as well, but this is a much more involved process. I do not think this is useful too include, as the package ROriginStamp is not yet on CRAN.
(and minor note just for completeness -- if GitHub closed it's doors, it's not necessarily true that it's previous timestamps would be forever lost, e.g. since GitHub is archiving much of it's public content in the Arctic Code Vault (pretty unclear how re-trivial would ever work there) but is also archived by independent operations like Software Heritage. For example: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/rkrug/ROriginStamp
Didn't know that. Thanks. But at least the Software Heritage is an Opt In if I am not mistaken? but good to know.
@cboettig Just in addition to the standard blockchain, OriginStamp also offers a Media Blockchain:
What is a media blockchain? Newspapers are distributed in the physical world like blockchains are in the digital. OriginStamp publishes the root hash of the Merkle Tree in different newspapers so that all potential newspaper readers become witnesses of the timestamp. Moreover, the root hash of today is a leaf hash for tomorrow, which means that the two Merkle Trees are linked and subsequent newspaper editions form a chain. The security and reliability of a media blockchain relies solely on the number of archived copies of each edition.
Intriguing.
This is addressed in a new function, which verifies timestamps from OriginStamp without using the origin stamp API - it is in the review-in-progress branch.
:tada: wow, that's really cool! Yeah, I think even just pointing out that this can be done in principle (even better with example code) is a really nice addition. Not at all surprised that it's pretty involved! Did you point to your example, say, in the vignettes?
Software Heritage is an Opt In ...
They took a comprehensive snapshot of public GitHub a few years ago I think, and also snapshot the major software archives regularly. But yeah, getting a timestamp from OriginStamp or anyone is 'opt in'; what I meant to say is that I think for most academic/researcher types, the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc); it's somewhat incidental that the fact that the easiest way to archive some content in SoftwareHeritage is by putting it on GitHub and then triggering the archive API. (Where here "content" could just mean "the hash of the content whose origin timestamp I wish to prove").
Incidentally your work came up in a recent discussion @cgries and I had with folks interested in timestamps proving that a 'forecast' was really made before a given date. Neat stuff!
Hi, To clarify, I am researcher and co-founder at OriginStamp.
One thing I'm still unclear on is how you'd ever verify a timestamp in practice from a public digital currency blockchain if OriginStamp went belly up?
Feel free to study our paper.
tl;dr OriginStamp uses an algorithmic approach to embed hashes into public blockchains. Approach can be locally reproduced by downloading a proof that can be algorithmically verified.
What is the expected time frame of a valid return from this third-party provider OriginStamp? Are we talking weeks, months, years for a user to get a verification certificate for a certain file?
Bitcoin: every 24hours Ethereum: every hour OAN (open application network, formerly aion): every 10 minutes
... the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc); it's somewhat incidental that the fact that the easiest way to archive some content in SoftwareHeritage is by putting it on GitHub and then triggering the archive API. (Where here "content" could just mean "the hash of the content whose origin timestamp I wish to prove").
I really appreciate your discussions, but how can you timestamp your scientific work without publishing it, e.g. work in progress? There are cases, when research was rejected by a peer-reviewer, who later published these results (idea theft & missing attribution). Moreover, you can document experiments using OriginStamp and when publishing the paper, you should publish this timestamped research log as well. Hence, it significantly increases reproducibility of academic work, because you simply prove its integrity.
... They took a comprehensive snapshot of public GitHub a few years ago I think, and also snapshot the major software archives regularly. But yeah, getting a timestamp from OriginStamp or anyone is 'opt in'; what I meant to say is that I think for most academic/researcher types, the standard route to getting a timestamp is archiving in a scientific repository (Zenodo, EDI, DataONE, SoftwareHeritage, etc);
Besides improved reproducibility, central services can be forced by authorities to censor critical research. I want to share the following link with you: wiki news. From my point of view, the bullet-proof solution for a system for software heritage must be based on a decentralized protocol maintained by the scientific community & institutions to guarantee freedom of research. (maybe my next research project)
But in practice, it seems like this would be very difficult to do, as all package functionality requires an OriginStamp API key. Am I missing something? It seems like in order to make it worthwhile to go through all this trouble to register timestamps in major digital currency blockchains, that it should be possible to verify those timestamps directly from the public block chains without an API key or third party of any kind (even if it is not feasible to submit new timestamps without a key). If so, would it be possible to support that? If not, it kind of feels like "what's the point?" e.g. why not just rely on a trusted authority to gpg sign the hash or something?
Sadly, the operation of such a simple timestamp service requires funding and depends heavily on fluctuating crypto costs. Therefore we had to develop a business that covers these costs. Good news OriginStamp was founded by researchers, and is free to use for researchers like GitHub!
If you have any question, feel free to ask me anything.
@thhepp great to meet you and thanks for joining the discussion here!
From my point of view, the bullet-proof solution for a system for software heritage must be based on a decentralized protocol maintained by the scientific community & institutions to guarantee freedom of research. (maybe my next research project)
Sounds very cool, I'm right there with you. I'm currently quite interested in the use of content-hashes like in OriginStamp as a means to provide location-agnostic identifiers, inspired by the work of Ben Trask at https://hash-archive.org and others, as a decentralized way to resolve content (does not assume content has only one 'canonical' location and does not rely on an identifier controlled by a single central authority, like the DOI. Obviously the identifier is only a part of the puzzle though.
Sadly, the operation of such a simple timestamp service requires funding and depends heavily on fluctuating crypto costs
Yeah, this is the primary motivation for my reservations. It's great that you've developed a business model to cover these costs and made that free to researchers, but it is a weak point in the chain. If the business model fails, costs to verification independent of OriginStamp become prohibitive. What bugs me though is that this need not be the case, the cost is not inherent to the approach, only to the fact that it piggybacks on a crypto currency blockchain.
how can you timestamp your scientific work without publishing it, e.g. work in progress?
Data archives like Zenodo would allow you to archive ("publish") a list of SHA hashes, much as you do with OriginStamp. You could even "blockchain" such lists by having the subsequent list include the hash identifier of the previous list. While Zenodo (backed by CERN) may not last forever, scientific data archives are probably the most persistent research storage we have, are free and relatively frictionless to access. Just a thought.
Don't mean this to sound like I'm not supportive of the ROriginStamp and OriginStamp projects, I think these are great options to have available to researchers!
Unfortunately, the verification outside OriginStamp does not work anymore, as the website which I used to get the transaction from does not exist any more (api.smartbit.com.au). As this topic moved out of the focus of our work (we decided that time stamping is not needed anymore) I haven't spend much time on this. My Apologies.
I see from my side two options: The first one is to trim the package down to the parts which are still working (which is essentially as submitted with removed hashing of R objects) and to leave it as a proof of concept and beta package, or to withdraw it from submission.
I will leave the decision up to the editors at rOpenSci, but would prefer the first solution as the time invested by the reviewers would not have been for nothing..
Thanks.
@rkrug. We're going back and picking up some reviews that got lost in ambiguous situations like this. My apologies that this issue has foundered so long! If you are still interested, I think the first solution is fine. I will ask reviewers are amenable to reviewing your changes, and if not, we'll do an editors' review to check that you have addressed them.
@noamross Thanks and no problem about the delay - it suited me well.
OK - I will look at implementing suggested changes to follow solution one (trim the package to what is still working and to have it effectively as a working but "could be improved" proof of concept) today.
I just merged my changes in the master branch and made a release v_0.9.0.
As suggested by @noamross, I have prepared the package as a "Proof of Concept". Therefore, I did not address most issues concerning the documentation.
I added stop() statements in the functions which do not work anymore due to a retired website and stated this in the man pages as well. If other sites can be found (and I am sure they are) which provide the same service for the lookup, the functions can be easily re-activated.
I also removed the possibility to hash R objects (including characters) as these are, as pointed out by the reviewers, dependant on the serialisation used by R, which can change over time.
I hope that these changes are sufficient, considering the "Proof of Concept" nature of the package.
Thanks,
Rainer
I think I parsed your response incorrectly @rkrug - I thought you were describing the two options as "trimming down the package" and "leaving it as a proof of concept." While I think a trimmed-down version with fewer functionalities is fine, it should still be fully documented per the reviewers' comments and functional within those trimmed-down bounds, with non-working functions removed or with full deprecation code (the latter not really needed for this pre-CRAN package). This is a bit beyond "proof of concept," because the rOpenSci approval repesents our endorsement as the best-in-class way to perform the task. I think with current limitations the package works well as "best way to use OriginStamp API from R", even if the documentation also covers how it fits into the rest of the TTS ecosystem that it may not directly interface with.
Thanks a lot @noamross. As stated above, I can't afford to spend much more time on this package, which I will likely not use I. the future and consequently that I will not put much effort into maintaining it
I perfectly understand your reasoning, so please feel free to mark the package as withdrawn unless there is interest from your side in tidying it up and not too much work involved on my side.
Thanks and sorry for the work you, and @cboettig and @cgries have put into this.
@thhepp Even if the package is withdrawn here, I will keep it on GitHub and likely on Runiverse.
I appreciate the notes of @rkrug related to effort required to keep packages up and running, and that another time might be better to consider this package for review. In my mind, this exchange shows that the review process works nicely. And . . . that reviewing, just like maintaining packages, takes time.
Thanks @cboettig and @cgries for reviewing.
Thanks so all for this discussion. As I'm doing a round of clean-up on outstanding issues, I'm going to go ahead and close this issue.
Thanks @emilyriederer
Submitting Author Name: Rainer M. Krug Submitting Author Github Handle: !--author1-->@rkrug<!--end-author1-- Repository: https://github.com/rkrug/ROriginStamp Version submitted: 0.8.0 Editor: !--editor-->@noamross<!--end-editor-- Reviewers: @cboettig, @cgries
Due date for @cboettig: 2021-04-07 Due date for @cgries: 2021-05-19Archive: TBD Version accepted: TBD
Scope
As recommended in the Presubmission inquiry https://github.com/ropensci/software-review/issues/397#issuecomment-703420600
The package automates the process of obtaining trusted timestamps for R objects as well as files.
The package is aimed at a general audience. It can be used to obtain trusted timestamps too prove that any object (e.g. dataset, graphs, documents of any type, etc) were created at or before the time of the time stamping and not modified afterwards. The hashing of the object and querying this hash at the timestamp provider returns the proof / certificate.
Not to my knowledge.
NA
https://github.com/ropensci/software-review/issues/397#issuecomment-703420600 @maelle
Technical checks
Confirm each of the following by checking the box.
This package:
orCodeCov.Publication options
[X] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)Code of conduct