polydawn / repeatr

Repeatr: Reproducible, hermetic Computation. Provision containers from Content-Addressable snapshots; run using familiar containers (e.g. runc); store outputs in Content-Addressable form too! JSON API; connect your own pipelines! (Or, use github.com/polydawn/stellar for pipelines!)
https://repeatr.io
Apache License 2.0

Formula hashes should include "conjectured" output hashes #90

Closed tazjin closed 7 years ago

tazjin commented 7 years ago

Speaking about this:

for _, spec := range f2.Outputs {
    spec.Hash = ""
    spec.Warehouses = nil
}

(also: argh mutability >.<)

This should only zero output hashes that have conjecture set to false.
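
A minimal sketch of that behaviour, assuming the outputs are a map of pointers to a spec with a boolean Conjecture field. The Output type and the scrubOutputs helper are hypothetical stand-ins for illustration, not repeatr's actual definitions. It also works on copies instead of mutating the formula in place:

```go
package main

import "fmt"

// Output is a hypothetical stand-in for an output spec; the field names
// follow the snippet above, plus an assumed Conjecture flag.
type Output struct {
	Hash       string
	Warehouses []string
	Conjecture bool
}

// scrubOutputs returns shallow copies of the given outputs. Hash and
// Warehouses are cleared only on outputs that are NOT marked as
// conjectured; the originals are left untouched.
func scrubOutputs(outputs map[string]*Output) map[string]*Output {
	scrubbed := make(map[string]*Output, len(outputs))
	for name, spec := range outputs {
		c := *spec // copy the struct so the caller's value is not mutated
		if !c.Conjecture {
			c.Hash = ""
			c.Warehouses = nil
		}
		scrubbed[name] = &c
	}
	return scrubbed
}

func main() {
	outputs := map[string]*Output{
		"bin": {Hash: "abc123", Warehouses: []string{"file:///tmp/wh"}, Conjecture: true},
		"log": {Hash: "def456", Conjecture: false},
	}
	scrubbed := scrubOutputs(outputs)
	fmt.Println(scrubbed["bin"].Hash)       // conjectured: hash kept
	fmt.Println(scrubbed["log"].Hash == "") // not conjectured: hash cleared
	fmt.Println(outputs["log"].Hash)        // original is unchanged
}
```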

TripleDogDare commented 7 years ago

We're currently planning on having multiple hashes for formulas. The current formula hash is now described as the "setup" hash which does not include any output data. Pinned outputs would be included in a "result" hash. You need to be able to link two formulas that are otherwise equivalent but have different outputs from cat /dev/urandom.
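
A rough sketch of how the two hashes could relate, assuming a simplified formula that is serialized as JSON and hashed with SHA-256. The Formula type, the setupHash and resultHash helpers, and the serialization are all assumptions made for illustration; they are not repeatr's real types or hashing scheme:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Formula is a hypothetical, simplified formula.
type Formula struct {
	Inputs  map[string]string `json:"inputs"`  // path -> input hash
	Action  []string          `json:"action"`  // command to run
	Outputs map[string]string `json:"outputs"` // path -> output hash
}

// hashJSON hashes the JSON encoding of v. encoding/json writes map keys
// in sorted order, so the encoding is stable for these types.
func hashJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// setupHash covers inputs, action, and output names, but no output hashes.
func setupHash(f Formula) string {
	names := make(map[string]string, len(f.Outputs))
	for name := range f.Outputs {
		names[name] = ""
	}
	return hashJSON(Formula{Inputs: f.Inputs, Action: f.Action, Outputs: names})
}

// resultHash covers everything, including the pinned output hashes.
func resultHash(f Formula) string {
	return hashJSON(f)
}

func main() {
	run1 := Formula{
		Inputs:  map[string]string{"/": "aaa111"},
		Action:  []string{"sh", "-c", "head -c 16 /dev/urandom > /out/r"},
		Outputs: map[string]string{"/out": "hash-from-run-1"},
	}
	run2 := run1
	run2.Outputs = map[string]string{"/out": "hash-from-run-2"}

	// Same setup, different results: the setup hash links the two runs.
	fmt.Println(setupHash(run1) == setupHash(run2))
	fmt.Println(resultHash(run1) == resultHash(run2))
}
```

In this sketch the setup hash is what links the two runs, while the result hash tells them apart.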

timthelion commented 7 years ago

I'm not sure I understand the details of your proposal. How would the conjectured hash work? Would it be a type of hash that didn't include the results of cat /dev/urandom?


tazjin commented 7 years ago

@timthelion This doesn't really apply anymore because @heavenlyhash removed the conjecture field.

My reasoning was that if you know that a certain output will be reproducible (this is information the user is adding into the system!) it is safe to include the hashes of those outputs in the formula hash.

TripleDogDare commented 7 years ago

Conjectures were scrubbed in #98 actually. Can one know if an output is reproducible? I can certainly assert that it is, but those assertions exist on a different plane than the inputs+computation.

warpfork commented 7 years ago

... belated close ...

This whole topic is more clearly resolved in the new "r200" version of the formula. The formula structure now contains only simple, deterministic values, and everything that carries any user-opinionated value is separated out (often into a 'formulaContext' object, which in practice is a sibling to the formula; keeping things like URLs in that separate object makes it much clearer what is covered by the "setup hash" and what is not).
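
To illustrate the separation, here is a small sketch in which only the formula feeds the setup hash and the context travels alongside it. The type and field names (Formula, FormulaContext, FetchURLs, Job) and the JSON plus SHA-256 hashing are assumptions for illustration, not the actual r200 schema:

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Formula holds only deterministic values (hypothetical shape).
type Formula struct {
	Inputs map[string]string `json:"inputs"` // path -> content hash
	Action []string          `json:"action"`
}

// FormulaContext holds user-opinionated values such as warehouse URLs
// (hypothetical shape). It is never hashed.
type FormulaContext struct {
	FetchURLs map[string][]string // path -> candidate URLs
}

// Job pairs a formula with its sibling context.
type Job struct {
	Formula Formula
	Context FormulaContext
}

// setupHash hashes the formula alone; the context cannot influence it
// because it is not passed in.
func setupHash(f Formula) string {
	b, err := json.Marshal(f)
	if err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

func main() {
	f := Formula{
		Inputs: map[string]string{"/": "aaa111"},
		Action: []string{"make"},
	}
	a := Job{f, FormulaContext{FetchURLs: map[string][]string{"/": {"https://mirror-a.example/wh"}}}}
	b := Job{f, FormulaContext{FetchURLs: map[string][]string{"/": {"file:///srv/wh"}}}}

	// Different URLs, same formula, same setup hash.
	fmt.Println(setupHash(a.Formula) == setupHash(b.Formula))
}
```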

+1 to @TripleDogDare's comments about the underlying philosophy -- it doesn't really matter whether the author of a formula believes a system is reproducible or not. Believing doesn't make it so. This isn't to say we want to rule out ever accepting that kind of metadata anywhere in the ecosystem, but it does say it doesn't belong in the formula structure.