tweag / funflow2

Compose and run computational workflows
MIT License
9 stars 0 forks source link

Allow combining Docker task inputs #88

Closed vreuter closed 3 years ago

vreuter commented 3 years ago

Not super necessary, but the empty and combine ops seemed natural when I went to make a Docker task input in a notebook. It doesn't feel common, but I could see a user having JSON's of different inputs stored locally and/or fetched from remote stores and then wanting to combine them and this would help w/ that.

vreuter commented 3 years ago

Hey @dorranh I'm testing out the instances, but need to figure out why generating the arbitrary digest seems to be hanging the tests (I added a TODO about that in most recent commit), so this isn't ready for merge yet.

vreuter commented 3 years ago

Also, I exposed a couple constructors, mainly for testing though it also didn't seem harmful since the fields of the relevant types already seem to have the domain restriction / safety at the type level (like CS.Item), and I couldn't get an alternative approach working (see #91). Also added a couple derived Show instances related to usage QuickCheck (though also may have benefit in their own right, #86), is that OK?

vreuter commented 3 years ago

OK, the hanging issue for the associativity test for semigroup seems OK now, taking a different route to an arbitrary Digest SHA256. Not ready for merge, but that part is all set I think.

dorranh commented 3 years ago

PS: In case you haven't seen this already: we also publish the docs for CAS on the website: https://tweag.github.io/funflow2/api/cas-store/html/Data-CAS-ContentStore.html#g:1

vreuter commented 3 years ago

Cool I will re-hide the constructors and update tests accordingly.

dorranh commented 3 years ago

Awesome, thanks! Let me know if you run into any issues

vreuter commented 3 years ago

As far as I can tell, what we are doing here is what I would consider a more thorough approach where on each test iteration we generate DockerTaskInput to specifically trigger the the different cases for things like de-duplicating the volume mounts. In principle this should work for what we want to test.

Yep, that's what I was aiming for.

Another approach to this would be to construct a single generator which constructs both kinds of cases (i.e. inputs with and without duplicate mount points) and then just assert 1. that the monoid properties as we do now (in the "pure" assertions) and 2. the property that no duplicate mount points exist in the output. The advantages of this approach would be that I think it should be a bit more efficient since you are only calling the random generator once up-front instead of having to generate the DockerTaskInputs in each of the property assertions and that it might require less code.

I agree on at least a couple fronts:

  1. It would more efficient (specifically, fewer inputs generated) if we grouped the assertions, like left/right identities and associativity, all under one property, sth like prop_DockerTaskInputMonoidLawsHold.
  2. It would probably wind up being a bit fewer lines.

I think though that as-is has a couple advantages also:

  1. Should a law break, with the properties separated it's easier to quickly see which it is.
  2. Separating the case generation out by property ensures each atomic property gets plenty of work, without having a bit trickier code to write a single generator with controlled imbalance to make sure the rarer cases (like certain overlap b/w maps) come up sufficiently often
vreuter commented 3 years ago

OK I think this is ready for merge now @dorranh , pending what you think about the additional test and about wording in the docs for the new instances.