Open zachdaniel opened 2 years ago

What I'm looking for: something like maps |> uniq_by(&Map.take(&1, [:a, :b])) |> ... that ensures that no duplicate values are ever produced. It would need to also not reuse any values for shrinking, so perhaps it would be unshrinkable (I'm out of my depth there).

The best I can find at the moment: uniq_list_of/2, which allows for taking a generator and producing a unique list of items from that generator. It does not compose the way that I'm looking to compose the generator, as it produces lists.
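For illustration, a uniq_by/2 combinator like the one in that pipeline isn't part of StreamData today, but a rough sketch of one is possible by filtering against a set of already-seen keys (uniq_by/2 and the Agent bookkeeping here are hypothetical, not library API):

def uniq_by(data, key_fun) do
  # Hypothetical combinator: remember the keys of values generated so far
  # and reject any value whose key has been seen before.
  {:ok, seen} = Agent.start_link(fn -> MapSet.new() end)

  StreamData.filter(data, fn value ->
    key = key_fun.(value)

    Agent.get_and_update(seen, fn set ->
      {not MapSet.member?(set, key), MapSet.put(set, key)}
    end)
  end)
end

Note this inherits filter's behavior of giving up once too many consecutive candidates are rejected, which is exactly the trade-off discussed further down.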
There's a big difference between uniq_list_of/2 and what you are looking for. uniq_list_of/2 is a generator that generates list values where each list is "unique", that is, each generated list has zero duplicated elements. Generated lists are not unique "across lists", that is, the generator could generate [1, 2, 3] and then [1, 2, 3] again, and each would still be a "unique" list.
What you are looking for is a way for a generator to produce values that are unique across everything it has generated so far. It's a completely different implementation, as we need to keep a set of "seen" values while generating. Since generators inherently produce random data, this is not easy to achieve: we can't deterministically produce the next unique value, so we have to generate a value, check whether it's in the "seen" set, re-generate it if it is, and so on.
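Outside of StreamData, that loop looks roughly like this (a minimal sketch using plain Elixir streams, with :rand.uniform/1 standing in for a real generator):

Stream.repeatedly(fn -> :rand.uniform(100) end)
|> Stream.transform(MapSet.new(), fn value, seen ->
  # Emit the value only the first time we see it; otherwise emit nothing
  # and keep drawing.
  if MapSet.member?(seen, value) do
    {[], seen}
  else
    {[value], MapSet.put(seen, value)}
  end
end)
|> Enum.take(10)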
I'd love for you to take a stab at this if you want, and I'm happy to provide assistance.
When it comes to "shrinkability", you're probably right: this would end up being an unshrinkable generator. Making sure that the whole lazy tree (the data structure we use for shrinking) contains only unique values might be a challenge.
Yeah, definitely some interesting issues here, especially considering that uniq_list_of/2 is really just filtering a generator. For some context: what I'm doing is adding StreamData generators that produce valid input to an Ash Framework resource action. It isn't a hard requirement that each value returned by that stream can be used within the same transaction, but it would be very nice if it could be. Ash is "aware" of the various unique constraints on your resource, like "email must be unique" and "foo and bar must be unique together".
So here is what I'm wondering: could we potentially build in some concept of a "unique generator"? Primarily because generating unique values for different types could involve different things, and the values still might be shrinkable in some way. One example that comes to mind: values could be "skipped" along the way (e.g. shorter strings, lower integers) and reserved for shrinking, unless the stream runs out of values, in which case they'd need to be used for generating more values.
Ash could use this under the hood by just ensuring that (in the above example) email comes from unique_string() and that foo and bar come from unique_map(...).
Here is the kind of thing I have in mind:

def unique_integer(left..right = _range) do
  # An implementation that does something like using
  # `System.unique_integer([:monotonic])` to only ever produce increasing values
  new(...., unique: true)
end
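Something in that spirit can already be approximated without new machinery. A sketch, ignoring the range argument and shrinking, and assuming StreamData.repeatedly/1 (available in recent StreamData versions):

def unique_integer do
  # System.unique_integer([:positive, :monotonic]) returns a strictly
  # larger integer on every call, so no two generated values can collide.
  StreamData.repeatedly(fn -> System.unique_integer([:positive, :monotonic]) end)
end

The trade-off is the one mentioned above: values produced this way don't shrink, since there is no smaller value that is safe to revisit.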
We could store on the stream that it is capable of producing unique values each time, and then we could have unique streams only compose with other unique streams, e.g.

unique_map(%{id: unique_integer()})
As I'm typing this, I'm honestly not sure if it's just a crazy-bad idea or what. unique_map may also be a bit strange... I guess all it would really do is require that the given generators are unique generators, so there's possibly no point in having it?
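That intuition seems right: with the existing StreamData.fixed_map/1, uniqueness of any one component is already enough (a sketch building on the unique_integer/0 above, so no unique_map is needed):

unique_user =
  StreamData.fixed_map(%{
    # Because :id never repeats across generated values, no two generated
    # maps can be equal, regardless of what :name produces.
    id: unique_integer(),
    name: StreamData.string(:alphanumeric)
  })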
I'm not suggesting this is the solution I need/want. But in my (somewhat limited) experience with property testing, avoiding filters drastically increases your quality of life, and using a filter to get unique results would, I imagine, be a pretty big performance hit and would make "too many filtered values" errors significantly more frequent.
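That failure mode is easy to demonstrate with the seen-set filtering sketched earlier (hypothetical code, not library API): on a small domain the seen set eventually saturates, every candidate gets rejected, and the filter gives up.

{:ok, seen} = Agent.start_link(fn -> MapSet.new() end)

StreamData.integer(1..10)
|> StreamData.filter(fn n ->
  Agent.get_and_update(seen, fn set ->
    {not MapSet.member?(set, n), MapSet.put(set, n)}
  end)
end)
|> Enum.take(11)
# There are only 10 possible values, so the 11th take can never succeed,
# and the filter eventually raises StreamData.FilterTooNarrowError.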