volatiletech / sqlboiler

Generate a Go ORM tailored to your database schema.
BSD 3-Clause "New" or "Revised" License
6.73k stars 544 forks source link

Randomization interface #284

Closed aarondl closed 6 years ago

aarondl commented 6 years ago

Randomize should have an interface so we can create types that randomize themselves, otherwise we'll never be able to keep up in randomize.go

aarondl commented 6 years ago

Done. See changelog and randomize package.

saulortega commented 6 years ago

Hi, @aarondl

It would be good if the Randomizer interface is independent of sqlboiler, that a third-party library does not need to import "github.com/volatiletech/sqlboiler/randomize".

I think that Randomize(s * Seed, fieldType string, shouldBeNull bool) is unnecessary. Simply Randomize().

The seed can be generated by the library. The nullity can also be decided by the library. And I do not understand very well why you use fieldType, but surely it can also be ommited.

aarondl commented 6 years ago

I agree this is pretty undesirable. I already ran into this myself and wasn't too impressed with the interface I created here.

The problem is to create a sequential "random" value, we need more than just a single integer in a lot of cases. Though maybe we could instead switch it to an int64, and then we could segment it where we need more random. It's exceedingly rare we need multiple values that are bigger than 256, so we could get up to 8 random values out of it. But it's probably not enough for things like Str. So then we're back at requiring seeds.

Your Randomize() proposal does not work at all however. Let me elaborate why:

seed: The reason the seed was introduced is because of unique columns. It was often problematic that during randomization (pseudo random) you would get columns conflicting due to uniqueness over the course of a test. So we now use the seed, an ever-increasing integer that reduces the collision possibility a lot for unique columns. I'm opened to other suggestions on how to tackle this problem however.

fieldType: Fieldtype must be present. Because the reflect.TypeOf something may be a string, but it could have very specific randomization needs. The best example of this is a uuid. Yes it's a string, but if the value is abc the database will reject it and the tests will fail. Another example is mediumint: we know that int is a medium int and can only hold values up to 8388607 not the normal 2^(16-1) that the Go int16 would suggest. FieldType therefore cannot be eliminated from the arguments.

shouldBeNull: Some values should not be null ever and it's up to the test to determine when that is, even if it's nullable sometimes we NEED there to be a value there. If the implementer of the interface gets to decide this, we have no way to say: "Give me a random value, but ensure it can't be null" which is something we need to be able to control which columns have values and which don't.

So in short the only one we have the ability to change is seed.

I only have bad alternatives for seed currently:

  1. Create a mini-package for Seed so it doesn't come with huge sqlboiler dependencies. Downside here is it's still a dependency.
  2. Pass in the *int64 and every Randomize implementer can use the stdlib atomic package to get random values out of it. This is obviously bad because every implementer has to deal with this implementation detail and if they mess up we get race conditions.
  3. Pass in a single "random" integer and allow it to be segmented for many smaller random values. Bad things about this is the implementation is a lot of annoying >> bitshifts so you'd probably want to create a package/dependency for this anyway. And we can easily run out of randomness for certain things, like a char(255) column. So we'd have to repeatedly use the values in a loop which might actually work but it's a lot of code and again you'd probably want to create some functions to use for this which again reintroduces the dependency.
aarondl commented 6 years ago

@saulortega The more I thought about this. The more solution number 2 makes sense. Anything that's providing some types can re-implement this trivially. In Rob Pike's words: A little duplication is better than a little dependency. I vote we go for this approach, what do you think? This would allow the interface to use only Go types and not require a dependency on anything more than stdlib.

aarondl commented 6 years ago

FWIW the new interface would look like:

type Randomizer interface {
    Randomize(seed *int64, fieldType string, shouldBeNull bool)
}
saulortega commented 6 years ago

That looks better than it is now, and it is the best of the three options. :+1:

aarondl commented 6 years ago

@saulortega Since you seem to be active, I'll pose one more question. I actually thought we could fix the problem completely by passing the NextInt() function around. So it actually becomes:

type Randomizer interface {
    Randomize(nextInt func() int64, fieldType string, shouldBeNull bool)
}

This is better in that nobody needs to implement things using atomic, and it also eliminates the dependency. What do you think?

saulortega commented 6 years ago

Yes, that is still a good option. :+1:

aarondl commented 6 years ago

Change done in 84770d2. It's not in any rc just yet. Keep in mind I also dropped gopkg.in from the null package with this commit.