volatiletech / sqlboiler

Generate a Go ORM tailored to your database schema.

Infer inserts empty strings #540

Closed sidot3291 closed 5 years ago

sidot3291 commented 5 years ago

If you're having a generation problem please answer these questions before submitting your issue. Thanks!

What version of SQLBoiler are you using (sqlboiler --version)?

3.3.0

If this happened at runtime what code produced the issue? (if not applicable leave blank)

type Domain struct {
    ID                       string      `boil:"id" json:"id" toml:"id" yaml:"id"`
    Active                   bool        `boil:"active" json:"active" toml:"active" yaml:"active"`
    Name                     string      `boil:"name" json:"name" toml:"name" yaml:"name"`
    CreatedAt                time.Time   `boil:"created_at" json:"created_at" toml:"created_at" yaml:"created_at"`
    UpdatedAt                time.Time   `boil:"updated_at" json:"updated_at" toml:"updated_at" yaml:"updated_at"`
    DNSToken                 string      `boil:"dns_token" json:"dns_token" toml:"dns_token" yaml:"dns_token"`

    R *domainR `boil:"-" json:"-" toml:"-" yaml:"-"`
    L domainL  `boil:"-" json:"-" toml:"-" yaml:"-"`
}
    dmn := &models.Domain{
        Name:   creator.Name,
        Active: true,
    }

    err := dmn.Insert(ctx, tx, boil.Infer())

Further information. What did you do, what did you expect?

I have my dns_token field set to NOT NULL, but this code succeeds because sqlboiler is passing in an empty string for dns_token.

I expected infer to omit the field from the insert, and therefore result in an error on insert (since it would implicitly be passing null).

I'm not sure if this is a bug or if it's expected. Happy to share more about the domains.go file, though I am splicing out some column information.

Thanks

aarondl commented 5 years ago

An empty string is a perfectly valid value for a string and warrants no special treatment from typical insertion code.

The problem you have is one of validation, and that isn't a problem sqlboiler solves. It can, however, support these kinds of additional automatic checks through hooks, combined with either hand-rolled code or a validation library.
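
For example, a before-insert hook can reject empty dns_token values. Here's a minimal sketch, assuming v3's context-aware generated code with hooks enabled; the models import path is hypothetical:

package main

import (
    "context"
    "errors"

    "github.com/volatiletech/sqlboiler/boil"

    "example.com/yourapp/models" // hypothetical import path for the generated models
)

func registerDomainHooks() {
    // Runs before every Domain insert; returning an error aborts the insert.
    models.AddDomainHook(boil.BeforeInsertHook, func(ctx context.Context, exec boil.ContextExecutor, d *models.Domain) error {
        if d.DNSToken == "" {
            return errors.New("domain: dns_token must not be empty")
        }
        return nil
    })
}

func main() {
    registerDomainHooks()
    // ... open the DB and run the application as usual.
}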

You can also define checks in the database to prevent undesirable entries.

In summary this is intentional as this isn't a problem sqlboiler is attempting to solve. Thanks for taking the time to report the issue.

sidot3291 commented 5 years ago

@aarondl Thanks for your response

What's odd in this situation is that my column is implicitly defined as DEFAULT NULL, but sqlboiler is writing an empty string instead.

Sqlboiler performs as expected for any columns without NOT NULL, since those use the null package which appropriately defaults to NULL for inserts.

But when NOT NULL is defined, sqlboiler uses primitive types instead of the null package, and we lose the default.
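
To illustrate the split, here's a hypothetical model fragment (not my real schema; the null import path varies by sqlboiler version):

package models

import "github.com/volatiletech/null" // import path varies by sqlboiler version

// Widget is a hypothetical generated model showing how column nullability
// changes the generated field type.
type Widget struct {
    // name TEXT NOT NULL -> primitive; its zero value "" is a real value.
    Name string `boil:"name" json:"name" toml:"name" yaml:"name"`
    // notes TEXT (nullable) -> null.String; its zero value means SQL NULL.
    Notes null.String `boil:"notes" json:"notes" toml:"notes" yaml:"notes"`
}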

We're restricted a bit by Go here. I believe NULL is a valid value to try to insert into a database even if the column is defined as NOT NULL, but I have some hesitance to suggest that every field should use the null package instead of a primitive. What are your thoughts there?

I do think it would be good to update the documentation to better describe the behavior of boil.Infer() for NOT NULL columns. The readme made me think sqlboiler would not try to insert the empty string:

When you use inference sqlboiler looks at your Go struct field values and if the field value is the zero value it will not insert that field, instead it will get the value from the database. Keep in mind sqlboiler cannot read or understand your default values set in the database, so the Go zero value is what's important here (this can be especially troubling for default true bool fields). Use a whitelist or greylist in cases where you want to insert a Go zero value.

sidot3291 commented 5 years ago

Thinking more: shouldn't boil.Infer() skip inserting this column, so that anybody who wants to insert empty strings can use Greylist() to ensure the column is included?

aarondl commented 5 years ago

I have my dns_token field set to NOT NULL, but this code succeeds because sqlboiler is passing in an empty string for dns_token.

This is intentional and correct. A null value cannot be used because it's not a valid value for that column. It should not throw an error for an empty string, as I said before: that's a validation problem. Your string column 100% allows an empty string, it's a valid and normal value that can go into that column.

What's odd in this situation is that my column is implicitly defined as DEFAULT NULL, but sqlboiler is writing an empty string instead.

I'm a bit confused, is this a nullable column or not? Are we still talking about the same column? If it's true that it's nullable and has default null (which is contrary to all the information provided thus far) then there may be a bug in default column detection or otherwise. Please open a new bug for this if this is the case.

But when NOT NULL is defined, sqlboiler uses primitive types instead of the null package, and we lose the default.

The default is still there; that's most of the point of boil.Infer(): to determine whether a struct field holds the Go zero value and, if that column has a default value in the database, to omit it from the insert so the value can come back from the DB instead. There should be no "losing" the default value.

We're restricted a bit by Go here. I believe NULL is a valid value to try to insert into a database even if the column is defined as NOT NULL, but I have some hesitance to suggest that every field should use the null package instead of a primitive. What are your thoughts there?

One of the primary concerns of sqlboiler is type safety and disallowing exactly these sorts of things. It is absolutely not ok to try to insert a null value into a column whose possible values do not include null and using null types for every column to allow bad values to flow from code into database is counter to the goals of sqlboiler. The null package is there to allow null values when a column is implicitly or explicitly nullable.

I can agree on updating the documentation to say something about default values in that excerpt you quoted.

Thinking more: shouldn't boil.Infer() skip inserting this column, so that anybody who wants to insert empty strings can use Greylist() to ensure the column is included?

This depends on whether there's a default value in the database for it. If it has a default, then no, it should not insert an empty string; if it does not have a default, it will insert the empty string because there's nothing else it could possibly do: a value must go in. Empty string is not a "weird" value; it's totally normal, some people use empty strings instead of nulls, and that's a perfectly valid use case. Null is the billion-dollar mistake, after all, and should be avoided where possible. "Make impossible states impossible", a mantra borrowed from the Elm community, is a very wise saying.
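
Concretely, here's a minimal sketch of the three column modes against your Domain model, assuming generated models (the import path, IDs, and the demo wrapper are hypothetical):

package demo

import (
    "context"
    "database/sql"

    "github.com/volatiletech/sqlboiler/boil"

    "example.com/yourapp/models" // hypothetical import path for the generated models
)

func demoColumnModes(ctx context.Context, db *sql.DB) error {
    // dns_token is NOT NULL with no explicit database default, so Infer()
    // cannot omit it: the zero value "" is sent as-is. created_at and
    // updated_at are filled in automatically before the insert runs.
    a := &models.Domain{ID: "a", Name: "example.com", Active: true}
    if err := a.Insert(ctx, db, boil.Infer()); err != nil {
        return err
    }

    // Greylist: the inferred columns plus the listed ones, guaranteeing
    // dns_token is sent even when it holds a zero value and has a default.
    b := &models.Domain{ID: "b", Name: "example.org", Active: true}
    if err := b.Insert(ctx, db, boil.Greylist(models.DomainColumns.DNSToken)); err != nil {
        return err
    }

    // Whitelist: insert exactly these columns; any NOT NULL column left out
    // of a whitelist would implicitly receive NULL and fail.
    c := &models.Domain{ID: "c", Name: "example.net", Active: true}
    return c.Insert(ctx, db, boil.Whitelist(
        models.DomainColumns.ID,
        models.DomainColumns.Name,
        models.DomainColumns.Active,
        models.DomainColumns.DNSToken,
        models.DomainColumns.CreatedAt,
        models.DomainColumns.UpdatedAt,
    ))
}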

sidot3291 commented 5 years ago

I have many columns defined in Postgres as NOT NULL and DEFAULT NULL. This definition is valid and standard unless I define a separate default. It's just impossible for me to insert a NULL value into that column, since Postgres will throw an error at runtime.

In practice, this is very similar to defining mycol character varying NOT NULL and CHECK (mycol <> ''). I can't insert an empty string, but it's still a valid definition.

Sqlboiler is usually not opinionated about whether my validation is handled by the database or by code, except in the case of Infer() with NOT NULL columns. There, sqlboiler inserts an empty string, and forces me to define a CHECK if I want postgres to throw an error.

My confusion here ultimately stemmed from the documentation for Infer(). It says it will pass NULL if I have an empty string, and it's not clear there's an exception for NOT NULL columns. I don't believe it's correct to be differentiating on whether "there's a default value in the database," since I consider all columns to have a default, even if that default is null.

Hope this makes sense. I very much appreciate your responsiveness and all the work you've put in here - it really is leagues above everything else I've tried. This edge case is relatively minor and I can certainly workaround it with a check or validation, but ultimately would prefer if Infer() operated consistently even with NOT NULL columns.

Edit: This change wouldn't prevent Infer() from defaulting to an empty string for NOT NULL columns. It could still be accomplished by adding DEFAULT '' to the column definition.

aarondl commented 5 years ago

I genuinely don't understand what the purpose behind that column definition is.

Not null, default null allows empty (but not null) strings as I expected:

what=# create table whats (w text not null default null);
CREATE TABLE
what=# insert into whats (w) values (null);
ERROR:  null value in column "w" violates not-null constraint
DETAIL:  Failing row contains (null).
what=# insert into whats (w) values ('');
INSERT 0 1

Not null allows empty strings (but not null) as I expected:

what=# create table whats (w text not null);
CREATE TABLE
what=# insert into whats (w) values (null);
ERROR:  null value in column "w" violates not-null constraint
DETAIL:  Failing row contains (null).
what=# insert into whats (w) values ('');
INSERT 0 1
what=# insert into whats default values;
ERROR:  null value in column "w" violates not-null constraint
DETAIL:  Failing row contains (null).

It seems you believe the first one should not accept an empty string, but that doesn't appear to be the case. Even in the second one I demonstrate that you cannot omit the value of w, which might be something that default null is designed to protect against, so I'm struggling to understand the purpose of the column definition not null default null.

Null is a very special default value, as it's the only default value that can be implicit. I don't think it makes sense to view it as "all columns have a default value even if it's null", because an explicit default value is much different from an implicit one. In a typical database, columns will have a default value that is not null; it's usually in the domain of acceptable values for that column type, so it makes a lot of sense, practically speaking, to differentiate on whether an explicit default value exists.

I have updated the documentation as we discussed, but I remain unconvinced that anything needs changing here, especially in the light that not null default null does not seem to have any practical purpose. Please correct me if I'm wrong here, it's definitely possible my tests were missing an important case.

sidot3291 commented 5 years ago

I agree that NOT NULL DEFAULT NULL should accept an empty string. I did not intend to suggest otherwise.

I think we may differ on whether we believe NOT NULL is a form of validation or if it changes the data type.

In postgres (can't speak on others), setting NOT NULL is a constraint just like a CHECK [1]. A string column with NOT NULL is still a string column. Postgres doesn't have different types for NullableString and NonNullableString. All columns can hold null, regardless of the type - unless you define a constraint.

In Go there are no null values, just zero values.

I see sqlboiler's null package and the different insertion helpers purely as a mechanism to bridge this difference.

I don't like that, when sqlboiler sees a NOT NULL column in my database without an explicit default, it assumes I intend for it to default to Go's zero value. The way I perceive this is that sqlboiler is changing its behavior based on a validation I've defined on my database column. But sqlboiler doesn't do this for any other type of validation, just null checks, which I find surprising.

But if I instead perceive my NOT NULL constraint as a change in data type, sqlboiler's behavior makes some more sense to me. I suppose if postgres had NullableString and NonNullableString, the non-nullable variant might behave the same as go.

Ultimately, though, I'd prefer that sqlboiler didn't make this jump. For me, Infer()'s behavior is desirable even if it ends up trying to insert a value that will fail my null check. Catching mistakes is a large reason why I defined NOT NULL in the first place, and Infer()'s changing behavior served as a workaround for that validation. Now I have added an empty string CHECK to catch the same errors, but I would prefer if my database client hadn't brought about that change.

Hopefully this makes some sense, regardless of if it leads to changes in the code. Again, I'm a happy user either way. Thanks for your thoughtfulness throughout this thread.

[1] https://www.postgresql.org/docs/9.6/sql-createtable.html

aarondl commented 5 years ago

I see sqlboiler's null package and the different insertion helpers purely as a mechanism to bridge this difference.

This is exactly true and nothing's ever been said to the contrary. SQL database engines and Go have type disparity and so we have to bridge the gap in some way. There is no way to do this other than to use pointers to primitive types (*string) or custom types (null.String). We opted for custom types because it's much more performant.
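
Here's a minimal sketch contrasting the two bridging approaches, assuming the volatiletech null package (the exact import path varies by sqlboiler version):

package main

import (
    "fmt"

    "github.com/volatiletech/null" // v3-era path; newer releases use null/v8
)

func main() {
    // Pointer approach: nil means SQL NULL, at the cost of an allocation per
    // value and nil checks at every use site.
    var viaPointer *string
    fmt.Println(viaPointer == nil) // true => NULL

    // Custom type approach: a value type carrying an explicit Valid flag.
    viaNull := null.StringFrom("hello") // Valid: true
    fmt.Println(viaNull.Valid, viaNull.String)

    empty := null.String{} // zero value: Valid is false => NULL
    fmt.Println(empty.Valid)
}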

I don't like that, when sqlboiler sees a NOT NULL column in my database without an explicit default, it assumes I intend for it to default to Go's zero value. The way I perceive this is that sqlboiler is changing its behavior based on a validation I've defined on my database column. But sqlboiler doesn't do this for any other type of validation, just null checks, which I find surprising.

This is incorrect; when it sees a not null column in your database without an explicit default, it assumes that whatever value is in the struct at the time of insertion (empty string or otherwise) is what goes into the database. That column isn't even considered by inference; it's not even looked at, because there's simply no other value it could possibly choose to insert (other than the one you're giving it). It seems maybe you're arguing it should insert nothing if there was an empty string but that's also undesirable, because then you'd now have to create a mechanism to bypass that in case I wanted an empty string (which is a perfectly normal value). You'd have to make a case for why empty string should be considered a special value in the general case (not just for your particular schema), and I think you'll have a hard time doing so. Null, on the other hand, is a very obvious special case. If in doubt, look for all the times the postgresql documentation says "if the value is null...".

I think the difference in our thinking lies here:

Ultimately, though, I'd prefer that sqlboiler didn't make this jump. For me, Infer()'s behavior is desirable even if it ends up trying to insert a value that will fail my null check. Catching mistakes is a large reason why I defined NOT NULL in the first place, and Infer()'s changing behavior served as a workaround for that validation. Now I have added an empty string CHECK to catch the same errors, but I would prefer if my database client hadn't brought about that change.

I think it's incredibly important to be able to agree that null and empty string are fundamentally different things that cannot be caught by the same check in a database. This is especially evident when you look at how check constraints interact with null: a CHECK whose condition evaluates to NULL counts as passing, so CHECK (mycol <> '') alone will not keep NULL out. You can pretend the two are similar or the same, but it's just not true. You have to check for both things separately if you want to keep both out of your database; that means making the column not null as well as adding a check constraint. It's also true that I will not implement any fixes for not null default null, as that's not only confusing but adds no value, and it should therefore be avoided in all database schemas.

Because I think we're losing focus in the conversation, I'll go back to the original issue: SQLBoiler's inference is -only- for omitting columns that have default values (in order to bring the value back from the DB). Any column whose value cannot come from the database (i.e. no explicit default value) simply takes the value in the struct, even if that's an empty string, which, again, is a perfectly normal and natural value for a string. If you want to prevent empty strings from getting into the database, you can set up validation in Go using sqlboiler's hook system and/or set up check constraints in the database.

To me there's no issue here, it's all working as intended and adding that check constraint to your database was 100% warranted as that's an invalid value in your schema, yet you had never explicitly stated that in said schema. You said that it's because of your current client (sqlboiler & Go) that you were forced to make that change, but in reality any client could connect to your database and insert all the empty strings it wants without that check. Relying on sqlboiler's inference to keep empty strings out of the database is not a good strategy so in my opinion your general requirements forced the change and it had nothing to do with the client.

I appreciate the civil discussion we've had. I come off as a little hard-cut/dry on the internet sometimes and I assure you I mean no disrespect nor meant to create any emotionally charged responses here if it at all came across in that manner. I also appreciate the very nice comments you've made about sqlboiler along the way and I hope that this difference in opinion does not sour your impression of it too much. Hopefully my responses make sense and we can agree to disagree on the behavior of inserts in sqlboiler.

sidot3291 commented 5 years ago

It seems maybe you're arguing it should insert nothing if there was an empty string but that's also undesirable, because then you'd now have to create a mechanism to bypass that in case I wanted an empty string (which is a perfectly normal value).

Yes, I'm saying that where Infer() is used, it should omit the column from the insertion as a way of passing NULL to the query, even though it can tell that query will fail validation.

I don't think it would require a new mechanism. Whitelist() and Greylist() already serve that purpose.

You'd have to make a case for why empty string should be considered a special value in the general case (not just for your particular schema) and I think you'll have a hard time doing so.

I believe the general case would still work as originally documented. If you want an empty string (or any type's zero value), you need to use whitelist or greylist; otherwise the column is omitted from the query. The not null definition would have no bearing.

Happy to agree to disagree. I think I understand your perspective here and can't argue there's a cut-and-dried answer. Very much appreciate the discussion as well!