mitar opened this issue 7 months ago
I suspect there are unresolvable ambiguities regarding `string` <> `bytea`.
Writing a string to a PostgreSQL `bytea` at least appears to work, because pgx sends any string to PostgreSQL in the text format without any conversion. pgx does the same thing when reading: any text-formatted value can be scanned directly into a string without any conversion.
But what happens when you read the value? In the text format, PostgreSQL will return the value hex encoded, but in the binary format you will get back the raw bytes. For example: if you write the 4 bytes `jack` to a `bytea`, you will get back the 10 bytes `\x6a61636b` in the text format, but the original 4 bytes in the binary format.
Even working directly in psql it can be confusing dealing with strings in `bytea`. I recommend that you instead use `text` or `varchar` if possible.
> My understanding is that I could register my own type codec, but I could not find much information about how to extend an existing codec: I would like that the default `bytea` handling stays the same, only that strings are additionally scanned without copying.
If you really need/want to do this, you probably don't have to create a new `Codec`. I think you would only need to create your own type that implements `pgtype.BytesScanner` and `pgtype.BytesValuer`.
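For reference, a minimal sketch of what such a type could look like. The two interface definitions mirror `pgtype.BytesScanner` and `pgtype.BytesValuer` from `github.com/jackc/pgx/v5/pgtype` and are redeclared here only so the snippet is self-contained; `ByteaString` is a hypothetical name, not part of pgx. Note the copy in `ScanBytes`: it is required because the driver's read buffer is only valid for the duration of the call.

```go
package main

import "fmt"

// These interfaces mirror pgtype.BytesScanner and pgtype.BytesValuer
// from github.com/jackc/pgx/v5/pgtype; in real code, import them
// instead of redeclaring.
type BytesScanner interface {
	ScanBytes(v []byte) error
}

type BytesValuer interface {
	BytesValue() ([]byte, error)
}

// ByteaString is a string that can be read from and written to a
// bytea column. (Hypothetical type for illustration.)
type ByteaString string

// ScanBytes copies v; the copy is mandatory because the driver
// reuses the read buffer, so v is only valid during this call.
func (s *ByteaString) ScanBytes(v []byte) error {
	*s = ByteaString(v) // the string conversion copies
	return nil
}

// BytesValue returns the raw bytes to send to the server.
func (s ByteaString) BytesValue() ([]byte, error) {
	return []byte(s), nil
}

// Compile-time checks that ByteaString satisfies both interfaces.
var (
	_ BytesScanner = (*ByteaString)(nil)
	_ BytesValuer  = ByteaString("")
)

func main() {
	var s ByteaString
	_ = s.ScanBytes([]byte("jack"))
	fmt.Println(s) // jack
}
```

You would then scan with `row.Scan(&s)` where `s` is a `ByteaString`, rather than scanning into a plain `string`.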
Thanks for the response.
> If you really need/want to do this, you probably don't have to create a new `Codec`. I think you would only need to create your own type that implements `pgtype.BytesScanner` and `pgtype.BytesValuer`.
But that would require one to scan into this custom type, not into `string` itself.
For `string` itself, it seems like I would have to unregister `ByteaCodec` and register a custom one? Am I mistaken? Or is there an easier way? Maybe just using `TryWrapScanPlanFuncs`?
I am just thinking that since Go 1.20, there is a cheap way to convert between `string` and `[]byte`:
```go
import "unsafe"

func String2ByteSlice(str string) []byte {
	return unsafe.Slice(unsafe.StringData(str), len(str))
}

func ByteSlice2String(bs []byte) string {
	if len(bs) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(bs), len(bs))
}
```
So maybe pgx could simply convert between them as necessary.
> For `string` itself, it seems like I would have to unregister `ByteaCodec` and register a custom one? Am I mistaken? Or is there an easier way? Maybe just using `TryWrapScanPlanFuncs`?
Yeah, either of these is a possibility, though you will still have the ambiguity of text vs. binary protocol.
> I am just thinking that since Go 1.20, there is a cheap way to convert between `string` and `[]byte`:
I don't really see where these could be safely used. The read buffer is reused for every row. The unsafely constructed string would be corrupted by future rows.
> I don't really see where these could be safely used. The read buffer is reused for every row. The unsafely constructed string would be corrupted by future rows.
Oh, you are right, for scanning it does not work.
Little late to the party, but this works for me @mitar:
```go
var pw []byte
err := DBPool.QueryRow(context.Background(), fmt.Sprintf(query, user_table_quoted), email).Scan(&pw)
return string(pw), err
```
@vphatfla Yes, but I would like this to work for Go `string` values, not Go `[]byte` values (which already work).
@mitar, oh I see, I'm a noob here; might I ask how expensive it is to repeatedly convert from `[]byte` to `string` and vice versa?
With recent changes in Go it is very cheap. But the issue here is about ergonomics: if pgx does that internally, then it can do it (a) efficiently and (b) without the developer having to do it manually every time.
But wouldn't it make more sense to force developers to convert the string to `[]byte` before saving to Postgres instead? My thinking is that the data type in Postgres is `BYTEA`, so logically it should expect `[]byte` to be saved and retrieved, rather than `string`. Doing it this way might help new developers have a clearer view of the system and thus make fewer bugs in the future.
Sure. This is a design question for the library: is storing a string into `bytea` an error or a feature? This is why the issue exists, so that the community and maintainers can discuss the design here.
My view is that I would prefer that the library does this automatically. Generally, developers should have a good understanding of what the column type is and what data type they are storing; the library should try to accommodate that and make it easy (especially now that it is possible to do it efficiently), to keep boilerplate and repeated code down.
> My view is that I would prefer that the library does this automatically. Generally, developers should have a good understanding of what the column type is and what data type they are storing; the library should try to accommodate that and make it easy (especially now that it is possible to do it efficiently), to keep boilerplate and repeated code down.
I agree, if it can be done safely. But as I mentioned above (https://github.com/jackc/pgx/issues/1977#issuecomment-2053856959), I don't know how it can be done without the possibility of silently scanning data incorrectly.
**Is your feature request related to a problem? Please describe.**
With recent Go improvements that allow converting strings from and to `[]byte` without copying, I would like storing strings into a `bytea` column to work transparently, with as little copying as possible.
I am writing a key-value store on top of PostgreSQL, and I would like to simplify the storing of built-in strings so that users of my library would not have to convert them to `[]byte`.
**Describe the solution you'd like**
It looks like storing a Go `string` value works already, but reading it back fails with:
I do not see much reason why scanning `bytea` into `string` would not just work for standard types with pgx. Go allows conversion between `string` and `[]byte` with simple casting; I would hope that storing a string into the database would work the same.
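The plain conversions referred to above look like this. Note that, unlike the `unsafe` variants discussed earlier in the thread, each of these copies the data, which is exactly what makes them always safe:

```go
package main

import "fmt"

func main() {
	s := "jack"
	b := []byte(s)  // conversion copies the string's bytes
	s2 := string(b) // conversion copies back into a new string

	b[0] = 'J'         // mutating the copy...
	fmt.Println(s, s2) // jack jack — ...affects neither string
}
```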
**Describe alternatives you've considered**
My understanding is that I could register my own type codec, but I could not find much information about how to extend an existing codec: I would like that the default `bytea` handling stays the same, only that strings are additionally scanned without copying.