Closed twiddler closed 4 months ago
I suspect it's because your schema contains bigints or some other unsigned 64 bit integer which isn't representable in JavaScript.
We have an example specifically for doing your own casting to a JS BigInt type. https://github.com/planetscale/database-js?tab=readme-ov-file#custom-type-casting-function
The proposal to use BigInts directly was rejected. https://github.com/planetscale/database-js/pull/90
https://github.com/planetscale/database-js?tab=readme-ov-file#custom-type-casting-function
As linked above, you can use a custom casting function for addressing this. Rather than forcing usage of BigInt, you can opt into it.
@mattrobenolt There's no schema involved. I am doing
SELECT * FROM
(
SELECT 0 AS "num_column"
UNION ALL
SELECT 0 AS "num_column"
) AS "foo"
(Sorry if that wasn't clear from the code. I originally posted this on kysely-planetscale.)
What puzzles me is that the type changes depending on how many lines the result set has. It'd be fine if it was always BigInts passed as strings. But if it's sometimes strings, sometimes numbers it makes it really hard to trace errors.
Oh, I did miss that. Lemme see what MySQL is doing here.
Ok, so.
$ curl -s -u $MYSQL_USER:$MYSQL_PWD https://$MYSQL_HOST/psdb.v1alpha1.Database/Execute -H'content-type: application/json' -d'{"query": "SELECT * FROM ( SELECT 0 AS `num_column` ) AS `foo`"}' | jq .result.fields[0]
{
"name": "num_column",
"type": "INT32",
"table": "foo",
"orgName": "num_column",
"columnLength": 2,
"charset": 63,
"flags": 32769
}
$ curl -s -u $MYSQL_USER:$MYSQL_PWD https://$MYSQL_HOST/psdb.v1alpha1.Database/Execute -H'content-type: application/json' -d'{"query": "SELECT * FROM ( SELECT 0 AS `num_column` UNION ALL SELECT 0 AS `num_column` ) AS `foo`"}' | jq .result.fields[0]
{
"name": "num_column",
"type": "INT64",
"table": "foo",
"orgName": "num_column",
"columnLength": 2,
"charset": 63,
"flags": 32769
}
When selecting 1 row, the type we get back is INT32, and when 2 rows, it's INT64.
This is... very interesting. I'm sure it's clear to you now at least why we behave this way in the driver, since an INT32 is safely an integer, but INT64, we need to keep it as a string or convert to BigInt, but either way, we're getting a different type.
I'm going to dig in internally, this might be more of a Vitess this? I'm not sure why it'd behave slightly differently.
Just to be explicit, the queries I'm testing are:
SELECT * FROM (
SELECT 0 AS `num_column`
) AS `foo`
and
SELECT * FROM (
SELECT 0 AS `num_column`
UNION ALL
SELECT 0 AS `num_column`
) AS `foo`
Ok, so this is pretty wild. Against a vanilla mysqld 8.0.34, this is what happens:
mysql> SELECT * FROM ( SELECT 0 AS `num_column` ) AS `foo`;
Field 1: `num_column`
Catalog: `def`
Database: ``
Table: `foo`
Org_table: ``
Type: LONG
Collation: binary (63)
Length: 2
Max_length: 1
Decimals: 0
Flags: NOT_NULL NUM
+------------+
| num_column |
+------------+
| 0 |
+------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM ( SELECT 0 AS `num_column` UNION ALL SELECT 0 AS `num_column` ) AS `foo`;
Field 1: `num_column`
Catalog: `def`
Database: ``
Table: `foo`
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 2
Max_length: 1
Decimals: 0
Flags: NOT_NULL NUM
+------------+
| num_column |
+------------+
| 0 |
| 0 |
+------------+
2 rows in set (0.01 sec)
I can't explain this behavior, but clearly MySQL wants to make the second query LONGLONG and the first just a LONG. Which is 64bit vs 32bit integer.
So both PlanetScale and database-js are handling this query correctly to how MySQL behaves.
I think in this case, the query is just too ambiguous and might benefit from a specific CAST()
to yield the correct type, but I sorta suspect this query is just trying to exercise some behavior since this isn't something I think you'd do in practice?
But whatever is happening, we are handling this correctly according to how MySQL behaves.
If anything, the best we'd be able to do here is handle this type as a BigInt
with a custom cast function, but then they'd still be different types. One as a normal JS number and other as a BigInt.
this isn't something I think you'd do in practice?
That rather synthetic example is the essence of the following use case:
I am left joining two tables A
and B
. Since it's a left join, there might be rows in the result set that have NULL
values for B.foo
. For those rows, I want to know whether B.foo
is NULL
because there was 1) no match, or 2) B.foo
is actually NULL
. I solved it like this:
SELECT
B.foo,
B.aId IS NOT NULL as matchFound
FROM
A LEFT JOIN B
ON
A.id = B.aId;
The IS NOT NULL
evaluates to either 0 or 1, which--contrary to what I expected but just like the query in my first post--gets handed back as a number if there's just one row, and strings when there's more rows.
I think PlanetScale should not do any extra lifting here, but just behave like vanilla MySQL. It didn't for me in my example. I tested against community-mysql-server
version 8.0.36 (see https://docs.fedoraproject.org/en-US/quick-docs/installing-mysql-mariadb/).
This example is now different though than what you first produced. The first examples yield a "derived table" which is what's behaving oddly.
Can you give me a schema that you're working with for this latter example actually doing a LEFT JOIN? I'm curious to see the data types Vitess is returning vs MySQL. It might be that we're doing something wrong here.
I tried to reproduce with the following:
CREATE TABLE A (
id BIGINT UNSIGNED AUTO_INCREMENT,
PRIMARY KEY(id)
);
CREATE TABLE B (
id BIGINT UNSIGNED AUTO_INCREMENT,
aId BIGINT UNSIGNED,
foo TEXT,
PRIMARY KEY(id),
KEY(aId)
);
INSERT INTO A VALUES (NULL);
INSERT INTO A VALUES (NULL);
INSERT INTO B VALUES (NULL, 1, 'hello');
SELECT
B.foo,
B.aId IS NOT NULL AS matchFound
FROM
A LEFT JOIN B
ON
A.id = B.aId;
This setup seems to mirror what you're trying to accomplish.
Against PlanetScale, I get:
mysql> SELECT
-> B.foo,
-> B.aId IS NOT NULL AS matchFound
-> FROM
-> A LEFT JOIN B
-> ON
-> A.id = B.aId;
Field 1: `foo`
Catalog: `def`
Database: `mattdb`
Table: `B`
Org_table: `B`
Type: BLOB
Collation: utf8mb4_0900_ai_ci (255)
Length: 262140
Max_length: 5
Decimals: 0
Flags: BLOB
Field 2: `matchFound`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 1
Max_length: 1
Decimals: 0
Flags: NOT_NULL BINARY NUM
And against vanilla MySQL 8.0.34:
mysql> SELECT
-> B.foo,
-> B.aId IS NOT NULL AS matchFound
-> FROM
-> A LEFT JOIN B
-> ON
-> A.id = B.aId;
Field 1: `foo`
Catalog: `def`
Database: `mysql`
Table: `B`
Org_table: `B`
Type: BLOB
Collation: utf8mb4_general_ci (45)
Length: 262140
Max_length: 5
Decimals: 0
Flags: BLOB
Field 2: `matchFound`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 1
Max_length: 1
Decimals: 0
Flags: NOT_NULL BINARY NUM
+-------+------------+
| foo | matchFound |
+-------+------------+
| hello | 1 |
| NULL | 0 |
+-------+------------+
2 rows in set (0.00 sec)
If I delete a row, so there's only 1:
--- PlanetScale
mysql> delete from A where id=2;
Query OK, 1 row affected (0.11 sec)
mysql> SELECT
-> B.foo,
-> B.aId IS NOT NULL AS matchFound
-> FROM
-> A LEFT JOIN B
-> ON
-> A.id = B.aId;
Field 1: `foo`
Catalog: `def`
Database: `mattdb`
Table: `B`
Org_table: `B`
Type: BLOB
Collation: utf8mb4_0900_ai_ci (255)
Length: 262140
Max_length: 5
Decimals: 0
Flags: BLOB
Field 2: `matchFound`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 1
Max_length: 1
Decimals: 0
Flags: NOT_NULL BINARY NUM
+-------+------------+
| foo | matchFound |
+-------+------------+
| hello | 1 |
+-------+------------+
1 row in set (0.10 sec)
--- MySQL 8.0.34
mysql> delete from A where id=2;
Query OK, 1 row affected (0.02 sec)
mysql> SELECT
-> B.foo,
-> B.aId IS NOT NULL AS matchFound
-> FROM
-> A LEFT JOIN B
-> ON
-> A.id = B.aId;
Field 1: `foo`
Catalog: `def`
Database: `mysql`
Table: `B`
Org_table: `B`
Type: BLOB
Collation: utf8mb4_general_ci (45)
Length: 262140
Max_length: 5
Decimals: 0
Flags: BLOB
Field 2: `matchFound`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 1
Max_length: 1
Decimals: 0
Flags: NOT_NULL BINARY NUM
+-------+------------+
| foo | matchFound |
+-------+------------+
| hello | 1 |
+-------+------------+
1 row in set (0.00 sec)
In all cases, matchFound
is a LONGLONG
which is a 64 bit integer.
Am I missing something here? This would yield a string in all cases through our driver.
@twiddler one thing to watch for: I think the mysql2
client is going to type-cast values as either numbers or strings based on the length of the string returned, basically if the string value of the number is short enough, it'll automatically parse it into a number:
There's some pool options that control that behavior (e.g. setting bigNumberStrings: true
will mean that the LONGLONG
always gets returned as a string), but that might be the source of the difference here.
Yeah, I had considered something like this where we convert to an integer if we know it's within the safe bounds, but hard to say objectively what's better, because then you just get a string at some point when you've crossed the threshold and your code doesn't anticipate that.
I think in this case tho, the answer is probably supplying a custom cast function that handles this how you like. But from my understanding of everything here, the types are the same on the protocol level. We just choose to handle LONGLONG types a strings and hands-off on guessing what to do with it, while mysql2 has knobs to tune the behavior.
I think this is my complete understanding at this point.
Thanks for all your hard work, and sorry for getting back so late. :pray:
the mysql2 client is going to type-cast values as either numbers or strings based on the length of the string returned
That would have explained the behaviour, but even with
pool: createPool({
database: "mysql",
host: "localhost",
user: "root",
password: "",
+ bigNumberStrings: true,
}),
the tests in my opening post still pass:
▶ mysql
✔ one row (30.003698ms)
✔ two rows (1.971407ms)
▶ mysql (32.916212ms)
Expecting strings fails:
- assert.deepStrictEqual(q, [{ num_column: 0 }, { num_column: 0 }]);
+ assert.deepStrictEqual(q, [{ num_column: "0" }, { num_column: "0" }]);
▶ mysql
✔ one row (33.345367ms)
✖ two rows (5.137187ms)
AssertionError [ERR_ASSERTION]: Expected values to be strictly deep-equal:
Can you give me a schema that you're working with for this latter example actually doing a LEFT JOIN?
CREATE TABLE `A` (
`id` varchar(191) NOT NULL,
PRIMARY KEY (`id`),
) ENGINE InnoDB,
CHARSET utf8mb4,
COLLATE utf8mb4_0900_ai_ci;
CREATE TABLE `B` (
`id` varchar(191) NOT NULL,
`aId` varchar(191) NOT NULL,
`foo` enum('inherit', 'true', 'false') NOT NULL DEFAULT 'inherit',
PRIMARY KEY (`id`, `aId`)
) ENGINE InnoDB,
CHARSET utf8mb4,
COLLATE utf8mb4_0900_ai_ci;
And the query (for convenience, same as above):
SELECT
B.foo,
B.aId IS NOT NULL as matchFound
FROM
A LEFT JOIN B
ON
A.id = B.aId;
So while I am still confused why mysql2 didn't behave as expected when bigNumberStrings: true
, I am sure that when it comes to PlanetScale I can solve my use case with a custom cast function. (In my application, I see no reason to just Number(...)
any numeric value and then assert it is <= Number.MAX_SAFE_INTEGER
--which is ridiculously huge anyway.)
I can't figure out why MySQL treats IS NOT NULL
as an INT64, but that question is certainly out of scope here. CAST
does not accept any smaller number type, so I'll select B.aId
and z.transform
null
to false
and everything else to true
.
Thanks for all your input. This was definitely an interesting ride. :roller_coaster:
Yeah, I can't explain why MySQL behaves that way either. Or why mysql2 infers this. But they definitely do extra work to try and coerce numbers. Maybe it's something we can explore too? But that seems like it'd have unexpected behaviors too. When you do cross the boundary, now you get a string.
I can easily see the case where you have a bigint primary key, everything is working for a long time. Eventually you cross the threshold and now you get strings back and code is broken. I'm not sure that'd be a good experience either.
I'd personally recommend, like you suggested, a custom cast function or something for you to explicitly handle this how you want to.
I think we as a library are handling this as correctly as we can in JavaScript until we have some evidence otherwise.
In the following test suite, when using
PlanetScaleDialect
and receiving more than one row, what I expected to be numbers is parsed as strings instead:Gist for reproducing: https://gist.github.com/twiddler/8f98b48ec61c75eaa5422249c1edebc9. Please set
PLANETSCALE_URL
in.env
and use Node 21 or later.Is this expected behavior?