jbhoot opened this issue 1 month ago
I am happy to help you track down and fix this bug if you can verify that this is indeed a bug, and not some misunderstanding on my part.
I think I have found an explanation for this problem.
I will quote the MySQL documentation to explain it.
I used DataGrip to run the queries.
MySQL uses an internal binary format to store geometric data, as explained here. For a Point, it is a sequence of bytes in the following order:
SRID|EndianByte|GeometryType|XCoord|YCoord
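where:
- SRID: 4 bytes, integer
- EndianByte: 1 byte, the byte order of the rest of the value (1 = little-endian)
- GeometryType: 4 bytes, integer (1 = Point)
- XCoord: 8 bytes, double-precision X coordinate
- YCoord: 8 bytes, double-precision Y coordinate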
How does node-mysql2 read a stored geometric value?
For a query like select point from testpoint that retrieves a stored Point(), node-mysql2 parses the sequence of bytes as explained above. It stores XCoord in x and YCoord in y:
{
x: XCoord,
y: YCoord
}
Thus XCoord, i.e., the first 8-byte segment, is read into x, while YCoord, i.e., the second 8-byte segment, is read into y.
The code can be found here.
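For readers who would rather not open the link, here is a minimal JavaScript sketch of that kind of parsing for a single Point. It is not node-mysql2's actual code. parseInternalPoint is a hypothetical name, and the sketch assumes a Node.js Buffer that holds exactly one Point in the format above.

```javascript
// Hypothetical helper, not node-mysql2's code: decodes one Point stored in
// MySQL's internal format SRID|EndianByte|GeometryType|XCoord|YCoord.
function parseInternalPoint(buf) {
  if (buf.length !== 25) {
    throw new Error(`expected 25 bytes for a Point, got ${buf.length}`);
  }
  const srid = buf.readUInt32LE(0); // 4 bytes, little-endian in the dumps below
  const littleEndian = buf.readUInt8(4) === 1; // EndianByte: 1 = little-endian
  const type = littleEndian ? buf.readUInt32LE(5) : buf.readUInt32BE(5);
  if (type !== 1) {
    throw new Error(`expected GeometryType 1 (Point), got ${type}`);
  }
  // The first 8-byte segment goes into x and the second into y,
  // regardless of what the SRID says about axis order.
  const x = littleEndian ? buf.readDoubleLE(9) : buf.readDoubleBE(9);
  const y = littleEndian ? buf.readDoubleLE(17) : buf.readDoubleBE(17);
  return { srid, x, y };
}

// MySQL 5.7 result of ST_GeomFromText('POINT(1 -1)', 4326)
const stored = Buffer.from("E61000000101000000000000000000F03F000000000000F0BF", "hex");
console.log(parseInternalPoint(stored)); // { srid: 4326, x: 1, y: -1 }
```

The key point is that nothing in this kind of parsing consults the SRID, so x and y simply mirror the storage order.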
MySQL 5: what is the order of XCoord and YCoord in the storage format?
MySQL 5 stores a Point in the same order as specified in a query. The MySQL 5 docs don't say this explicitly, but we can verify it:
-- In MySQL 5.7
> select ST_GeomFromText('POINT(1 -1)', 4326);
0xE61000000101000000000000000000F03F000000000000F0BF
-- SRID|EndianByte|GeometryType|1|-1
-- 0x: Hex
-- E6 10 00 00: SRID. 4326 in this case.
-- 01 => ByteOrder. Each segment is in little-endian format
-- 01 00 00 00 => Geometry Type. Point in this case, POINT(1 -1).
-- XCoord => 00 00 00 00 00 00 F0 3F => 1, the first value in POINT(1 -1).
-- YCoord => 00 00 00 00 00 00 F0 BF => -1, the second value in POINT(1 -1).
> select ST_GeomFromText('POINT(-1 1)', 4326);
0xE61000000101000000000000000000F0BF000000000000F03F
-- SRID|EndianByte|GeometryType|-1|1
-- XCoord => 00 00 00 00 00 00 F0 BF => -1, the first value in POINT(-1 1).
-- YCoord => 00 00 00 00 00 00 F0 3F => 1, the second value in POINT(-1 1).
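The annotations above can be double-checked by decoding the two 8-byte segments as little-endian IEEE 754 doubles. Here is a small Node.js sketch. storedCoords is a hypothetical helper that assumes a little-endian Point (EndianByte 01) and accepts the hex exactly as DataGrip shows it.

```javascript
// Hypothetical helper: returns [XCoord, YCoord] in storage order from a hex dump
// of a little-endian Point in MySQL's internal format (leading "0x" optional).
function storedCoords(hex) {
  const buf = Buffer.from(hex.replace(/^0x/i, ""), "hex");
  // Skip SRID (4 bytes), EndianByte (1) and GeometryType (4): coordinates start at byte 9.
  return [buf.readDoubleLE(9), buf.readDoubleLE(17)];
}

// MySQL 5.7: the storage order matches the order written in the WKT.
console.log(storedCoords("0xE61000000101000000000000000000F03F000000000000F0BF")); // [ 1, -1 ], POINT(1 -1)
console.log(storedCoords("0xE61000000101000000000000000000F0BF000000000000F03F")); // [ -1, 1 ], POINT(-1 1)
```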
MySQL 5 also stores the SRID, but ignores its semantics, so the user has to pick one interpretation and stick to it. We chose lat-long ordering in our project.
So for ST_GeomFromText('POINT(-1 1)', 4326), where the user assumes lat-long order, -1 is the latitude and is stored in XCoord, while 1 is the longitude and is stored in YCoord.
When node-mysql2 reads this value, it maps XCoord to x and YCoord to y:
{
x: -1, // => XCoord => latitude
y: 1, // => YCoord => longitude
}
MySQL 8: what is the order of XCoord and YCoord in the storage format?
In MySQL 8, storage doesn't always follow the order used in a query. We will focus mainly on geographic coordinates, which include those in SRID 4326.
For SRID 4326, MySQL 8 defines lat-long ordering to interpret a geographic coordinate. So in ST_GeomFromText('POINT(1 -1)', 4326), lat is 1 and long is -1, or { lat: 1, long: -1 }. We usually map this as lat->x, long->y, i.e., { x: 1, y: -1 }.
But for storage, the MySQL 8 documentation says:
Geographic coordinates are stored in the angle unit of the spatial reference system, with longitudes in the X coordinates and latitudes in the Y coordinates. Axis directions and the meridian are those specified by the spatial reference system.
Thus, MySQL 8 stores longitude in XCoord and latitude in YCoord, making the storage format SRID|EndianByte|GeometryType|XCoord_Long|YCoord_Lat.
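To make the MySQL 8 layout concrete, the following Node.js sketch builds the stored bytes for an SRID 4326 point the way the quoted documentation describes, with longitude first. encodeGeographicPoint is hypothetical and only meant to reproduce the hex dumps below. MySQL performs this conversion itself.

```javascript
// Hypothetical helper: builds MySQL 8's internal bytes for a geographic Point,
// given the coordinates in SRID 4326's axis order (latitude, longitude).
function encodeGeographicPoint(lat, long, srid = 4326) {
  const buf = Buffer.alloc(25);
  buf.writeUInt32LE(srid, 0); // SRID
  buf.writeUInt8(1, 4); // EndianByte: 1 = little-endian
  buf.writeUInt32LE(1, 5); // GeometryType: 1 = Point
  buf.writeDoubleLE(long, 9); // XCoord holds the longitude
  buf.writeDoubleLE(lat, 17); // YCoord holds the latitude
  return "0x" + buf.toString("hex").toUpperCase();
}

// ST_GeomFromText('POINT(1 -1)', 4326): latitude 1, longitude -1
console.log(encodeGeographicPoint(1, -1));
// 0xE61000000101000000000000000000F0BF000000000000F03F
```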
-- In MySQL 8
select ST_GeomFromText('POINT(1 -1)', 4326);
0xE61000000101000000000000000000F0BF000000000000F03F
-- SRID|EndianByte|GeometryType|-1|1
-- XCoord => 00 00 00 00 00 00 F0 BF => -1, the longitude value in POINT(1 -1).
-- YCoord => 00 00 00 00 00 00 F0 3F => 1, the latitude value in POINT(1 -1).
select ST_GeomFromText('POINT(-1 1)', 4326);
0xE61000000101000000000000000000F03F000000000000F0BF
-- SRID|EndianByte|GeometryType|1|-1
-- XCoord => 00 00 00 00 00 00 F0 3F => 1, the longitude value in POINT(-1 1).
-- YCoord => 00 00 00 00 00 00 F0 BF => -1, the latitude value in POINT(-1 1).
So for ST_GeomFromText('POINT(-1 1)', 4326), where MySQL uses lat-long order, -1 is the latitude and is stored in YCoord, while 1 is the longitude and is stored in XCoord.
As a result, node-mysql2 maps XCoord (longitude) to x and YCoord (latitude) to y:
{
x: 1, // => XCoord => longitude
y: -1 // => YCoord => latitude
}
Right. I need to go for a walk.
I am not sure anymore whether this qualifies as a bug in node-mysql2.
In MySQL 8, the semantics of XCoord and YCoord have changed, or rather become stronger, for a subset of geometric values, SRID 4326 among them: XCoord now explicitly maps to longitude, and YCoord to latitude.
But node-mysql2 behaves the same as before. It still maps XCoord to x and YCoord to y. It's just that, due to the change in semantics explained above, x now holds the longitude and y the latitude.
Still, with all this churn, the ground (lat-long) has certainly shifted beneath the user. Either the library-level (node-mysql2) or the application-level code has to change to reflect the new state. I will wait a couple of days to see if anyone chimes in; otherwise, I will work around this problem with a change in the application-level code.
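One way to make such an application-level change is to interpret node-mysql2's parsed {x, y} differently depending on the server. This is a minimal sketch under several assumptions: the project's lat-long convention (POINT(lat long)), SRID 4326 as the only geographic SRID in use, a hypothetical toLatLong helper, and a caller that knows the server's major version.

```javascript
// Hypothetical application-level adapter for node-mysql2's parsed {x, y}.
// Assumes points are written as POINT(lat long) and that 4326 is the only
// geographic SRID in use.
function toLatLong(point, { mysqlMajorVersion, srid }) {
  if (mysqlMajorVersion >= 8 && srid === 4326) {
    // MySQL 8 stores geographic points longitude first, so x is the longitude.
    return { lat: point.y, long: point.x };
  }
  // MySQL 5 (and Cartesian SRID 0 on MySQL 8) keep the WKT order, so x is the latitude.
  return { lat: point.x, long: point.y };
}

// Both inputs come from ST_GeomFromText('POINT(-1 1)', 4326).
console.log(toLatLong({ x: 1, y: -1 }, { mysqlMajorVersion: 8, srid: 4326 })); // { lat: -1, long: 1 }
console.log(toLatLong({ x: -1, y: 1 }, { mysqlMajorVersion: 5, srid: 4326 })); // { lat: -1, long: 1 }
```

Keeping the version check in one place means the rest of the application only ever sees {lat, long}.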
I looked into what other drivers do with a geometric Point.
JDBC only provides the option to retrieve the byte array that represents a stored Point. Parsing it is left to the user.
In the case of MySQL 8, this byte array stores the longitude first, then the latitude. So if it is parsed such that the first 8-byte segment is read into x and the second into y, we end up with {x: long, y: lat}. The following sample script in Scala demonstrates this: https://github.com/justbhoot/poc-buggy-geographic-point-parsing-by-node-mysql2/blob/main/jdbc-analysis/ (you can just skim the README.md at this link).
It does the same thing as JDBC: it does not parse the retrieved byte array, but returns it to the caller as-is. Parsing is left to the user.
I have updated the main branch of https://github.com/justbhoot/poc-buggy-geographic-point-parsing-by-node-mysql2/ to also show results from this connector, in addition to the parsed results from node-mysql2.
When node-mysql2 is asked to retrieve a geometric Point, it returns a parsed {x, y} instead of a byte array. This becomes a problem for a geographic Point, whose coordinates come back swapped as {x: long, y: lat} because of how MySQL 8 stores a geographic Point:
Cartesian coordinates are stored in the length unit of the spatial reference system, with X values in the X coordinates and Y values in the Y coordinates. Axis directions are those specified by the spatial reference system. Geographic coordinates are stored in the angle unit of the spatial reference system, with longitudes in the X coordinates and latitudes in the Y coordinates. Axis directions and the meridian are those specified by the spatial reference system.
Other drivers, on the other hand, have adopted a hands-off approach: they return the byte array of a geometric Point as-is instead of parsing it first. If this byte array were parsed the way node-mysql2 parses it, we would get the same mistaken {x: long, y: lat}. The recommendation is to use MySQL's functions st_x() and st_y(), or the newer st_latitude() and st_longitude(), to avoid parsing altogether.
I believe that, because node-mysql2 parses a Point up front, it should either fix its parsing of geographic Points for MySQL 8 or simply return the byte array instead of parsing it.
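For illustration only, the first option might look roughly like the sketch below. It reads the SRID and, for a geographic SRID on MySQL 8, swaps the two segments so that x and y follow the SRS axis order, as ST_X() and ST_Y() do. The names parsePointInAxisOrder and serverMajorVersion, and the hard-coded LAT_LONG_SRIDS set, are assumptions and not node-mysql2 API. A real fix would need to know each SRID's axis order (MySQL 8 exposes SRS definitions in INFORMATION_SCHEMA.ST_SPATIAL_REFERENCE_SYSTEMS). This sketch also handles only little-endian values.

```javascript
// Hypothetical SRID-aware Point parsing, not node-mysql2 code.
// Assumption: every SRID in this set is geographic with latitude-longitude axis order.
const LAT_LONG_SRIDS = new Set([4326]);

function parsePointInAxisOrder(buf, serverMajorVersion) {
  if (buf.readUInt8(4) !== 1) {
    throw new Error("this sketch only handles little-endian values");
  }
  const srid = buf.readUInt32LE(0);
  const first = buf.readDoubleLE(9); // XCoord as stored
  const second = buf.readDoubleLE(17); // YCoord as stored
  if (serverMajorVersion >= 8 && LAT_LONG_SRIDS.has(srid)) {
    // MySQL 8 stores longitude first, but SRID 4326's first axis is latitude.
    return { srid, x: second, y: first };
  }
  return { srid, x: first, y: second };
}

// MySQL 8 result of ST_GeomFromText('POINT(1 -1)', 4326)
const mysql8Bytes = Buffer.from("E61000000101000000000000000000F0BF000000000000F03F", "hex");
console.log(parsePointInAxisOrder(mysql8Bytes, 8)); // { srid: 4326, x: 1, y: -1 }
```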
For now, I have chosen to use st_x() and st_y(), which do the right thing for all use cases in my combination of node-mysql2 + a MySQL 5 instance + a MySQL 8 instance.
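Here is a minimal sketch of that approach with node-mysql2's promise API, which lets the server extract the coordinates instead of parsing the geometry on the client. The table and column names come from the earlier query (select point from testpoint). readPointCoordinates is a hypothetical helper, and conn is assumed to be a connection from mysql2/promise, whose query() resolves to [rows, fields].

```javascript
// Hypothetical helper: `conn` is assumed to be a mysql2/promise connection.
// ST_X()/ST_Y() return the coordinates in the axis order of the point's SRS,
// so for SRID 4326 on MySQL 8, x is the latitude.
async function readPointCoordinates(conn) {
  const [rows] = await conn.query(
    "SELECT ST_X(point) AS x, ST_Y(point) AS y FROM testpoint"
  );
  return rows; // e.g. [{ x: ..., y: ... }], computed by the server
}
```

With a real setup, the connection would come from createConnection() in mysql2/promise. Because the server does the extraction, the same query works on both MySQL 5 and MySQL 8.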
Problem
node-mysql2 seems to swap the x and y values of a geographic Point stored in MySQL 8 when the Point is retrieved through node-mysql2 without any special parsing.
Repository that demonstrates the bug
https://github.com/justbhoot/poc-buggy-geographic-point-parsing-by-node-mysql2/
How to reproduce
1. Check out the feat/demo branch.
2. Make sure the database described in db.sql exists.
3. Create a .env file according to the .env.template.
4. Run npm ci.
5. Run MYSQL_VERSION=8 node index.js.
The last step above (script execution) should produce the following output:
Expected result
For all the rows:
- the x column in the second table should show the value in the st_x column of the first table;
- the y column in the second table should show the value in the st_y column of the first table.
To demonstrate the same from the MySQL CLI shell:
In the above query result, st_x() and st_latitude() point to the first value in the Point, which is in accordance with how SRID 4326 is defined in MySQL.
Actual result
For the first row, row 0 (containing data for SRID 4326):
- the x column in the second table shows the value in the st_y column of the first table;
- the y column in the second table shows the value in the st_x column of the first table.
In other words, node-mysql2 apparently swaps the x and y values of a retrieved geographic point.
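The comparison behind the expected and actual results above can be expressed as a small per-row check. This is a sketch that assumes each row carries the server-side st_x and st_y values plus node-mysql2's parsed value under a hypothetical point key. compareRow is also hypothetical and is not part of the demo repository.

```javascript
// Hypothetical check: compares node-mysql2's parsed point with ST_X()/ST_Y().
function compareRow({ st_x, st_y, point }) {
  if (point.x === st_x && point.y === st_y) return "ok";
  if (point.x === st_y && point.y === st_x) return "swapped";
  return "mismatch";
}

// SRID 4326 on MySQL 8, POINT(1 -1): ST_X() is 1 (latitude), node-mysql2 gives x = -1.
console.log(compareRow({ st_x: 1, st_y: -1, point: { x: -1, y: 1 } })); // swapped
```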
Relevant observations
node-mysql2 behaves as expected, i.e., no swapped x and y, for a point with SRID 0.
node-mysql2 also behaves correctly for both SRIDs in MySQL 5 (probably because MySQL 5 ignores the SRID anyway).