sidorares / node-mysql2

:zap: fast mysqljs/mysql compatible mysql driver for node.js
https://sidorares.github.io/node-mysql2/
MIT License
4.04k stars 610 forks source link

[Bug?] node-mysql2 seems to swap `x` and `y` values of a geographic Point (SRID 4326) stored in mysql8 #2959

Open jbhoot opened 1 month ago

jbhoot commented 1 month ago

Problem

node-mysql2 seems to swap x and y values of a geographic Point stored in mysql8 when it is retrieved through node-mysql2 without special parsing.

Repository that demonstrates the bug

https://github.com/justbhoot/poc-buggy-geographic-point-parsing-by-node-mysql2/

How to reproduce

  1. Ensure that a MySQL8 instance is running.
  2. Clone the above mentioned repository.
  3. Switch to branch feat/demo.
  4. Ensure that a schema as described in db.sql exists.
  5. Set up a .env file according to the .env.template.
  6. npm ci
  7. Run the index.js script MYSQL_VERSION=8 node index.js.

The last step above – script execution – should produce the following output:

$ MYSQL_VERSION=8 node index.js

Testing with MySQL v8...

AS INTERPRETED BY MYSQL8's ST_X, ST_Y functions:
┌─────────┬─────────────┬────────────────────┬──────────────┐
│ (index) │  st_x(pos)  │     st_y(pos)      │ st_srid(pos) │
├─────────┼─────────────┼────────────────────┼──────────────┤
│    0    │ -33.3109317 │ 117.34617329999999 │     4326     │
│    1    │ -35.3109317 │    101.3461733     │      0       │
└─────────┴─────────────┴────────────────────┴──────────────┘

AS INTERPRETED BY node-mysql2:
┌─────────┬─────────────┬─────────────┐
│ (index) │      x      │      y      │
├─────────┼─────────────┼─────────────┤
│    0    │ 117.3461733 │ -33.3109317 │
│    1    │ -35.3109317 │ 101.3461733 │
└─────────┴─────────────┴─────────────┘

Expected result

For all the rows:

To demonstrate the same from MySQL CLI Shell:

mysql> select st_astext(pos), st_x(pos), st_latitude(pos) from testpoint where st_srid(pos) = 4326;

+--------------------------------+-------------+------------------+
| st_astext(pos)                 | st_x(pos)   | st_latitude(pos) |
+--------------------------------+-------------+------------------+
| POINT(-33.3109317 117.3461733) | -33.3109317 |      -33.3109317 |
+--------------------------------+-------------+------------------+
1 row in set (0.00 sec)

In the above query result, st_x() and st_latitude() point to the first value in the Point, which is in accordance with how SRID 4326 is defined in MySQL.

Actual result

For the first row 0 (containing data for SRID 4326):

In other words, node-msyql2 apparently swaps the values for x and y in a retrieved geographic point.

AS INTERPRETED BY MYSQL8's ST_X, ST_Y functions:
┌─────────┬─────────────┬────────────────────┬──────────────┐
│ (index) │  st_x(pos)  │     st_y(pos)      │ st_srid(pos) │
├─────────┼─────────────┼────────────────────┼──────────────┤
│    0    │ -33.3109317 │ 117.34617329999999 │     4326     │ <---- correct: x is lat, y is long
└─────────┴─────────────┴────────────────────┴──────────────┘

AS INTERPRETED BY node-mysql2:
┌─────────┬─────────────┬─────────────┐
│ (index) │      x      │      y      │
├─────────┼─────────────┼─────────────┤
│    0    │ 117.3461733 │ -33.3109317 │ <---- incorrect: x is long, y is lat
└─────────┴─────────────┴─────────────┘

Relevant observations

node-mysql2 behaves as expected, i.e., no swapped x and y, for a point with SRID 0.

node-mysql2 also behaves correctly for both SRIDs in mysql5 (probably because mysql5 ignores SRID anyway).

jbhoot commented 1 month ago

I am happy to help you track down and fix this bug if you can verify that this is indeed a bug, and not some misunderstanding on my part.

jbhoot commented 1 month ago

I think I found out an explanation for this problem.

I will quote MySQL documentation to try to explain it.

I have used DataGrip to make queries.

How does MySQL (5 and 8 at least) store a geometric value?

MySQL uses an internal binary format to store geometric data as explained here, which is a sequence of bytes in the following order:

SRID|EndianByte|GeometryType|XCoord|YCoord

where:

  1. SRID: 4 bytes that store SRID value
  2. EndianByte: 1 byte that indicates the byte order – little-endian or big-endian. MySQL uses only little-endian here anyway. All the segments in a geometric value are stored in little-endian order.
  3. GeometryType: 4 bytes to store the geometry type – Point, Line, etc.
  4. XCoord: 8-bytes in double-precision, little-endian format to store what MySQL deems to be the x co-ordinate. More on the emphasised part later.
  5. YCoord: 8-bytes in double-precision, little-endian format to store what MySQL deems to be the y co-ordinate.
Side-note: Interpreting hex representation of coords values ### Side-note: Interpreting hex representation of coords values The hex representation of coords values look a bit unconventional because they are stored in little-endian, 8-byte, double-precision format. For example, `-1` is probably represented as `-1.0 x 2^0`, from which its sign, biased exponent, and fraction are stored in a Coord segment. Thus, `-1` ends up being represented as `00 00 00 00 00 00 F0 3F`. I might be off in the explanation, but the details are not relevant to our topic.

How does node-mysql2 read a stored geometric value?

For a query like select point from testpoint to retrieve a stored Point(), node-mysql2 parses the sequence of bytes as explained above. It stores the XCoord in x, and YCoord in y.

{
    x: XCoord,
    y: YCoord
}

Thus, XCoord, i.e., the first 8-bytes are read into x, while YCoord, i.e., the second 8-bytes segment is read into y.

The code can be found here.

How does MySQL 5 decide XCoord and YCoord in the storage format?

MySQL 5 stores a Point in the same order as specified in a query. MySQL 5 docs don't explicitly say that, but we can verify it.

-- In MySQL 5.7
> select ST_GeomFromText('POINT(1 -1)', 4326);
0xE61000000101000000000000000000F03F000000000000F0BF
-- SRID|EndianByte|GeometryType|1|-1
-- 0x: Hex
-- E6 10 00 00: SRID. 4326 in this case.
-- 01 => ByteOrder. Each segment is in little-endian format
-- 01 00 00 00 => Geometry Type. Point in this case, POINT(1 -1).
-- XCoord => 00 00 00 00 00 00 F0 3F => 1, the first value in POINT(1 -1).
-- YCoord => 00 00 00 00 00 00 F0 BF => -1, the second value in POINT(1 -1).

> select ST_GeomFromText('POINT(-1 1)', 4326);
0xE61000000101000000000000000000F0BF000000000000F03F
-- SRID|EndianByte|GeometryType|-1|1
-- XCoord => 00 00 00 00 00 00 F0 BF => -1, the first value in POINT(-1 1).
-- YCoord => 00 00 00 00 00 00 F0 3F => 1, the second value in POINT(-1 1).

MySQL 5 also stores SRID, but ignores its semantics. So, user has to assume and stick to a single semantic meaning. We chose to use lat-long ordering in our project.

So for ST_GeomFromText('POINT(-1 1)', 4326), where user assumes the order of lat-long, -1 maps to latitude maps to XCoord, 1 maps to longitude maps to YCoord.

When node-mysql2 reads this value, it maps XCoord to x, and YCoord to y:

{
    x: -1, // => XCoord => latitude
    y: 1, // => YCoord => longitude
}

How does MySQL 8 decide XCoord and YCoord in the storage format?

In MySQL 8, storage doesn't always follow the same order as in a query. We will focus mainly on geographic co-ordinates, which include those in SRID 4326.

MySQL 8 - storing geographic co-ordinates

For SRID 4326, MySQL 8 defines lat-long ordering to interpret a geographic co-ordinate. So, in a ST_GeomFromText('POINT(1 -1)', 4326), lat is 1, while long is -1, or { lat: 1, long: -1 }. We usually map this as lat->x, long->y, i.e., { x: 1, y: -1}.

BUT, in order to store a co-ordinate, MySQL 8 provides the following description:

Geographic coordinates are stored in the angle unit of the spatial reference system, with longitudes in the X coordinates and latitudes in the Y coordinates. Axis directions and the meridian are those specified by the spatial reference system.

Thus, MySQL 8 stores longitude in XCoord, and latitude in YCoord, making the storage format SRID|EndianByte|GeometryType|XCoord_Long|YCoord_Lat.

-- In MySQL 8
select ST_GeomFromText('POINT(1 -1)', 4326);
0xE61000000101000000000000000000F0BF000000000000F03F
-- SRID|EndianByte|GeometryType|-1|1
-- XCoord => 00 00 00 00 00 00 F0 BF => -1, the longitude value in POINT(-1 1).
-- YCoord => 00 00 00 00 00 00 F0 3F => 1, the latitude value in POINT(-1 1).

select ST_GeomFromText('POINT(-1 1)', 4326);
0xE61000000101000000000000000000F03F000000000000F0BF
-- SRID|EndianByte|GeometryType|1|-1
-- XCoord => 00 00 00 00 00 00 F0 3F => 1, the longitude value in POINT(-1 1).
-- YCoord => 00 00 00 00 00 00 F0 BF => -1, the latitude value in POINT(-1 1).

So for ST_GeomFromText('POINT(-1 1)', 4326), where MySQL uses the order of lat-long, -1 maps to latitude maps to YCoord, 1 maps to longitude maps to XCoord.

This translates into node-mysql2 mapping XCoord=longitude to x, and YCoord=latitude to y:

{
    x: 1 // => XCoord => longitude,
    y: -1 // => YCoord => latitude
}
Side-note: MySQL 8 - storing non-geographic cartesian co-ordinates ### Side-note: MySQL 8 - storing non-geographic cartesian co-ordinates For *non-geographic* cartesian co-ordinates, this is what the [MySQL 8 documentation says]() (emphasis mine): > Cartesian coordinates are stored in the length unit of the spatial reference system, **with X values in the X coordinates and Y values in the Y coordinates**. Axis directions are those specified by the spatial reference system. So, this behaviour is similar to the one in MySQL 5. ```sql -- In MySQL 8 select ST_GeomFromText('POINT(1 -1)', 0); 0x000000000101000000000000000000F03F000000000000F0BF -- SRID|EndianByte|GeometryType|1|-1 -- XCoord => 00 00 00 00 00 00 F0 3F => 1, the first value in POINT(1 -1). -- YCoord => 00 00 00 00 00 00 F0 BF => -1, the second value in POINT(1 -1). select ST_GeomFromText('POINT(-1 1)', 0); 0x000000000101000000000000000000F0BF000000000000F03F -- SRID|EndianByte|GeometryType|-1|1 -- XCoord => 00 00 00 00 00 00 F0 BF => -1, the first value in POINT(-1 1). -- YCoord => 00 00 00 00 00 00 F0 3F => 1, the second value in POINT(-1 1). ```
jbhoot commented 1 month ago

Right. I need to go for a walk.

jbhoot commented 1 month ago

I am not sure anymore whether this can qualify as a bug for node-mysql2.

In MySQL 8, the semantics of XCoord and YCoord have changed, or rather gotten stronger for a subset of geometric values – SRID 4326 among them. XCoord now explicitly maps to longitude, and YCoord maps to latitude.

But the behaviour of node-mysql2 is still the same. It still maps XCoord to x and YCoord to y as before. It just that, due to the change in semantics as explained above, x now maps to longitude and y to latitude.

But, due to all this churn, the ground (lat-long) has surely shifted beneath the user. Either the library-level (node-mysql2) or the application-level code has to be modified to reflect the new state. I will wait for a couple of days to see if anyone chimes in. Otherwise, I wil adapt to this problem through a change at the application-level code.

jbhoot commented 3 weeks ago

I looked into what other drivers do with a geometric Point.

MySQL's jdbc connector

jdbc provides only the option to retrieve the byte array that represents a stored Point. Its parsing is left to the user.

Now, in case of MySQL 8, this byte array stores first longitude, then latitude. So, if it is parsed such that the first byte segment is read into x, and the second into y, then we end up with {x: long, y: lat}. The following sample script in Scala proves it: https://github.com/justbhoot/poc-buggy-geographic-point-parsing-by-node-mysql2/blob/main/jdbc-analysis/ (you can just skim the README.md at this link).

MySQL's official nodejs connector

It does the same thing as jdbc – it does not parse the retrieved byte array and return it to the caller as-is. The parsing is left to the user.

I have updated the main branch of https://github.com/justbhoot/poc-buggy-geographic-point-parsing-by-node-mysql2/ to also show results from this connector, apart from the parsed results from node-mysql2.

My Conclusion

When node-mysql2 is asked to retrieve a geometric Point, it returns a parsed {x, y} instead of a byte array. This becomes a problem for a geographic Point, in which the order is swapped {x: long, y: lat} due to how MySQL8 stores a geographic Point:

Cartesian coordinates are stored in the length unit of the spatial reference system, with X values in the X coordinates and Y values in the Y coordinates. Axis directions are those specified by the spatial reference system. Geographic coordinates are stored in the angle unit of the spatial reference system, with longitudes in the X coordinates and latitudes in the Y coordinates. Axis directions and the meridian are those specified by the spatial reference system.

On the other hand, other drivers have simply adopted a hands-off approach, preferring to return the byte array of a geometric Point as-is instead of parsing it before returning it. If this byte array were to be parsed in the manner similar to how node-mysql2 does, then we will get the same mistaken {x: long, y: lat}. The recommendation is to simply use MySQL's functions st_x(), st_y(), newer st_latitude(), st_longitude(), to avoid parsing.

I believe that, because node-mysql2 parses a Point beforehand, it should either fix its parsing for a geographic Point for MySQL8, or simply return the byte array instead of parsing it.

For now, I have chosen to use st_x(), st_y(), which does the right job for all use cases in my combo of node-mysql2 + MySQL 5 instance + MySQL 8 instance.