theory / pg-semver

A semantic version data type for PostgreSQL
Other
145 stars 24 forks source link

postgres error: mergejoin input data is out of order #66

Closed imranur-rahman closed 4 months ago

imranur-rahman commented 1 year ago

[I asked this question to stackoverflow: https://stackoverflow.com/q/76988029/3450691]

I have two tables in my postgres database, relations and versioninfo:

postgres=# \d+ relations
                                                                 Table "public.relations"
                Column                |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------------------------------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 system_name                          | character varying           |           |          |         | extended |             |              | 
 from_package_name                    | character varying           |           |          |         | extended |             |              | 
 from_version                         | semver                      |           |          |         | plain    |             |              | 
 to_package_name                      | character varying           |           |          |         | extended |             |              | 
 actual_requirement                   | character varying           |           |          |         | extended |             |              | 
 to_version                           | semver                      |           |          |         | plain    |             |              | 
 to_package_highest_available_release | character varying           |           |          |         | extended |             |              | 
 interval_start                       | timestamp without time zone |           |          |         | plain    |             |              | 
 interval_end                         | timestamp without time zone |           |          |         | plain    |             |              | 
 is_out_of_date                       | boolean                     |           |          |         | plain    |             |              | 
 is_regular                           | boolean                     |           |          |         | plain    |             |              | 
 warnings                             | character varying           |           |          |         | extended |             |              | 
Access method: heap
postgres=# \d+ versioninfo
                                                    Table "public.versioninfo"
    Column    |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 system_name  | character varying           |           |          |         | extended |             |              | 
 package_name | character varying           |           |          |         | extended |             |              | 
 version_name | semver                      |           |          |         | plain    |             |              | 
 release_date | timestamp without time zone |           |          |         | plain    |             |              | 
Access method: heap

And I am joining them on some criteria:

CREATE TABLE exclude_records AS
SELECT R2.system_name, R2.from_package_name, R2.from_version, R2.to_package_name, COUNT(*) AS count_no
FROM relations R2
INNER JOIN versioninfo V3
ON R2.from_package_name = V3.package_name AND R2.from_version = V3.version_name AND R2.system_name = V3.system_name
INNER JOIN versioninfo V4
ON R2.to_package_name = V4.package_name AND R2.to_version = V4.version_name AND R2.system_name = V4.system_name
GROUP BY R2.system_name, R2.from_package_name, R2.from_version, R2.to_package_name
HAVING COUNT(*) > 1;

But this query returns error:

ERROR:  mergejoin input data is out of order 

SQL state: XX000

The error was gone when I used from_version, to_version, and version_name to text in my query, e.g. . . . AND R2.from_version::text = v3.version_name::text. . . and again at R2.to_version::text = V4.version_name::text.

The person who commented on my post and I, both agreed that there might be some issues with the current implementation of SEMVER which is causing this error.

What do you think?

theory commented 1 year ago

I just ran this script on Postgres 15.4 with semver v0.32.1 with no error:

BEGIN;

CREATE EXTENSION semver;

CREATE TABLE relations (
    system_name TEXT,
    from_package_name TEXT,
    from_version SEMVER,
    to_package_name TEXT,
    to_version SEMVER
);

CREATE TABLE versioninfo (
    system_name TEXT,
    package_name TEXT,
    version_name SEMVER,
    release_date TIMESTAMPTZ
);

INSERT INTO relations
VALUES ('sysname', 'from_name', '1.0.0', 'to_name', '1.2.0')
     , ('sysname', 'from_name', '1.2.0', 'to_name', '1.3.0')
;

INSERT INTO versioninfo
VALUES ('sysname', 'from_name', '1.0.0', NOW())
     , ('sysname', 'to_name', '1.2.0', NOW())
     , ('sysname', 'from_name', '1.0.0', NOW())
     , ('sysname', 'to_name', '1.2.0', NOW())
;

-- CREATE TABLE exclude_records AS
SELECT R2.system_name, R2.from_package_name, R2.from_version, R2.to_package_name, COUNT(*) AS count_no
FROM relations R2
INNER JOIN versioninfo V3
ON R2.from_package_name = V3.package_name AND R2.from_version = V3.version_name AND R2.system_name = V3.system_name
INNER JOIN versioninfo V4
ON R2.to_package_name = V4.package_name AND R2.to_version = V4.version_name AND R2.system_name = V4.system_name
GROUP BY R2.system_name, R2.from_package_name, R2.from_version, R2.to_package_name
HAVING COUNT(*) > 1;

ROLLBACK;

Can you modify it to reproduce the issue?