pulsejet / memories

Fast, modern and advanced photo management suite. Runs as a Nextcloud app.
https://memories.gallery
GNU Affero General Public License v3.0
3.08k stars 82 forks source link

Geospatial index not used on MySQL 8 : Missing SRID #1067

Closed kedare closed 6 months ago

kedare commented 6 months ago

Describe the bug When running occ memories:places-setup , during the Recalculating places for all files phase.

The geospacial index is not used at all on MySQL, this is a query with its query plan

SELECT 
    `sub`.`osm_id`, `mp`.`admin_level`
FROM
    (SELECT DISTINCT
        (osm_id)
    FROM
        `memories_planet_geometry`
    WHERE
        ST_CONTAINS(geometry, ST_GEOMFROMTEXT('POINT(4.627736 43.676514)'))
    GROUP BY `poly_id` , `osm_id`
    HAVING SUM(type_id) > 0) `sub`
        INNER JOIN
    `oc_memories_planet` `mp` ON `sub`.`osm_id` = `mp`.`osm_id`
ORDER BY `mp`.`admin_level` ASC
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "840283.34"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "574551.00"
      },
      "nested_loop": [
        {
          "table": {
            "table_name": "sub",
            "access_type": "ALL",
            "rows_examined_per_scan": 574551,
            "rows_produced_per_join": 574551,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "7184.39",
              "eval_cost": "57455.10",
              "prefix_cost": "64639.49",
              "data_read_per_join": "8M"
            },
            "used_columns": [
              "osm_id"
            ],
            "materialized_from_subquery": {
              "using_temporary_table": true,
              "dependent": false,
              "cacheable": true,
              "query_block": {
                "select_id": 2,
                "cost_info": {
                  "query_cost": "63150.85"
                },
                "duplicates_removal": {
                  "using_filesort": false,
                  "grouping_operation": {
                    "using_temporary_table": true,
                    "using_filesort": false,
                    "table": {
                      "table_name": "memories_planet_geometry",
                      "access_type": "ALL",
                      "rows_examined_per_scan": 574551,
                      "rows_produced_per_join": 574551,
                      "filtered": "100.00",
                      "cost_info": {
                        "read_cost": "5695.75",
                        "eval_cost": "57455.10",
                        "prefix_cost": "63150.85",
                        "data_read_per_join": "153M"
                      },
                      "used_columns": [
                        "id",
                        "poly_id",
                        "type_id",
                        "osm_id",
                        "geometry"
                      ],
                      "attached_condition": "st_contains(`nextcloud`.`memories_planet_geometry`.`geometry`,<cache>(st_geomfromtext('POINT(4.627736 43.676514)')))"
                    }
                  }
                }
              }
            }
          }
        },
        {
          "table": {
            "table_name": "mp",
            "access_type": "ref",
            "possible_keys": [
              "memories_planet_osm_id_index"
            ],
            "key": "memories_planet_osm_id_index",
            "used_key_parts": [
              "osm_id"
            ],
            "key_length": "4",
            "ref": [
              "sub.osm_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 574551,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "143637.75",
              "eval_cost": "57455.10",
              "prefix_cost": "265732.34",
              "data_read_per_join": "21M"
            },
            "used_columns": [
              "id",
              "osm_id",
              "admin_level"
            ]
          }
        }
      ]
    }
  }
}

image

The query is quite slow for the data set (between 1s and 2s for this one)

To Reproduce Steps to reproduce the behavior:

Screenshots If applicable, add screenshots to help explain your problem.

Platform:

Additional context Add any other context about the problem here.

pulsejet commented 6 months ago

Does the index exist? I ran the exact same SQL on my instance and it uses the index and takes ~30ms

image image

kedare commented 6 months ago

it does, this is the exported CREATE TABLE

CREATE TABLE `memories_planet_geometry` (
  `id` varchar(32) NOT NULL,
  `poly_id` varchar(32) NOT NULL,
  `type_id` int NOT NULL,
  `osm_id` int NOT NULL,
  `geometry` polygon NOT NULL,
  PRIMARY KEY (`id`),
  KEY `planet_osm_id_idx` (`osm_id`),
  SPATIAL KEY `planet_osm_polygon_geometry_idx` (`geometry`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Using MySQL 8.0.36-28, I tried to drop and recreate the index but no change.

kedare commented 6 months ago

Also it looks like the queries/indexes are all using SRID=0 (default one when undefined) instead of SRID=4326, I don't know if this is expected ?

kedare commented 6 months ago

That may be related to that apparently : https://dba.stackexchange.com/questions/260757/mysql-8-not-using-spatial-index

Spatial columns with no SRID attribute are not SRID-restricted and accept values with any SRID. However, the optimizer cannot use SPATIAL indexes on them until the column definition is modified to include an SRID attribute, which may require that the column contents first be modified so that all values have the same SRID.

Maybe you are using an older version of MySQL that did not have this restriction ?

kedare commented 6 months ago

I confirm that when running the CREATE TABLE statement I get this warning

The spatial index on column 'geometry' will not be used by the query optimizer since the column does not have an SRID attribute. Consider adding an SRID attribute to the column.

pulsejet commented 6 months ago

@kedare can you try the patch in https://github.com/pulsejet/memories/commit/2effc74d6e53e6210dd4de71af2b9df5db5de14a and report back?

kedare commented 6 months ago

What would be the process to execute ? I can just delete the existing table, recreate it with your code and rerun the occ command ? (I don't think it's possible to convert existing data to use the new SRID, so probably a new a new empty table)

pulsejet commented 6 months ago

Just re-run the occ command, it drops the table anyway

kedare commented 6 months ago

Testing now, I do get a lot of those errors when running the command, I don't think this is expected.

ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -174.527400 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -175.862600 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -173.541000 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -179.148800 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -178.355400 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -175.994900 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_2 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -180.000000 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
pulsejet commented 6 months ago

Okay try https://github.com/pulsejet/memories/commit/00f1a97bc0e99753aa52927904d0b1ca7b836fc5

You just need the changes in Places.php

pulsejet commented 6 months ago

Just tested this with MySQL 8 and seems be fixed now. Query itself takes ~14ms (honestly it's surprisingly slow; Postgres takes ~1ms albeit that's just the native geometry types)

/usr/sbin/mysqld (mysqld 8.0.32)
mysql> EXPLAIN ANALYZE SELECT
    ->     `sub`.`osm_id`, `mp`.`admin_level`
    -> FROM
    ->     (SELECT DISTINCT
    ->         (osm_id)
    ->     FROM
    ->         `memories_planet_geometry`
    ->     WHERE
    ->         ST_CONTAINS(geometry, ST_GEOMFROMTEXT('POINT(43.676514 4.627736)', 4326))
    ->     GROUP BY `poly_id` , `osm_id`
    ->     HAVING SUM(type_id) > 0) `sub`
    ->         INNER JOIN
    ->     `oc_memories_planet` `mp` ON `sub`.`osm_id` = `mp`.`osm_id`
    -> ORDER BY `mp`.`admin_level` ASC;

| -> Sort: mp.admin_level  (actual time=13.078..13.078 rows=7 loops=1)
    -> Stream results  (cost=13171.87 rows=0) (actual time=12.981..13.070 rows=7 loops=1)
        -> Nested loop inner join  (cost=13171.87 rows=0) (actual time=12.979..13.066 rows=7 loops=1)
            -> Table scan on sub  (cost=2.50..2.50 rows=0) (actual time=12.949..12.951 rows=7 loops=1)
                -> Materialize  (cost=0.00..0.00 rows=0) (actual time=12.949..12.949 rows=7 loops=1)
                    -> Sort with duplicate removal: memories_planet_geometry.osm_id  (actual time=12.937..12.938 rows=7 loops=1)
                        -> Filter: (sum(memories_planet_geometry.type_id) > 0)  (actual time=12.917..12.921 rows=7 loops=1)
                            -> Table scan on <temporary>  (actual time=12.912..12.914 rows=7 loops=1)
                                -> Aggregate using temporary table  (actual time=12.910..12.910 rows=7 loops=1)
                                    -> Filter: st_contains(memories_planet_geometry.`geometry`,<cache>(st_geomfromtext('POINT(43.676514 4.627736)',4326)))  (cost=1.90 rows=1) (actual time=0.761..12.867 rows=7 loops=1)
                                        -> Index range scan on memories_planet_geometry using planet_osm_polygon_geometry_idx over (geometry unprintable_geometry_value)  (cost=1.90 rows=1) (actual time=0.535..7.354 rows=10 loops=1)
            -> Index lookup on mp using memories_planet_osm_id_index (osm_id=sub.osm_id)  (cost=8189.18 rows=32090) (actual time=0.015..0.016 rows=1 loops=7)
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000192 |
| Executing hook on transaction  | 0.000009 |
| starting                       | 0.000011 |
| checking permissions           | 0.000006 |
| checking permissions           | 0.000006 |
| Opening tables                 | 0.000117 |
| init                           | 0.000010 |
| System lock                    | 0.000013 |
| optimizing                     | 0.000006 |
| optimizing                     | 0.000013 |
| statistics                     | 0.000135 |
| preparing                      | 0.000021 |
| Creating tmp table             | 0.000053 |
| statistics                     | 0.000035 |
| preparing                      | 0.000010 |
| Creating tmp table             | 0.000034 |
| executing                      | 0.013060 |
| end                            | 0.000009 |
| query end                      | 0.000005 |
| waiting for handler commit     | 0.000024 |
| closing tables                 | 0.000013 |
| freeing items                  | 0.000026 |
| cleaning up                    | 0.000014 |
+--------------------------------+----------+
kedare commented 6 months ago

Is there an easy way to install without modifying all the files one by one ? (As I see the last commit had many files) (like can I install the app directly from the github master branch ?)

Postgis is better is almost any way to MySQL GIS so I am not surprised, not an area where MySQL shines unfortunately.

EDIT: Ok no I will try just with Places.php as you said

kedare commented 6 months ago

I still see a lot of

ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -174.527400 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -175.862600 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -173.541000 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -179.148800 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -178.355400 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_1 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -175.994900 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].
ERROR: Failed to insert polygon -19557_2 (An exception occurred while executing a query: SQLSTATE[22S03]: <<Unknown error>>: 3617 Latitude -180.000000 is out of range in function st_geomfromtext. It must be within [-90.000000, 90.000000].`

with the Places.php from 00f1a97bc0e99753aa52927904d0b1ca7b836fc5

pulsejet commented 6 months ago

Are you sure your file is right? Just copy the places.php from master, then maybe restart the php server to clear opcache. The return "{$lat} {$lon}"; are the critical parts.

kedare commented 6 months ago

My bad I did a wget to the raw file and it saved it as Places.php.1 πŸ˜… Importing and so far it's looking good.

kedare commented 6 months ago

Finished, I confirm you that indexing the 5000 photos takes now just a few seconds instead of more than 30 minutes :)

Thank you very much for your time πŸ™πŸΌ

pulsejet commented 6 months ago

Awesome! Thanks for the very detailed report; I believe this could have been the root cause for some other bugs too + turns out Postgres had a similar issue (though not as bad).

kedare commented 6 months ago

@pulsejet I think you may also need to specify SRID for PostgreSQL counterpart ? If not the system will assume a "flat earth" that may locate incorrectly (not 100% sure, I never really used GIS before)

pulsejet commented 6 months ago

I think you may also need to specify SRID for PostgreSQL counterpart ?

Since postgis may not always be available, we use the native geometry types, which does not have any concept of SRID. Small errors are acceptable anyway.

kedare commented 6 months ago

Oh ok, good then, I didn't know postgresql supported geospatial data without PostGIS

pulsejet commented 6 months ago

I didn't know postgresql supported geospatial data without PostGIS

It doesn't, but the geometry types kinda get the job done (https://www.postgresql.org/docs/current/datatype-geometric.html)