Database index `fs_storage_path_prefix` is not created with Postgresql

RegisPerdreau commented 1 year ago

⚠️ This issue respects the following points: ⚠️

[X] This is a bug, not a question or a configuration/webserver/proxy issue.
[X] This issue is not already reported on Github (I've searched it).
[X] Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
[X] Nextcloud Server is running on 64bit capable CPU, PHP and OS.
[X] I agree to follow Nextcloud's Code of Conduct.

Bug description

About PostgreSQL support.

Considering this line of code : https://github.com/nextcloud/server/blob/master/core/Command/Db/AddMissingIndices.php#L175

When installing Nextcloud, database index is not created with postgresql database to avoid any later issues (according this commit: https://github.com/nextcloud/server/commit/695326534c18749f36e9172e7b7345824213ee60

According to our research, the missing index is in the oc_filecache table.

With MySQL/Mariadb in database, the index is indeed created by Nextcloud and is called fs_storage_path_prefix. But he is therefore not created with PostgreSQL

We think that performance issue could occur specially when Cronjob is running with huge database.

Could Nextcloud consider solving this annoying issue ?

The commit talks about extra parameters in dbal driver, could we have any explanations about ?

Expected behavior

Expect to have database index created.

Nextcloud Server version

25

Database engine version

PostgreSQL

szaimen commented 1 year ago

Hi, Nextcloud 24 is not supported anymore.

RegisPerdreau commented 1 year ago

Sorry, i have updated, same problem with 25 and higher

szaimen commented 1 year ago

Cc @icewind1991 can we drop the check for postgres in the mentioned line or will it bug out?

pielonet commented 1 year ago

Hi,

PostgreSQL does not support field length limitations in Indexes, thus Doctrine/DBAL won't use it for this platform.

The "lengths" options in the following command would not be taken into account which would have a disastrous effect on Postgresql since indexes on paths would not have any limit and could exceed database capability.

$table->addIndex(['storage', 'path'], 'fs_storage_path_prefix', [], ['lengths' => [null, 64]]);

This index is able to optimize requests in this form (used for example by Nextcloud's cron job) :

select * from oc_filecache where storage=2 and path like 'files_versions%';

Nevertheless PostgreSQL supports Indexes that can optimize requests with "LIKE" expressions (See : https://www.postgresql.org/docs/12/indexes-opclass.html):

CREATE INDEX fs_storage_path_prefix ON oc_filecache (storage, path varchar_pattern_ops);

Note that "path" field is of type VARCHAR(4000) and that the B-tree version 4 limitation is 2704 bytes for the column row (https://dba.stackexchange.com/questions/69161/character-varying-index-overhead-length-limit/69164)

Thus I recommend running the following request prior to creating the Index :

select max(octet_length(path)) from oc_filecache;

The result has to be way less than 2704 in order to assure that the index can be created properly and that future INSERT/UPDATE commands in oc_filecache table won't reach the 2704 limit for "path" field length. (The "path" field is that of files in Nextcloud's files tree: e.g :"files/subdirectory1/subdirectory2/my_file.pdf"). Beware that users use several depths and very long subdirectory names !

Kind regards, Thierry

pielonet commented 1 year ago

I updated my first comment as It was incorrect : in order to optimize requests with "Like" expressions one can not use an index with a "left(path, 64)" limitation. This index will simply be ignored by the optimizer if the request uses a "LIKE" comparison (See : https://www.postgresql.org/docs/current/indexes-expressional.html)

icewind1991 commented 1 year ago

See https://github.com/nextcloud/server/pull/28541#issuecomment-906363838 for why this index isn't created for pgsql atm

flaf commented 1 year ago

Hi,

I updated my first comment as It was incorrect : in order to optimize requests with "Like" expressions one can not use an index with a "left(path, 64)" limitation. This index will simply be ignored by the optimizer if the request uses a "LIKE" comparison (See : https://www.postgresql.org/docs/current/indexes-expressional.html)

According to my tests in staging plateform, this index (with PostgreSQL 12) is used by the optimizer:

CREATE INDEX TEST_storage_path_prefix ON oc_filecache (storage, left(path,64));

For instance:

dbnccl01=> EXPLAIN SELECT * FROM "oc_filecache" "file" LEFT JOIN "oc_filecache_extended" "fe" ON "file"."fileid" = "fe"."fileid" WHERE ("mimetype" <> 2) AND (("storage" = 13954) AND (("path_hash" = '9692aae50022f45f1098646939b287b1') OR ("path" LIKE 'files_versions/%')));
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Nested Loop Left Join  (cost=0.42..4.87 rows=1 width=300)
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |   ->  Index Scan using fs_storage_size on oc_filecache file  (cost=0.27..2.50 rows=1 width=178)
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |         Index Cond: (storage = 13954)
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |         Filter: ((mimetype <> 2) AND (((path_hash)::text = '9692aae50022f45f1098646939b287b1'::text) OR ((path)::text ~~ 'files_versions/%'::text)))
-[ RECORD 5 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |   ->  Index Scan using oc_filecache_extended_pkey on oc_filecache_extended fe  (cost=0.15..2.37 rows=1 width=122)
-[ RECORD 6 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |         Index Cond: (fileid = file.fileid)

dbnccl01=> CREATE INDEX test_storage_path_prefix ON oc_filecache (storage, left(path,64));
CREATE INDEX

dbnccl01=> EXPLAIN SELECT * FROM "oc_filecache" "file" LEFT JOIN "oc_filecache_extended" "fe" ON "file"."fileid" = "fe"."fileid" WHERE ("mimetype" <> 2) AND (("storage" = 13954) AND (("path_hash" = '9692aae50022f45f1098646939b287b1') OR ("path" LIKE 'files_versions/%')));
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Nested Loop Left Join  (cost=0.42..4.87 rows=1 width=300)
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |   ->  Index Scan using test_storage_path_prefix on oc_filecache file  (cost=0.27..2.50 rows=1 width=178)
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |         Index Cond: (storage = 13954)
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |         Filter: ((mimetype <> 2) AND (((path_hash)::text = '9692aae50022f45f1098646939b287b1'::text) OR ((path)::text ~~ 'files_versions/%'::text)))
-[ RECORD 5 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |   ->  Index Scan using oc_filecache_extended_pkey on oc_filecache_extended fe  (cost=0.15..2.37 rows=1 width=122)
-[ RECORD 6 ]----------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN |         Index Cond: (fileid = file.fileid)
dbnccl01=>

The index is used, correct?

So:

1) Is it OK to create manually this index in production? What about the next upgrades of Nextcloud? Can we have some conflict etc.?

2) If it's OK, which name I choose for this index? fs_storage_path_prefix like in PHP code of Nextcloud? Or is it better to choose a different name?

szaimen commented 1 year ago

Cc @nextcloud/server-backend

icewind1991 commented 1 year ago

Note that with pgsql we don't need to to only use the prefix for the index, this is done for mysql because of mysql's index side limit.

Creating the index manually and naming it fs_storage_path_prefix should be fine.

Note that if your index isn't using C locale, that you need to set the "operator class" for the index.

CREATE INDEX fs_storage_path_prefix ON oc_filecache (storage, path text_pattern_ops);

pielonet commented 1 year ago

@flaf

CREATE INDEX TEST_storage_path_prefix ON oc_filecache (storage, left(path,64)); The index is used, correct?

Well, the index is used but only partly : the first field of the index, "storage", is used (as indicated in the sentence in the explain plan : QUERY PLAN | Index Cond: (storage = 13954)) but not the second field "path" which is simply filtered as stated in the explain plan : QUERY PLAN | Filter: ((mimetype <> 2) AND (((path_hash)::text = '9692aae50022f45f1098646939b287b1'::text) OR ((path)::text ~~ 'files_versions/%'::text)))

So the optimization is far from optimal in this case. As I wrote before, the only way to optimize requests like that one on both storage and path fields: select * from oc_filecache where storage=2 and path like 'files_versions%';

Is to use such an index : CREATE INDEX fs_storage_path_prefix ON oc_filecache (storage, path varchar_pattern_ops); like I created on this table:

\d oc_filecache;
                                           Table "public.oc_filecache"
      Column      |          Type           | Collation | Nullable |                   Default                    
------------------+-------------------------+-----------+----------+----------------------------------------------
 fileid           | bigint                  |           | not null | nextval('oc_filecache_fileid_seq'::regclass)
 storage          | bigint                  |           | not null | 0
 path             | character varying(4000) |           |          | NULL::character varying
 path_hash        | character varying(32)   |           | not null | ''::character varying
 parent           | bigint                  |           | not null | 0
 name             | character varying(250)  |           |          | NULL::character varying
 mimetype         | bigint                  |           | not null | 0
 mimepart         | bigint                  |           | not null | 0
 size             | bigint                  |           | not null | 0
 mtime            | bigint                  |           | not null | 0
 storage_mtime    | bigint                  |           | not null | 0
 encrypted        | integer                 |           | not null | 0
 unencrypted_size | bigint                  |           | not null | 0
 etag             | character varying(40)   |           |          | NULL::character varying
 permissions      | integer                 |           |          | 0
 checksum         | character varying(255)  |           |          | NULL::character varying
Indexes:
    "oc_filecache_pkey" PRIMARY KEY, btree (fileid)
    "fs_id_storage_size" btree (fileid, storage, size)
    "fs_mtime" btree (mtime)
    "fs_parent" btree (parent)
    "fs_parent_name_hash" btree (parent, name)
    "fs_size" btree (size)
    "fs_storage_mimepart" btree (storage, mimepart)
    "fs_storage_mimetype" btree (storage, mimetype)
    "fs_storage_path_hash" UNIQUE, btree (storage, path_hash)
    "fs_storage_path_prefix" btree (storage, path varchar_pattern_ops)
    "fs_storage_size" btree (storage, size, fileid)

Here is the corresponding explain plan:

explain select * from oc_filecache where storage=1 and path like 'files/%';
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Index Scan using fs_storage_path_prefix on oc_filecache  (cost=0.28..14.79 rows=4 width=187)
   Index Cond: ((storage = 1) AND ((path)::text ~>=~ 'files/'::text) AND ((path)::text ~<~ 'files0'::text))
   Filter: ((path)::text ~~ 'files/%'::text)
(3 rows)

As you can see both fields "storage" and "path" appear in the "Index Cond" part of the plan and contribute to the optimization of the request.

flaf commented 1 year ago

Hi,

@icewind1991, @pielonet, thanks for your explanations.

Firstly, I have made a mistake in my example because I have not escaped the _ character in my request (ie "path" LIKE 'files\_versions/%' instead of "path" LIKE 'files_versions/%'). Then, indeed the index with left(path, 64) is used partly and the index with varchar_pattern_ops seems to be required according to this new test in staging plateform:

dbnccl01=> CREATE INDEX test1_storage_path_prefix ON oc_filecache (storage, left(path,64));
CREATE INDEX

# The index test1_storage_path_prefix not used. :(
dbnccl01=> EXPLAIN SELECT * FROM oc_filecache WHERE "storage" = 1 AND path LIKE 'files/%';
                           QUERY PLAN                            
-----------------------------------------------------------------
 Seq Scan on oc_filecache  (cost=0.00..16.01 rows=26 width=179)
   Filter: (((path)::text ~~ 'files/%'::text) AND (storage = 1))
(2 rows)

dbnccl01=> CREATE INDEX test2_storage_path_prefix ON oc_filecache (storage, path varchar_pattern_ops);
CREATE INDEX

# The index test2_storage_path_prefix is used. \o/
dbnccl01=> EXPLAIN SELECT * FROM oc_filecache WHERE "storage" = 1 AND path LIKE 'files/%';
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Index Scan using test2_storage_path_prefix on oc_filecache  (cost=0.27..13.25 rows=26 width=179)
   Index Cond: ((storage = 1) AND ((path)::text ~>=~ 'files/'::text) AND ((path)::text ~<~ 'files0'::text))
   Filter: ((path)::text ~~ 'files/%'::text)
(3 rows)

I would have preferred the index with left(path, 64) to work because it's smaller than the index with varchar_pattern_ops. Never mind, that's life.

But, if we create the index manually (the index with varchar_pattern_ops), can we be certain that, when the problem is solved on the Nextcloud side (on the dbal side in fact), exactly the same index will be natively created by Nextcloud with the same name? Indeed, from Nextcloud upgrade to Nextcloud upgrade in production, I don't want to end up with 2 duplicate indexes, or with a different index from the one created natively by Nextcloud?

PS: message edited (typo, sorry).

icewind1991 commented 1 year ago

Any index with that name for pgsql might not be strictly the same but will be functionally the same, so creating the index as described shouldn't lead to issues

flaf commented 1 year ago

Hi.

We have created the index in production and it's better but now we can see another slow requests like this:

dbnccl03=> SELECT * FROM pg_stat_activity WHERE pid = 2020288\gx 
-[ RECORD 1 ]----+---------------------------------------------------------
datid            | 16402
datname          | dbnccl03
pid              | 2020288
usesysid         | 16398
usename          | nccl03
application_name | 
client_addr      | 10.242.20.47
client_hostname  | 
client_port      | 59430
backend_start    | 2023-06-20 20:50:02.112051+02
xact_start       | 2023-06-20 21:42:44.148936+02
query_start      | 2023-06-20 21:42:44.149045+02
state_change     | 2023-06-20 21:42:44.149045+02
wait_event_type  | IO
wait_event       | DataFileRead
state            | active
backend_xid      | 
backend_xmin     | 2656486196
query            | SELECT COUNT(*) FROM "oc_filecache" WHERE "storage" = $1
backend_type     | client backend

Is it normal? Is there another missing index?

Just for memory, now we have this:

dbnccl03=> \d oc_filecache;
                                           Table "nccl03.oc_filecache"
      Column      |          Type           | Collation | Nullable |                   Default                    
------------------+-------------------------+-----------+----------+----------------------------------------------
 fileid           | bigint                  |           | not null | nextval('oc_filecache_fileid_seq'::regclass)
 storage          | bigint                  |           | not null | 0
 path             | character varying(4000) |           |          | NULL::character varying
 path_hash        | character varying(32)   |           | not null | ''::character varying
 parent           | bigint                  |           | not null | 0
 name             | character varying(250)  |           |          | NULL::character varying
 mimetype         | bigint                  |           | not null | 0
 mimepart         | bigint                  |           | not null | 0
 size             | bigint                  |           | not null | 0
 mtime            | bigint                  |           | not null | 0
 storage_mtime    | bigint                  |           | not null | 0
 encrypted        | integer                 |           | not null | 0
 unencrypted_size | bigint                  |           | not null | 0
 etag             | character varying(40)   |           |          | NULL::character varying
 permissions      | integer                 |           |          | 0
 checksum         | character varying(255)  |           |          | NULL::character varying
Indexes:
    "oc_filecache_pkey" PRIMARY KEY, btree (fileid)
    "fs_storage_path_hash" UNIQUE, btree (storage, path_hash)
    "fs_id_storage_size" btree (fileid, storage, size)
    "fs_mtime" btree (mtime)
    "fs_parent" btree (parent)
    "fs_parent_name_hash" btree (parent, name)
    "fs_size" btree (size)
    "fs_storage_mimepart" btree (storage, mimepart)
    "fs_storage_mimetype" btree (storage, mimetype)
    "fs_storage_size" btree (storage, size, fileid)
    "tmp_workaround_nc_issue_38483" btree (storage, path varchar_pattern_ops) # <= The index manually added.

flaf commented 1 year ago

I wanted to add the number of entries (just an approximation) in the oc_filecache table in our production server:

dbnccl03=> SELECT reltuples AS estimation FROM pg_catalog.pg_class WHERE relname = 'oc_filecache';
  estimation   
---------------
 6.7777504e+07    ### <== ~67 millions
(1 row)

pielonet commented 1 year ago

I guess an index on 'storage' field' alone is missing although several multi-columns indexes already exist with field 'storage' being first field of the index that could be used (see https://www.postgresql.org/docs/current/indexes-multicolumn.html) : fs_storage_path_hash, fs_storage_mimepart, fs_storage_mimetypeor or fs_storage_size

I guess you should run an ANALYZE command on oc_filecache table so that Postgresql gathers statistics on that table. You would enhance the probability that one of the above mentioned indexes be used.

Whenever you have significantly altered the distribution of data within a table, running ANALYZE is strongly recommended. This includes bulk loading large amounts of data into the table. Running ANALYZE (or VACUUM ANALYZE) ensures that the planner has up-to-date statistics about the table. With no statistics or obsolete statistics, the planner might make poor decisions during query planning, leading to poor performance on any tables with inaccurate or nonexistent statistics. Note that if the autovacuum daemon is enabled, it might run ANALYZE automatically; see Section 25.1.3 and Section 25.1.6 for more information.

For example, you could run: EXPLAIN SELECT COUNT(*) FROM "oc_filecache" WHERE "storage" = <my_storage>

Then : ANALYZE oc_filecache or ANALYZE oc_filecache (storage)

And finally rerun: EXPLAIN SELECT COUNT(*) FROM "oc_filecache" WHERE "storage" = <my_storage>

To see if it made a difference

Please read the docs for ANALYZE : https://www.postgresql.org/docs/current/sql-analyze.html

flaf commented 1 year ago

Hi, @pielonet, I have try this:

dbnccl03=> EXPLAIN SELECT COUNT(*) FROM "oc_filecache" WHERE "storage" = 117;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=5977.56..5977.57 rows=1 width=8)
   ->  Gather  (cost=5977.35..5977.56 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=4977.35..4977.36 rows=1 width=8)
               ->  Parallel Index Only Scan using fs_storage_size on oc_filecache  (cost=0.57..4818.76 rows=63434 width=0)
                     Index Cond: (storage = 117)
(6 rows)

dbnccl03=> ANALYZE oc_filecache (storage);
ANALYZE
dbnccl03=> EXPLAIN SELECT COUNT(*) FROM "oc_filecache" WHERE "storage" = 117;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=7999.34..7999.35 rows=1 width=8)
   ->  Gather  (cost=7999.12..7999.33 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=6999.12..6999.13 rows=1 width=8)
               ->  Parallel Index Only Scan using fs_storage_size on oc_filecache  (cost=0.57..6806.17 rows=77183 width=0)
                     Index Cond: (storage = 117)
(6 rows)

dbnccl03=>

Is it interesting?

FYI a VACUUM ANALYSE is run every night in the server (via the command vacuumdb --all --analyze --verbose). I'm not sure but I think that an ANALYSE has no effect on PostgreSQL with SELECT COUNT(*) requests.

pielonet commented 1 year ago

@flaf The vaccuum command you run every night is OK to analyse all tables. Moreover as you see your select query is well optimized with an index on "storage" as you can see in the Explain plan:

-> Parallel Index Only Scan using fs_storage_size on oc_filecache (cost=0.57..4818.76 rows=63434 width=0) Index Cond: (storage = 117)

ANALYZE DO have effects on "select count(*)" queries as long as there is a join or where clause in the query !

nextcloud / server