nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
27.53k stars 4.08k forks source link

Use MySQL/MariaDB fulltext indexing on oc_filecache to speed up search results #15300

Open aquarat opened 5 years ago

aquarat commented 5 years ago

Steps to reproduce

  1. Run Nextcloud on a modern multi-core 64 bit machine with more than a few 100 000 files.
  2. Try and search for a file using the normal search function (no apps/add-ons).

Expected behaviour

Nextcloud should return a set of results in a timely manner (less than 10 seconds ideally).

Actual behaviour

Nextcloud never returns the results. Underneath, the MySQL/MariaDB server utilizes 100% on one of the server's CPU cores for some time (more than a minute).

Server configuration

Ubuntu 18.04 running Nextcloud 16 Docker Hub image. MariaDB 10.3.12 (Docker Hub) running off a RAM drive.

Cause

In lib/private/Files/Cache/Cache.php, line 646:

// Original
$sql = ’
SELECT fileid, storage, path, parent, name,
mimetype, storage_mtime, mimepart, size, mtime,
encrypted, etag, permissions, checksum
FROM *PREFIX*filecache
WHERE storage = ? AND name ILIKE ?’;

This query doesn't use an index AFAIK, it scans the entire table, which is why it is slow.

Adding a fulltext index and modifying the query as below significantly reduces the query time and makes the search function usable :

// MariaDB/MySQL FULLTEXT Index
$sql = ’
SELECT fileid, storage, path, parent, name,
mimetype, storage_mtime, mimepart, size, mtime,
encrypted, etag, permissions, checksum
FROM *PREFIX*filecache
WHERE storage = ? AND MATCH(name) AGAINST (?)’;

I don't fully understand how Nextcloud's database schema migration works (the above requires a fulltext index). The fulltext index can be added like so : ALTER TABLE oc_filecache ADD FULLTEXT INDEX name_idx (name);

kesselb commented 5 years ago

cc @rullzer @MorrisJobke

solracsf commented 5 years ago

Have you tried Postgres? You should give it a try in this very specific case. See also http://rachbelaid.com/postgres-full-text-search-is-good-enough/

aquarat commented 5 years ago

Thanks for this. I just wanted to report back, in case someone else has this same issue.

I migrated the system's database from MariaDB(MySQL) to Postgres 11.3. Straight after the migration, with no indexes on the "oc_filecache"."name" column Postgres executed a basic file search query in 343 ms. After enabling the trigram extension and creating a trigram gin index on the "name" column in the oc_filecache table the same search query takes 12 ms to execute. This is with Nextcloud's stock "LIKE" query on a table containing ~1.5 million rows. So yes, Postgres is the way to go for larger datasets.

CREATE EXTENSION pg_trgm;
CREATE INDEX oc_filecache_trgm_idx ON oc_filecache USING GIN(name gin_trgm_ops);
szaimen commented 3 years ago

Is this Issue still valid in NC21.0.2? If not, please close this issue. Thanks! :)

ghost commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity and seems to be missing some essential information. It will be closed if no further activity occurs. Thank you for your contributions.