nextcloud / fulltextsearch

🔍 Core of the full-text search framework for Nextcloud
GNU Affero General Public License v3.0

Fulltextsearch Fails with Usernames and Group Names Containing Special Characters #768

Open wickeddoc87 opened 1 year ago

wickeddoc87 commented 1 year ago


Description:

When using the Fulltextsearch feature in Nextcloud, I've consistently observed that the search doesn't operate correctly for usernames and group names containing special characters, spaces, or specific combinations of letters and numbers. Specifically:

Usernames and groups like "test", "test3", "testgroup", and "test3group" operate as expected.
Usernames and group names such as "test 3", "Test_3", "Test@3", "test group" do not function properly.

This issue has been consistently reproducible over the past 2 weeks.

Steps to Reproduce:

Set up a new Nextcloud instance.
Install and configure the Fulltextsearch plugin.
Create users and groups with varying name patterns:
    test, test3, testgroup, test3group: functions as expected
    test 3, Test_3, Test@3, "test group": search doesn't work

Execute a full-text search for each user and group.

Expected Result:

The Fulltextsearch should function consistently, irrespective of the username or group name's structure or content.

Actual Result:

Fulltextsearch operates only for usernames and group names that are simple strings made up of letters and numbers. It fails when the name includes special characters or spaces etc.

Environment:

Nextcloud version: 27.0.2.1
elasticsearch version: 8.8.1
Full text search version 27.0.1
Full text search - Elasticsearch Platform version: 27.0.2
Full text search - Files version: 27.0.1
Full text search - Files - Tesseract OCR version: 27.0.0
Database: Postgres 15

Additional Notes:

The user experience of Fulltextsearch could see significant improvement if this issue is addressed. Many users and groups tend to have names with spaces and special characters. As a result, there's a prevalent misconception that Fulltextsearch operates exclusively for admin users or admin groups, which is not the case.

vbier commented 1 year ago

I have tried to understand the source code, and as far as I can tell, the function generateSearchQueryAccess in https://github.com/nextcloud/fulltextsearch_elasticsearch/blob/master/lib/Service/SearchMappingService.php#L307 restricts the found documents by permissions. A document must either be owned by the current user, be shared with them, be public, or be shared with a group or circle they are a member of.

When I look at my indexed document, I can see that the owner matches the current user ID, but the document is still not found. If I change the query from a term query to a match query, all documents seem to be found as expected. Reading the Elasticsearch documentation, I cannot see why this should be the case for a keyword field, so this does not seem to be the proper fix.

But I do not have the knowledge to investigate this further. Maybe somebody else can pick it up from here. IMHO the problem is not in the fulltextsearch code, but rather in the fulltextsearch_elasticsearch code.

vbier commented 1 year ago

After checking the index mapping, I realized that the users and owner fields are of type text, which completely explains why a term query cannot find user IDs with blanks or special characters:

        "users" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "owner" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
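
To illustrate why this mapping breaks the lookup, here is a minimal Python sketch. It does not talk to Elasticsearch; `approx_standard_analyzer` is a hypothetical, rough stand-in for the default analyzer applied to text fields (lowercase, then split on characters that are not letters, digits, or underscores), and the real tokenizer differs in other cases. A term query is not analyzed, so it only matches if the query string equals one indexed token exactly.

```python
import re


def approx_standard_analyzer(value: str) -> list[str]:
    """Rough approximation (assumption): lowercase, split on non-word characters."""
    return re.findall(r"\w+", value.lower())


def term_matches_text_field(stored: str, query: str) -> bool:
    """Term query on a text field: the raw query must equal one indexed token."""
    return query in approx_standard_analyzer(stored)


def term_matches_keyword_field(stored: str, query: str) -> bool:
    """Term query on a keyword field: the whole value is a single, unmodified token."""
    return query == stored


def match_matches_text_field(stored: str, query: str) -> bool:
    """Match query: the query is analyzed too; by default any shared token is a hit."""
    return bool(set(approx_standard_analyzer(query)) & set(approx_standard_analyzer(stored)))


if __name__ == "__main__":
    for name in ["test", "test3", "test 3", "Test_3", "Test@3", "test group"]:
        print(
            f"{name!r}: tokens={approx_standard_analyzer(name)} "
            f"term/text={term_matches_text_field(name, name)} "
            f"term/keyword={term_matches_keyword_field(name, name)}"
        )
```

Under these assumptions, the names reported as working are exactly those that survive analysis as a single unchanged token. The sketch also suggests why switching to a match query is not a proper fix: a query for "test 3" would share the token "test" with a document owned by "test", so it could match documents belonging to other users.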

The fields either need to be of type keyword, or the code in SearchManager.php needs to be changed to work on the subfields owner.keyword and users.keyword. Then the term query works as expected.
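
As a sketch of the second option, the following Python snippet builds the request body for a term query against the `owner.keyword` subfield shown in the mapping above. The helper name `owner_term_query` is hypothetical and only serves the illustration; the actual query in fulltextsearch_elasticsearch is assembled in PHP and combines several access conditions.

```python
import json


def owner_term_query(user_id: str, field: str = "owner.keyword") -> dict:
    """Build a term query body that compares the exact, unanalyzed user ID."""
    return {"query": {"term": {field: {"value": user_id}}}}


if __name__ == "__main__":
    # The user ID is passed through unchanged, including spaces and case.
    print(json.dumps(owner_term_query("test 3"), indent=2))
```

Note that the subfield in this mapping has "ignore_above" set to 256, so values longer than 256 characters would not be indexed in the keyword subfield and could not be found this way.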

I have opened an issue in the correct GitHub project: https://github.com/nextcloud/fulltextsearch_elasticsearch/issues/300