umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License

Content delivery api FieldType.StringSortable treats values with spaces as separate indexes #15920

Closed McGern closed 4 months ago

McGern commented 5 months ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

13.2.2

Bug summary

In the Content Delivery API, when a field is indexed with FieldType.StringSortable as the IndexField type, values that contain spaces appear to be split and indexed as separate terms.

Specifics

With StringRaw, the filter needs exact values, including casing. With StringSortable, casing is ignored (which is what I want), but words separated by spaces are indexed separately. Note that FilterOperation.Is is used in BuildFilterOption.

I'm not sure if this is a bug or the expected behaviour, but I couldn't find any information in the docs about what the FieldType values are expected to do.

Steps to reproduce

Create a fresh install of Umbraco and use the sample filter from https://docs.umbraco.com/umbraco-cms/reference/content-delivery-api/extension-api-for-querying#custom-filter, stripping out the author GUID lookup and replacing it with a simple string.

Set up the Content Delivery API as documented.

using Umbraco.Cms.Core.DeliveryApi;
using Umbraco.Cms.Core.Models;

namespace Umbraco.Docs.Samples;

public class AuthorFilter : IFilterHandler, IContentIndexHandler
{
    private const string AuthorSpecifier = "author:";
    private const string FieldName = "author";

    // Querying
    public bool CanHandle(string query)
        => query.StartsWith(AuthorSpecifier, StringComparison.OrdinalIgnoreCase);

    public FilterOption BuildFilterOption(string filter)
    {
        var fieldValue = filter.Substring(AuthorSpecifier.Length);

        // There might be several values for the filter
        var values = fieldValue.Split(',');

        return new FilterOption
        {
            FieldName = FieldName,
            Values = values,
            Operator = FilterOperation.Is
        };
    }

    // Indexing
    public IEnumerable<IndexFieldValue> GetFieldValues(IContent content, string? culture)
    {
        string? author = content.GetValue<string>("author");

        if (string.IsNullOrWhiteSpace(author))
        {
            return Array.Empty<IndexFieldValue>();
        }

        return new[]
        {
            new IndexFieldValue
            {
                FieldName = FieldName,
                Values = new object[] { author }
            }
        };
    }

    public IEnumerable<IndexField> GetFields() => new[]
    {
        new IndexField
        {
            FieldName = FieldName,
            FieldType = FieldType.StringSortable,
            VariesByCulture = false
        }
    };
}

Note the FieldType.StringSortable in the GetFields method and FilterOperation.Is in the BuildFilterOption method.

Create a document type "Article" with only "author" as a Textstring property.

Add a few articles with varying author names:
Article 1 - Gary Smith
Article 2 - John Smith
Article 3 - Joan Thomson

Query with e.g. https://localhost:44338/umbraco/delivery/api/v2/content?filter=author:Smith

Expected result / actual result

Querying with /umbraco/delivery/api/v2/content?filter=author:Smith

Resulted in 2 items (A1 - Gary Smith and A2 - John Smith); I would have expected 0.

/umbraco/delivery/api/v2/content?filter=author:Gary

Resulted in 1 item (A1 - Gary Smith); I would have expected 0.

/umbraco/delivery/api/v2/content?filter=author:Gary%20smith

Resulted in 2 items (A1 - Gary Smith and A2 - John Smith); I would have expected 1.

Migaroez commented 4 months ago

Hey @McGern, as far as I understand the codebase, we pass the filter string on to Examine. Since Lucene, which Examine is built on, treats spaces as term separators by default, this seems to be expected behavior.
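
To make the reported numbers easier to follow, here is a toy model of term-based matching. It is not Examine or Lucene code. It only assumes that the field value is lowercased and split on whitespace, and that a query matches when any of its terms appears in the document's terms. `TokenizedMatchDemo`, `Analyze` and `CountMatches` are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: NOT Examine or Lucene code. Assumes the field is analyzed
// by lowercasing and splitting on whitespace, and that a query matches when
// any of its terms is present in the document's term set.
public static class TokenizedMatchDemo
{
    // Turns a value into its set of lowercase, whitespace-separated terms.
    public static HashSet<string> Analyze(string value)
        => value.ToLowerInvariant()
                .Split(' ', StringSplitOptions.RemoveEmptyEntries)
                .ToHashSet();

    // Counts the authors that share at least one term with the query.
    public static int CountMatches(IEnumerable<string> authors, string query)
    {
        HashSet<string> queryTerms = Analyze(query);
        return authors.Count(author => Analyze(author).Overlaps(queryTerms));
    }

    public static void Main()
    {
        var authors = new[] { "Gary Smith", "John Smith", "Joan Thomson" };

        Console.WriteLine(CountMatches(authors, "Smith"));      // 2
        Console.WriteLine(CountMatches(authors, "Gary"));       // 1
        Console.WriteLine(CountMatches(authors, "Gary smith")); // 2
    }
}
```

Under those assumptions the model gives the same counts as the three queries in the report, which is why the results look like expected tokenized behavior rather than a defect.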

A quick workaround I learned from building search on Examine/Lucene is as follows. Change the term generation so that it also produces an "exact" term by joining all the words of the property value into a single term. Store this alongside the original value, either in a separate field or in the same field. This lets you search on both partial terms and the full value. I updated the code example with this.

using Umbraco.Cms.Core.DeliveryApi;
using Umbraco.Cms.Core.Models;

namespace umb15920;

public class AuthorFilter : IFilterHandler, IContentIndexHandler
{
    private const string AuthorSpecifier = "author:";
    private const string FieldName = "author";

    // Querying
    public bool CanHandle(string query)
        => query.StartsWith(AuthorSpecifier, StringComparison.OrdinalIgnoreCase);

    public FilterOption BuildFilterOption(string filter)
    {
        var fieldValue = filter.Substring(AuthorSpecifier.Length);

        // There might be several values for the filter
        var values = fieldValue.Split(',').ToArray();

        return new FilterOption
        {
            FieldName = FieldName,
            Values = values,
            Operator = FilterOperation.Is
        };
    }

    // Indexing
    public IEnumerable<IndexFieldValue> GetFieldValues(IContent content, string? culture)
    {
        string? author = content.GetValue<string>("author");

        if (string.IsNullOrWhiteSpace(author))
        {
            return Array.Empty<IndexFieldValue>();
        }

        return new[]
        {
            new IndexFieldValue
            {
                FieldName = FieldName,
                Values = new object[] { author, author.EnsureSingleTerm() }
            }
        };
    }

    public IEnumerable<IndexField> GetFields() => new[]
    {
        new IndexField
        {
            FieldName = FieldName,
            FieldType = FieldType.StringSortable,
            VariesByCulture = false
        }
    };
}

public static class ExamineStringExtensions
{
    public static string EnsureSingleTerm(this string input)
        => input.Replace(' ', '_');
}
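
One thing to keep in mind with this workaround: the joined term can presumably only match if the filter value reaches the index in the same joined form, so either the client sends `author:Gary_Smith` or the handler normalizes the value itself. Below is a minimal sketch of the second option, assuming the same underscore convention (plus trimming). `AuthorFilterValues` and `NormalizeFilterValues` are hypothetical helpers, not part of Umbraco's API.

```csharp
using System;
using System.Linq;

public static class AuthorFilterValues
{
    // Same convention as EnsureSingleTerm in the handler (spaces become
    // underscores), with surrounding whitespace trimmed first.
    public static string EnsureSingleTerm(string input)
        => input.Trim().Replace(' ', '_');

    // Hypothetical helper: splits the part of the filter after "author:" on
    // commas and joins each multi-word value into a single term.
    public static string[] NormalizeFilterValues(string fieldValue)
        => fieldValue
            .Split(',', StringSplitOptions.RemoveEmptyEntries)
            .Select(EnsureSingleTerm)
            .ToArray();

    public static void Main()
    {
        string[] values = NormalizeFilterValues("Gary smith,Joan Thomson");
        Console.WriteLine(string.Join(" | ", values)); // Gary_smith | Joan_Thomson
    }
}
```

In the handler, the result would take the place of `values` in BuildFilterOption. Single-word filters such as `author:Smith` pass through unchanged, so partial matches on one word would still work. Multi-word filters would then only match the joined term.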

McGern commented 4 months ago

@Migaroez Thanks for your detailed feedback, it's very much appreciated.