umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License

"Cannot parse '__IndexType:content': Encountered error when getting examine search results #12646

Open ewuski opened 2 years ago

ewuski commented 2 years ago

Which exact Umbraco version are you using? For example: 9.0.1 - don't just write v9

Umbraco 9.4.3, 10.0.0

Bug summary

I am getting a strange error from a basic Examine query. It looks like this:

{"Cannot parse '__IndexType:content': Encountered \"<EOF>\" at line 1, column 19.\r\nWas expecting one of:\r\n    <BAREOPER> ...\r\n    \"(\" ...\r\n    \"*\" ...\r\n    <QUOTED> ...\r\n    <TERM> ...\r\n    <PREFIXTERM> ...\r\n    <WILDTERM> ...\r\n    <REGEXPTERM> ...\r\n    \"[\" ...\r\n    \"{\" ...\r\n    <NUMBER> ...\r\n    "}

or like this:

{"Cannot parse '__IndexType:content': Encountered \"<EOF>\" at line 1,
column 0.\r\nWas expecting one of:\r\n    <NOT> ...\r\n    \"+\"
...\r\n    \"+\" ...\r\n    \"-\" ...\r\n    \"-\" ...\r\n   
<BAREOPER> ...\r\n    <BAREOPER> ...\r\n    \"(\" ...\r\n    \"(\"
...\r\n    \"*\" ...\r\n    <QUOTED> ...\r\n    \"*\" ...\r\n   
<TERM> ...\r\n    <PREFIXTERM> ...\r\n    <QUOTED> ...\r\n   
<WILDTERM> ...\r\n    <REGEXPTERM> ...\r\n    \"[\" ...\r\n   
<WILDTERM> ...\r\n    <REGEXPTERM> ...\r\n    \"[\" ...\r\n    \"{\"
...\r\n    <NUMBER> ...\r\n    <TERM> ...\r\n    \"*\" ...\r\n   
\"*\" ...\r\n    "}

The query is simple:

Category: "content", LuceneQuery: {+__NodeTypeAlias:product +countryCode:ge -umbracoNaviHide:1}

The error happens on execution:

searchResults = searchQuery.Execute();

Does anyone know what's going on and what this error means?

I had the same data in an Umbraco 7 site and everything worked well. I never got this or any similar error when searching that content.

It usually happens on the first search. If I reload the page, the results are fetched with no issues.

Specifics

Re-indexing does not make any difference.

It happens regardless of the search parameters. It usually happens on the first fetch of results; after that, whether the results are filtered, the search parameters are changed, or the page is simply reloaded and a fresh search executed, the results load correctly as long as we stay on the same page.

If I navigate away and load the same results page in a new tab, the error happens again, so it seems to happen on every new page instance.

Steps to reproduce

Add a search that uses the query below, or a similar one depending on your database and the site's document types.

I am using the built-in ExternalIndex for the query.

    ISearchResults searchResults = null;
    var index = GetExamineIndex(UmbracoIndexes.ExternalIndexName);
    var searcher = index.Searcher;
    var searchQuery = searcher.CreateQuery(searcherSettings.IndexingType)
             .Field("__NodeTypeAlias", searcherSettings.ContentTypeToSearch.GetDescription());
    searchQuery = searchQuery.Not().Field("umbracoNaviHide", "1");

    // The resulting query is:
    // Category: "content", LuceneQuery: {+__NodeTypeAlias:product -umbracoNaviHide:1}

    searchResults = searchQuery.Execute();

    private IIndex GetExamineIndex(string indexName)
    {
        // Fail fast if the index is missing or is not an Umbraco index
        if (!_examineManager.TryGetIndex(indexName, out var index) || !(index is IUmbracoIndex))
        {
            throw new InvalidOperationException($"No index found by name {indexName} or is not of type {typeof(IUmbracoIndex)}");
        }

        return index;
    }

Expected result / actual result

It should return results correctly the first time as well.

nielslynggaard commented 2 years ago

I've run into this as well. Someone else has too: https://our.umbraco.com/forum/using-umbraco-and-getting-started/108739-umbraco-9-search-eof

I've figured out that this:

var query = searcher.CreateQuery("content").NodeTypeAlias("lmeWarehouse");

fails with "Cannot parse" and so on, seemingly at random: sometimes it works, sometimes it throws an error.

The workaround I found was to create the query like this:

searcher.CreateQuery().NativeQuery($"+__IndexType:{IndexTypes.Content}").And();

It seems something is broken when calling searcher.CreateQuery("content").

bjarnef commented 1 year ago

I got a similar issue in Umbraco v10.2.1

[screenshots]

It shows the raw Lucene query +(createDate:[0 TO 3155378975999990000]) -hideFromSearch:1 +(__NodeTypeAlias:course), but this query seems to work fine when searched in the Examine dashboard.

emmagarland commented 1 year ago

I get the same issue (sporadically) in Umbraco 10.3, about one in three times that I execute a search with the CreateQuery syntax:

IBooleanOperation? booleanOperation = index.Searcher
                    .CreateQuery(IndexTypes.Content)
                    .GroupedOr(searchFields, searchTerm);

Lucene.Net.QueryParsers.Classic.ParseException: 'Cannot parse '__IndexType:content': Encountered "<EOF>" at line 1, column 19. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <NUMBER> ... <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <NUMBER> ... <TERM> ... "*" ... <TERM> ... "*" ... '

Switching to the NativeQuery workaround suggested above works every time:

IBooleanOperation? booleanOperation = index.Searcher
                    .CreateQuery()
                    .NativeQuery($"+__IndexType:{IndexTypes.Content}")
                    .And()
                    .GroupedOr(searchFields, searchTerm);

MrJackWilson commented 1 year ago

I am also facing this issue in Umbraco 10.2.1.

I am also receiving this error alongside it:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Lucene.Net.QueryParsers.Classic.QueryParser.Jj_add_error_token(Int32 kind, Int32 pos)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Jj_scan_token(Int32 kind)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Jj_3R_2()
   at Lucene.Net.QueryParsers.Classic.QueryParser.Jj_3_1()
   at Lucene.Net.QueryParsers.Classic.QueryParser.Jj_2_1(Int32 xla)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Clause(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Query(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParser.TopLevelQuery(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParserBase.Parse(String query)
   at Examine.Lucene.Search.LuceneSearchQueryBase.GetFieldInternalQuery(String fieldName, IExamineValue fieldValue, Boolean useQueryParser)
   at Examine.Lucene.Search.LuceneSearchQuery.Search(QueryOptions options)

I'm using the same method mentioned above:

searcher.CreateQuery(IndexTypes.Content)

BarryFogarty commented 1 year ago

I also see this error occurring at random, on Umbraco 10.2.0.

SATC-Ben commented 1 year ago

I'm also getting this issue sporadically, on Umbraco 11.1.0.

I'm using the following:

var results = query.Execute(new QueryOptions(skip, (int)searchParameters.PageSize));

and getting:

Lucene.Net.QueryParsers.Classic.ParseException: Cannot parse '__IndexType:content': Encountered "<EOF>" at line 1, column 19.
Was expecting one of:
    "^" ...
    <TERM> ...
    <FUZZY_SLOP> ...

and ...


   at Lucene.Net.QueryParsers.Classic.QueryParser.Term(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Clause(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Query(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParser.TopLevelQuery(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParserBase.Parse(String query)
   --- End of inner exception stack trace ---
   at Lucene.Net.QueryParsers.Classic.QueryParserBase.Parse(String query)
   at Examine.Lucene.Search.LuceneSearchQueryBase.ParseRawQuery(String rawQuery)
   at Examine.Lucene.Search.LuceneSearchQueryBase.GetFieldInternalQuery(String fieldName, IExamineValue fieldValue, Boolean useQueryParser)
   at Examine.Lucene.Search.LuceneSearchQuery.Search(QueryOptions options)
   at Examine.Lucene.Search.LuceneSearchQuery.Execute(QueryOptions options)
   at Examine.Lucene.Search.LuceneBooleanOperation.Execute(QueryOptions options)

prjseal commented 1 year ago

Thanks for this, @emmagarland; it fixed the issue I was having in 10.4.0.

I was able to keep all of my code the same and only needed to change the first part.

I went from this:

                var query = index
                    .Searcher
                    .CreateQuery("content")
                    .NodeTypeAlias(MyDocType.ModelTypeAlias);

                    // ... GroupedOr() / And() calls follow here

to this:

                var query = index
                    .Searcher
                    .CreateQuery()
                    .NativeQuery($"+__IndexType:{IndexTypes.Content}")
                    .And()
                    .NodeTypeAlias(MyDocType.ModelTypeAlias);

                    // ... GroupedOr() / And() calls follow here

Now I don't get the error at all.

mirkomaty commented 1 year ago

Are there any insights into this? I have a similar query that throws the error with no apparent cause. It works for hours and then suddenly throws the exception:

Lucene.Net.QueryParsers.Classic.ParseException: Cannot parse '__IndexType:content': Encountered "" at line 1, column 0.

IIndex index;
if (!this.examineManager.TryGetIndex( "InternalIndex", out index ))
{
    return null;
}

var fields = new string[]{ propertyAlias };
var booleanOperation = index.Searcher
        .CreateQuery( searchType, Examine.Search.BooleanOperation.And )
        .ManagedQuery( searchString, fields );  // Must be a managed query because searchString can be a Udi

if (contentTypeAlias != null)
    booleanOperation.And().Field( "__NodeTypeAlias", contentTypeAlias );

if (publishedOnly)
    booleanOperation.And().Field( "__Published", "y" );

return booleanOperation.Execute();

where searchType = "content".

Shazwazza commented 11 months ago

Hi all, see this thread https://github.com/Shazwazza/Examine/issues/335#issuecomment-1834592280

I have released Examine 3.2.0, which hopefully resolves the problem, though I still can't figure out how that part of the codebase gets called by Umbraco.

biapar commented 6 months ago

The problem also occurs in Umbraco 13.1.1:

Lucene.Net.QueryParsers.Classic.ParseException: Cannot parse '(id: OR __Path:-1,,)': Encountered " "OR "" at line 1, column 5. Was expecting one of: ... "(" ... "*" ... ... ... ... ... ... "[" ... "{" ... ...
 ---> Lucene.Net.QueryParsers.Classic.ParseException: Encountered " "OR "" at line 1, column 5. Was expecting one of: ... "(" ... "*" ... ... ... ... ... ... "[" ... "{" ... ...
   at Lucene.Net.QueryParsers.Classic.QueryParser.Jj_consume_token(Int32 kind)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Clause(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Query(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Clause(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParser.Query(String field)
   at Lucene.Net.QueryParsers.Classic.QueryParserBase.Parse(String query)
   --- End of inner exception stack trace ---
   at Lucene.Net.QueryParsers.Classic.QueryParserBase.Parse(String query)
   at Examine.Lucene.Search.LuceneSearchQueryBase.NativeQuery(String query)
   at Umbraco.Cms.Infrastructure.Examine.DeliveryApiContentIndex.PerformDeleteFromIndex(IEnumerable`1 itemIds, Action`1 onComplete)
   at Umbraco.Cms.Infrastructure.Examine.Deferred.DeliveryApiContentIndexHandleContentChanges.Reindex(IContent content, IIndex index)
   at Umbraco.Cms.Infrastructure.Examine.Deferred.DeliveryApiContentIndexHandleContentChanges.b__7_0(CancellationToken _)
   at Umbraco.Cms.Infrastructure.HostedServices.QueuedHostedService.BackgroundProcessing(CancellationToken stoppingToken)

Shazwazza commented 6 months ago

Is that specifically with Examine 3.2.0?

biapar commented 6 months ago
[screenshot]

Shazwazza commented 6 months ago

The problem is that the input query, (id: OR __Path:-1*,,), is invalid, and it is generated by Umbraco's code, not by Examine.
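To illustrate the kind of guard that avoids a malformed query like this, here is a minimal C# sketch, assuming the raw query is assembled from field/value pairs. RawQueryBuilder and BuildOrClause are hypothetical names, not Umbraco or Examine APIs, and the values are assumed to be already escaped for Lucene syntax. Pairs with an empty value are skipped, so a missing id can never produce a dangling "id:" clause.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public static class RawQueryBuilder
{
    // Hypothetical helper, not part of Umbraco or Examine.
    // Joins "field:value" clauses with OR, skipping any pair whose value is
    // null or blank, so the output never contains a dangling "id:" clause.
    // Values are assumed to be already escaped for Lucene query syntax.
    public static string? BuildOrClause(IEnumerable<(string Field, string? Value)> terms)
    {
        var clauses = terms
            .Where(t => !string.IsNullOrWhiteSpace(t.Value))
            .Select(t => $"{t.Field}:{t.Value}")
            .ToList();

        // With nothing left to match, return null rather than "()",
        // which the classic query parser would also reject.
        return clauses.Count == 0 ? null : "(" + string.Join(" OR ", clauses) + ")";
    }
}
```

For example, passing ("id", "") and ("__Path", "1234") yields (__Path:1234) instead of a clause the parser cannot read; the caller then decides what to do when null comes back, for example skipping the query entirely.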

biapar commented 5 months ago

Do I have to update Umbraco?

mirkomaty commented 2 months ago

Could someone from HQ comment on this? Sorry to say it, but it's quite annoying. This is the failing code:

booleanOperation = index.Searcher
        .CreateQuery("content")
        .Field("__NodeTypeAlias", "newsSource");

var result = booleanOperation.Execute();

With the error message:

Lucene.Net.QueryParsers.Classic.ParseException: Cannot parse '__IndexType:content': Encountered "<EOF>" at line 1, column 19.
Was expecting one of:
    "*" ...
    <TERM> ...

nielslynggaard commented 2 months ago

@mirkomaty if you do this:

index.Searcher.CreateQuery().NativeQuery($"+__IndexType:{IndexTypes.Content}").And()

instead of

index.Searcher.CreateQuery("content")

then you should be home free. I've made a habit of always doing that to avoid this annoying bug.

mirkomaty commented 2 months ago

@nielslynggaard Thanks for reminding me of this workaround. I'll give it a try.

Shazwazza commented 1 month ago

Make sure you are running the latest Examine/Umbraco version. There are two causes:

a) Previous Examine versions tried to use a shared query parser instance for certain fields, but this is problematic because query parser instances are not thread safe.
b) If a query is created with missing fields, the generated output will be invalid. This occurred in a couple of places within Umbraco itself.
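Cause (a) can be illustrated without Lucene at all. The following is a toy C# sketch, not Lucene.Net's QueryParser; StatefulFieldParser and FieldParsing are hypothetical names. Like a generated parser, the toy keeps its input and cursor in instance fields, so one instance shared across request threads can return wrong results or throw, which would be consistent with the sporadic failures reported in this thread. Creating a parser per call avoids sharing that state.

```csharp
// Toy, hypothetical parser standing in for a generated query parser.
// Like Lucene.Net's classic QueryParser, it is not thread safe, because it
// keeps the text being parsed and its cursor in instance fields.
public sealed class StatefulFieldParser
{
    private string _text = "";
    private int _pos;

    // Returns the field name before the first ':' in a "field:term" query.
    public string ParseField(string query)
    {
        _text = query;
        _pos = 0;
        // If another thread called ParseField on this same instance now, it
        // could overwrite _text and _pos mid-loop, corrupting the result or
        // causing an out-of-range exception.
        while (_pos < _text.Length && _text[_pos] != ':')
        {
            _pos++;
        }
        return _text.Substring(0, _pos);
    }
}

public static class FieldParsing
{
    // Pattern to avoid: one instance shared by every request thread.
    private static readonly StatefulFieldParser Shared = new StatefulFieldParser();
    public static string ParseShared(string query) => Shared.ParseField(query);

    // Safe pattern: a fresh parser per call, so no parsing state is shared.
    public static string ParseIsolated(string query) => new StatefulFieldParser().ParseField(query);
}
```

ParseIsolated can be called from many threads at once. ParseShared is shown only as the pattern to avoid; its failures depend on thread timing, so they will not reproduce on every run, much like the intermittent errors above.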

biapar commented 1 month ago

I installed the latest v13 version and that solved it.