tomasloksa / azure-search-emulator

Containerized Azure Search Emulator for development purposes
MIT License

EdgeNGramTokenFilterV2 doesn't work - only full token search works #56

Open L-Sypniewski opened 2 years ago

L-Sypniewski commented 2 years ago

EdgeNGramTokenFilterV2 doesn't work: fields are searchable only by full tokens. For example, if the indexed Description is "some description", results are returned only for the search terms "some description", "some", or "description"; partial terms return nothing. Happy to help fix this.

My token.json:

{
    "@odata.context": ".....",
    "@odata.etag": "......",
    "name": "my-index",
    "defaultScoringProfile": null,
    "fields": [
        {
            "name": "Description",
            "type": "Edm.String",
            "searchable": true,
            "filterable": false,
            "retrievable": true,
            "sortable": false,
            "facetable": false,
            "key": false,
            "indexAnalyzer": "names",
            "searchAnalyzer": "textStandardSearchAnalyzer",
            "analyzer": null,
            "normalizer": null,
            "synonymMaps": []
        }
    ],
    "scoringProfiles": [],
    "corsOptions": null,
    "suggesters": [],
    "analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "textStandardSearchAnalyzer",
            "tokenizer": "whitespace",
            "tokenFilters": [
                "trim",
                "lowercase",
                "asciifolding"
            ],
            "charFilters": []
        },
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "names",
            "tokenizer": "whitespace",
            "tokenFilters": [
                "trim",
                "lowercase",
                "asciifolding",
                "nGramTokenFilter2To20Chars"
            ],
            "charFilters": []
        }
    ],
    "normalizers": [],
    "tokenizers": [],
    "tokenFilters": [
        {
            "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
            "name": "nGramTokenFilter2To20Chars",
            "minGram": 2,
            "maxGram": 20,
            "side": "front"
        }
    ],
    "charFilters": [],
    "encryptionKey": null,
    "similarity": {
        "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
        "k1": null,
        "b": null
    },
    "semantic": null
}
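
For comparison, here is a minimal sketch of the terms I would expect the `names` index analyzer to produce for that value. It is plain Python that only mimics the whitespace tokenizer, `trim`, `lowercase`, and a front edge n-gram filter with `minGram` 2 and `maxGram` 20; `asciifolding` is skipped because the sample text is already ASCII. The function names `edge_ngrams` and `index_terms` are made up for illustration and are not part of the emulator.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the front-anchored n-grams (prefixes) of one token."""
    upper = min(max_gram, len(token))
    # Tokens shorter than min_gram yield no n-grams at all.
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=2, max_gram=20):
    """Approximate the 'names' analyzer: split on whitespace, trim, lowercase, edge n-gram."""
    terms = []
    for token in text.split():
        token = token.strip().lower()
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


terms = index_terms("some description")
print(terms)
```

With these terms in the index, a query such as "so" or "desc" (which `textStandardSearchAnalyzer` leaves as a single lowercase token) should match the document, which is the behaviour I am missing in the emulator.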

tomasloksa commented 2 years ago

Hey, it would be great if you could help. Do you have any idea how it could be implemented?

I will be away for the rest of the week, but I can take a look at it later. There are some Postman tests directly in the repo, and many more in my company's private repo that test the emulator "indirectly". I could run those for you if you open a PR.

L-Sypniewski commented 2 years ago

For now I have no idea how it could be implemented. I have only looked at the current code very briefly, while figuring out how the SolrSearchQueryBuilder works, since I had some issues with it. I can dedicate time to this issue only every second Friday and Monday, but I'll try to go through the existing codebase this weekend to get at least a general idea of how things work and how the issue might be approached.
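
One direction that might be worth checking, purely as an untested sketch: Solr itself ships `solr.EdgeNGramFilterFactory`, so the custom analyzers from the index definition could in principle be translated into a Solr field type with separate index and query analyzers. The Python below only builds a Schema API style `add-field-type` payload from a trimmed copy of the definition above. The helper names, the `azure_<field>` type name, and the lookup tables are hypothetical, and nothing here reflects how the emulator actually creates its Solr schema. It also assumes `side` is always `front`.

```python
# Hypothetical lookup tables from Azure Search names to Solr factory classes.
TOKENIZERS = {"whitespace": "solr.WhitespaceTokenizerFactory"}
SIMPLE_FILTERS = {
    "trim": "solr.TrimFilterFactory",
    "lowercase": "solr.LowerCaseFilterFactory",
    "asciifolding": "solr.ASCIIFoldingFilterFactory",
}
EDGE_NGRAM_TYPE = "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2"


def solr_analyzer(index_def, analyzer_name):
    """Translate one custom analyzer from the index definition into a Solr analyzer dict."""
    analyzer = next(a for a in index_def["analyzers"] if a["name"] == analyzer_name)
    custom = {f["name"]: f for f in index_def.get("tokenFilters", [])}
    filters = []
    for name in analyzer["tokenFilters"]:
        if name in SIMPLE_FILTERS:
            filters.append({"class": SIMPLE_FILTERS[name]})
        elif custom.get(name, {}).get("@odata.type") == EDGE_NGRAM_TYPE:
            definition = custom[name]
            filters.append({
                "class": "solr.EdgeNGramFilterFactory",
                "minGramSize": str(definition["minGram"]),
                "maxGramSize": str(definition["maxGram"]),
            })
        else:
            raise ValueError(f"unsupported token filter: {name}")
    return {"tokenizer": {"class": TOKENIZERS[analyzer["tokenizer"]]}, "filters": filters}


def solr_field_type(index_def, field_name):
    """Build an 'add-field-type' payload for a field with index and search analyzers."""
    field = next(f for f in index_def["fields"] if f["name"] == field_name)
    return {
        "add-field-type": {
            "name": f"azure_{field_name}",  # hypothetical naming scheme
            "class": "solr.TextField",
            "indexAnalyzer": solr_analyzer(index_def, field["indexAnalyzer"]),
            "queryAnalyzer": solr_analyzer(index_def, field["searchAnalyzer"]),
        }
    }


# Trimmed copy of the index definition posted earlier in this issue.
index_def = {
    "fields": [{"name": "Description", "indexAnalyzer": "names",
                "searchAnalyzer": "textStandardSearchAnalyzer"}],
    "analyzers": [
        {"name": "textStandardSearchAnalyzer", "tokenizer": "whitespace",
         "tokenFilters": ["trim", "lowercase", "asciifolding"]},
        {"name": "names", "tokenizer": "whitespace",
         "tokenFilters": ["trim", "lowercase", "asciifolding",
                          "nGramTokenFilter2To20Chars"]},
    ],
    "tokenFilters": [{"@odata.type": EDGE_NGRAM_TYPE,
                      "name": "nGramTokenFilter2To20Chars",
                      "minGram": 2, "maxGram": 20, "side": "front"}],
}

payload = solr_field_type(index_def, "Description")
```

Whether the emulator can take a payload like this depends on how it currently builds its Solr schema, which I still need to read through.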