infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0

Is the filter "pinyin" supported in Custom normalizer filter? #239

Open jeasonchan opened 4 years ago

jeasonchan commented 4 years ago

Hi medcl: Is the "pinyin" filter supported in a custom normalizer? The official site says, "Custom normalizers take a list of character filters and a list of token filters." But when I try to define a custom normalizer with the "pinyin" filter provided by elasticsearch-analysis-pinyin, using the following JSON, an error is returned.

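Hey, the official site says a normalizer can use the same filters that analyzers use, but when I set up the field mapping with the JSON below, I get an error, shown further down. Is the pinyin filter provided by this project simply not supported as a normalizer filter?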
# filters defined in settings  ====================
"filter": {
    "english_stop": {
        "type": "stop",
        "stopwords": "_english_"
    },
    "pinyin_filter": {
        "type": "pinyin",
        "keep_separate_first_letter": false,
        "keep_full_pinyin": true,
        "keep_original": true,
        "limit_first_letter_length": 16,
        "lowercase": true,
        "remove_duplicated_term": true
    }
},
"normalizer": {
    "pinyin_normalizer": {
        "type": "custom",
        "char_filter": [],
        "filter": ["pinyin_filter"]
    }
}

#    mapping   ==================================

{
    "string_keyword_fields": {
        "match": "*",
        "unmatch": "*_html_",
        "match_mapping_type": "string",
        "mapping": {
            "type": "keyword",
            "normalizer": "pinyin_normalizer"
        }
    }
}

# response =====================
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "Custom normalizer [pinyin_normalizer] may not use filter [pinyin_filter]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "Custom normalizer [pinyin_normalizer] may not use filter [pinyin_filter]"
    },
    "status": 400
}
jeasonchan commented 4 years ago

After reviewing the official site again, I found this: the current list of filters that can be used in a normalizer is the following: arabic_normalization, asciifolding, bengali_normalization, cjk_width, decimal_digit, elision, german_normalization, hindi_normalization, indic_normalization, lowercase, persian_normalization, scandinavian_folding, serbian_normalization, sorani_normalization, uppercase.

So it means that only the filters in the list above can be used in a custom normalizer, which is why ES returns "Custom normalizer [pinyin_normalizer] may not use filter [pinyin_filter]".
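
For anyone who wants to catch this before sending the settings to the cluster, here is a minimal sketch of a local check. It assumes the allowed list is exactly the one quoted above (it may differ between Elasticsearch versions), and the names `NORMALIZER_SAFE_FILTERS` and `find_disallowed_filters` are made up for this example; they are not part of Elasticsearch or of this plugin.

```python
# Filter types quoted in this thread as usable in a normalizer.
# Assumption: the list matches your Elasticsearch version; check the docs.
NORMALIZER_SAFE_FILTERS = frozenset({
    "arabic_normalization", "asciifolding", "bengali_normalization",
    "cjk_width", "decimal_digit", "elision", "german_normalization",
    "hindi_normalization", "indic_normalization", "lowercase",
    "persian_normalization", "scandinavian_folding",
    "serbian_normalization", "sorani_normalization", "uppercase",
})


def find_disallowed_filters(analysis: dict) -> dict:
    """Map each normalizer name to the filters it lists that are not in the allowed set."""
    custom_filters = analysis.get("filter", {})
    problems = {}
    for normalizer_name, spec in analysis.get("normalizer", {}).items():
        bad = []
        for filter_name in spec.get("filter", []):
            # A filter defined under "filter" resolves to its "type";
            # any other name is treated as a built-in filter type.
            filter_type = custom_filters.get(filter_name, {}).get("type", filter_name)
            if filter_type not in NORMALIZER_SAFE_FILTERS:
                bad.append(filter_name)
        if bad:
            problems[normalizer_name] = bad
    return problems


# The "analysis" settings from the first comment, as a Python dict.
analysis = {
    "filter": {
        "english_stop": {"type": "stop", "stopwords": "_english_"},
        "pinyin_filter": {"type": "pinyin", "keep_full_pinyin": True},
    },
    "normalizer": {
        "pinyin_normalizer": {
            "type": "custom",
            "char_filter": [],
            "filter": ["pinyin_filter"],
        }
    },
}

print(find_disallowed_filters(analysis))
# prints {'pinyin_normalizer': ['pinyin_filter']}
```

A normalizer that lists only filters such as `lowercase` or `asciifolding` comes back as an empty dict from this check. This only mirrors the list above locally; the cluster's own validation remains the authority.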

Thx