lbh375441316 opened 5 years ago
Similar to my problem: try removing "ignore_pinyin_offset":false.
I ran into this problem too with ik+pinyin: ik_smart works fine, but ik_max_word fails, whether ignore_pinyin_offset is true or false. Version: 6.5.2.
Elasticsearch exception [type=illegal_argument_exception, reason=startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=95,endOffset=98,lastStartOffset=110 for field
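The message lists the rules Lucene (underneath Elasticsearch) enforces on each token's offsets while indexing a field: the start must be non-negative, the end must not precede the start, and the start must not be smaller than the previous token's start. The Python sketch below only mirrors those three rules to make the failure concrete; it is an illustration, not Lucene's code, and the preceding token at (110, 112) in the example is hypothetical. Note that equal starts pass, so overlapping tokens are accepted as long as they are emitted in non-decreasing start order.

```python
def check_offsets(tokens, field):
    """Raise ValueError if a token breaks the offset rules quoted in the error.

    tokens: (start_offset, end_offset) pairs in the order the analyzer emits them.
    Illustrative only; mirrors the conditions named in the error message.
    """
    last_start = 0
    for start, end in tokens:
        if start < 0 or end < start or start < last_start:
            raise ValueError(
                "startOffset must be non-negative, and endOffset must be >= startOffset, "
                "and offsets must not go backwards "
                f"startOffset={start},endOffset={end},lastStartOffset={last_start} "
                f"for field '{field}'"
            )
        last_start = start


# Overlapping tokens with non-decreasing starts are fine.
check_offsets([(0, 7), (0, 4), (0, 2), (1, 3), (2, 7)], "title.pinyin")

# A token that starts before the previous token's start is rejected.
try:
    check_offsets([(110, 112), (95, 98)], "text.pinyin")
except ValueError as err:
    print(err)
```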
Same problem here: ik_max_word throws this error, so for now I've had to switch to ik_smart.
Removing the "word_delimiter" filter fixes it. The error is caused by "word_delimiter"; ik_max_word and this filter seem to be incompatible.
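Applied to the analyzers from the original post, the fix means dropping "word_delimiter" from both custom filter chains. A minimal Python sketch that does this on the settings as a plain dict; `drop_filter` is a hypothetical helper, and the "filter" definitions (index_pinyin, search_pinyin) are omitted here for brevity.

```python
import copy
import json


def drop_filter(analysis, filter_name):
    """Return a copy of an "analysis" settings dict with filter_name removed
    from every analyzer's filter chain (hypothetical helper)."""
    result = copy.deepcopy(analysis)  # leave the caller's dict untouched
    for analyzer in result.get("analyzer", {}).values():
        if "filter" in analyzer:
            analyzer["filter"] = [f for f in analyzer["filter"] if f != filter_name]
    return result


# Analyzer part of the template from the original post.
analysis = {
    "analyzer": {
        "ik_pinyin_index": {
            "type": "custom",
            "tokenizer": "hanlp-index",
            "filter": ["index_pinyin", "word_delimiter"],
        },
        "ik_pinyin_search": {
            "type": "custom",
            "tokenizer": "hanlp-index",
            "filter": ["search_pinyin", "word_delimiter"],
        },
    },
}

fixed = drop_filter(analysis, "word_delimiter")
print(json.dumps(fixed["analyzer"], indent=2))
```

Since these settings live in an index template (`"index_patterns":["zhiwen*"]`), an updated template only applies to indices created afterwards; existing indices would have to be recreated and their data reindexed.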
Nice, same problem on 7.9.1; removing word_delimiter does fix it.
I've recently been working on pinyin analysis and found that bulk-indexing data into ES fails with hanlp+pinyin. At first I assumed hanlp simply couldn't be combined with pinyin, so I then tried IK+pinyin. It turned out that ik_max_word hits the same problem, while ik_smart does not. What could be causing this? Could someone take a look?

Mapping config:

{
  "order": 0,
  "index_patterns": ["zhiwen*"],
  "settings": {
    "number_of_shards": "3",
    "number_of_replicas": "2",
    "index": {
      "analysis": {
        "analyzer": {
          "ik_pinyin_index": {
            "type": "custom",
            "tokenizer": "hanlp-index",
            "filter": ["index_pinyin", "word_delimiter"]
          },
          "ik_pinyin_search": {
            "type": "custom",
            "tokenizer": "hanlp-index",
            "filter": ["search_pinyin", "word_delimiter"]
          }
        },
        "filter": {
          "index_pinyin": {
            "type": "pinyin",
            "keep_first_letter": true,
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_joined_full_pinyin": true,
            "keep_original": true,
            "limit_first_letter_length": 16,
            "lowercase": true,
            "keep_none_chinese": true,
            "keep_none_chinese_together": true,
            "keep_none_chinese_in_joined_full_pinyin": true,
            "keep_none_chinese_in_first_letter": true,
            "none_chinese_pinyin_tokenize": false,
            "remove_duplicated_term": false,
            "ignore_pinyin_offset": false
          },
          "search_pinyin": {
            "type": "pinyin",
            "keep_first_letter": true,
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_joined_full_pinyin": true,
            "keep_original": true,
            "limit_first_letter_length": 16,
            "lowercase": true,
            "keep_none_chinese": true,
            "keep_none_chinese_together": true,
            "keep_none_chinese_in_joined_full_pinyin": true,
            "keep_none_chinese_in_first_letter": true,
            "none_chinese_pinyin_tokenize": true,
            "remove_duplicated_term": false,
            "ignore_pinyin_offset": false
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "rowId": { "type": "keyword" },
        "title": {
          "type": "text",
          "analyzer": "hanlp-index",
          "search_analyzer": "hanlp-index",
          "fields": {
            "pinyin": {
              "type": "text",
              "analyzer": "ik_pinyin_index",
              "store": false,
              "search_analyzer": "ik_pinyin_search"
            }
          }
        },
        "text": {
          "type": "text",
          "analyzer": "hanlp-index",
          "search_analyzer": "hanlp-index",
          "fields": {
            "pinyin": {
              "type": "text",
              "analyzer": "ik_pinyin_index",
              "store": false,
              "search_analyzer": "ik_pinyin_search"
            }
          }
        }
      }
    }
  }
}
Error message:

Bulk indexing has failures. Use ElasticsearchException.getFailedDocuments() for detailed messages [{001eb378b4761c0fa5174b7c9fd6f81e=java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=59,endOffset=61,lastStartOffset=60 for field 'text.pinyin',
003e6d757f85e307571eef82708af198=java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=704,endOffset=706,lastStartOffset=705 for field 'text.pinyin',
001ce50569d53e1065e3ca3bdcd93ae3=java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=681,endOffset=683,lastStartOffset=682 for field 'text.pinyin',
006b1931dc2ccce724b2449043905551=java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=3932,endOffset=3934,lastStartOffset=3933 for field 'text.pinyin',
0000dd29ee5af4b0ca3926304ec78a6b=java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=29,endOffset=31,lastStartOffset=30 for field 'text.pinyin',
0023c017c5e937be91800c4029c05602=java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=153,endOffset=155,lastStartOffset=154 for field 'text.pinyin'}]
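All six failures above share one shape: the rejected token starts exactly one character before the previous token's start and is two characters long (for example startOffset=59, endOffset=61, lastStartOffset=60), all on 'text.pinyin'. That is consistent with overlapping short segments being emitted out of start order, though it does not by itself prove which filter reorders them. A small Python sketch for pulling these numbers out of a bulk failure message so the pattern can be checked across many documents; the regex assumes the exact message format shown above.

```python
import re

# One "<doc id>=java.lang.IllegalArgumentException: ... startOffset=A,endOffset=B,
# lastStartOffset=C for field 'F'" entry, in the format shown in the bulk error above.
FAILURE_RE = re.compile(
    r"(?P<doc_id>[0-9a-f]+)=java\.lang\.IllegalArgumentException: .*?"
    r"startOffset=(?P<start>\d+),endOffset=(?P<end>\d+),"
    r"lastStartOffset=(?P<last>\d+) for field '(?P<field>[^']+)'"
)


def parse_offset_failures(message):
    """Return (doc_id, field, start, end, last_start) for each offset failure."""
    return [
        (m["doc_id"], m["field"], int(m["start"]), int(m["end"]), int(m["last"]))
        for m in FAILURE_RE.finditer(message)
    ]


sample = (
    "Bulk indexing has failures. [{001eb378b4761c0fa5174b7c9fd6f81e="
    "java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset "
    "must be >= startOffset, and offsets must not go backwards startOffset=59,endOffset=61,"
    "lastStartOffset=60 for field 'text.pinyin'}]"
)

for doc_id, field, start, end, last in parse_offset_failures(sample):
    # How far the start moved backwards, and how long the rejected token is.
    print(doc_id, field, "backwards by", last - start, "token length", end - start)
```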