tenlee2012 / elasticsearch-analysis-hao

A very handy ("hao") Chinese word-segmentation plugin for Elasticsearch (ES)
Apache License 2.0
231 stars 28 forks

Tokenization hangs on long text #58

Open luchihao opened 1 year ago

luchihao commented 1 year ago

Plugin version: v8.3.3


{
    "analyzer": "hao_index_mode",
    "text": "0xFF0x030x420x270x010x430x320x010x460x640x5E0x5C0x3B0x010x360x290x010x050x330x010x030x220x010x630x3E0x010x270x340x350x5F0x3D0x010x400x2F0x010x3C0x430x010x610x3E0x010x370x140x010x3F0x360x3C0x2F0x350x380x060x2D0x010x040x460x060x490x310x010x280x230x1F0x220x2C0x2E0x190x460x410x0E0x340x1F0x1A0x650x3A0x0F0x250x350x1D0x200x1C0x070x330x010x230x4F0x3E0x240x650x650x3A0x2D0x010x3D0x370x490x090x4B0x170x480x4D0x540x470x1F0x410x410x0E0x300x0B0x370x320x450x330x010x4B0x010x010x4A0x330x330x510x360x010x4C0x5F0x240x0D0x380x010x5D0x010x010x2B0x270x010x290x440x010x2D0x340x010x5E0x3A0x010x3E0x2E0x010x080x3C0x010x0A0x340x010x010x2B0x010x440x240x010x670x3D0x010x0C0x3A0x010x250x330x330x000x310x3A0x310x300x330x330x330x330x330x2C0x320x3A0x310x330x330x330x530x330x330x2C0x330x3A0x320x330x010x010x0B0x010x330x2C0x340x3A0x330x390x370x360x350x330x330x2C0x370x3A0x340x330x330x330x330x330x330x2C0x310x300x3A0x360x300x330x010x0B0x010x330x2C0x310x340x3A0x310x300x300x320x330x330x330x330x330x2C0x310x350x3A0x360x380x330x330x330x330x330x2C0x380x3A0x320x330x010x010x0B0x010x330x2C0x330x310x3A0x320x330x280x310x1C0x010x330x2C0x380x310x3A0x320x330x280x310x1C0x010x330x2C0x320x300x300x300x3A0x320x310x390x010x650x630x010x330x2C0x370x310x3A0x310x3F0x650x5B0x010x330x2C0x370x320x3A0x310x010x010x010x010x330x2C0x370x330x3A0x310x010x010x010x010x330x2C0x390x310x3A0x310x5B0x150x330x010x330x2C0x390x320x3A0x310x010x0B0x0B0x010x330x2C0x390x330x3A0x310x010x0B0x0B0x010x330x000x310x310x340x000x340x330x000x330x000x320x320x000x010x010x630x000x000x310x2E0x30"
}

With this request body, tokenization hangs outright. The ik plugin at the same version does not have this problem.
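For anyone trying to reproduce this, a rough sketch of a client-side check follows. It sends a body like the one above to Elasticsearch's `_analyze` endpoint with a timeout, so that a hang shows up as a timeout instead of blocking the terminal. The helper names `build_analyze_request` and `analyze_with_timeout`, the 30-second default, and an unauthenticated node at `http://localhost:9200` are all assumptions for illustration, not anything taken from the plugin.

```python
import json
import socket
import urllib.error
import urllib.request


def build_analyze_request(base_url, analyzer, text):
    """Build a POST request for the _analyze API (hypothetical helper)."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_with_timeout(request, timeout_s=30.0):
    """Return the number of tokens, or None if no response arrives in time.

    The timeout is client-side only: the node may keep working on the
    request (and may still run out of heap) after this function returns.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            return len(json.load(resp).get("tokens", []))
    except (socket.timeout, TimeoutError):
        # Timed out while waiting for the response.
        return None
    except urllib.error.URLError as err:
        # A timeout during connection setup is wrapped in URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return None
        raise
```

To use it, pass the full `text` value from the report, for example `analyze_with_timeout(build_analyze_request("http://localhost:9200", "hao_index_mode", text))`. A result of `None` would be consistent with the hang described here, while an integer means the analyzer returned normally. Running the same text through an ik analyzer gives a baseline for comparison.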

wx5223 commented 4 months ago

This problem is fairly serious. In my testing it still occurs in the latest version.

wx5223 commented 4 months ago

With -Xms512m -Xmx512m set, running the analyzer request above makes the ES service exit, and the log reports: java.lang.OutOfMemoryError: Java heap space