ik fails to output the right offsets when char filters are applied to the input stream

Hi Team,
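I found that the ik tokenizer outputs wrong offsets for strings that have been processed by html_filter. The base class of html_filter, BaseCharFilter, holds two arrays, offsets and diffs: the offsets in the stripped text, and the deltas needed to correct them relative to the source string. The ik code (I am using IK 2012 FF hotfix1, from Google Code) does not apply these offsets and diffs. As a result, the offsets it outputs are offsets into the processed string with the HTML tags removed, not into the original input. I made a change in my GitHub fork and tested it roughly, and it seems to work. The main change is in this GitHub pull request: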

https://github.com/xpandan/ik-analyzer/commit/7cc797ca78399cdae4f31181970e85db28be4e5d
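
To illustrate the correction described above, here is a minimal, self-contained sketch. It is not the code from the commit and it does not use Lucene: `OffsetCorrectionSketch` is a hypothetical stand-in for the offsets/diffs bookkeeping, and the sample input assumes a filter that simply deletes tags. In Lucene itself, a tokenizer would normally get the same effect by passing its start and end offsets through `Tokenizer.correctOffset(int)` before setting them on the offset attribute.

```java
import java.util.Arrays;

/** Simplified model of a char filter's offsets/diffs arrays (hypothetical, not Lucene's class). */
final class OffsetCorrectionSketch {
    // Positions in the filtered text at which the cumulative shift changes (ascending).
    private final int[] offsets;
    // Cumulative number of source characters removed up to the matching position.
    private final int[] diffs;

    OffsetCorrectionSketch(int[] offsets, int[] diffs) {
        this.offsets = offsets.clone();
        this.diffs = diffs.clone();
    }

    /** Maps an offset in the filtered text back to an offset in the source text. */
    int correct(int filteredOffset) {
        int idx = Arrays.binarySearch(offsets, filteredOffset);
        if (idx < 0) {
            // Not an exact hit: use the last recorded position before this offset.
            idx = -idx - 2;
        }
        // Offsets before the first recorded position need no correction.
        return idx < 0 ? filteredOffset : filteredOffset + diffs[idx];
    }

    public static void main(String[] args) {
        String source = "<p>hello <b>world"; // filtered text: "hello world"
        // "<p>" (3 chars) is removed before filtered offset 0,
        // "<b>" (3 more chars) before filtered offset 6.
        OffsetCorrectionSketch map =
            new OffsetCorrectionSketch(new int[] {0, 6}, new int[] {3, 6});

        int[][] tokens = {{0, 5}, {6, 11}}; // token offsets in the filtered text
        for (int[] t : tokens) {
            int start = map.correct(t[0]);
            int end = map.correct(t[1]);
            System.out.println(source.substring(start, end) + " [" + start + "," + end + ")");
        }
        // prints:
        // hello [3,8)
        // world [12,17)
    }
}
```

Without the `correct` step, the tokens would be reported at [0,5) and [6,11), which point at the wrong characters in the original input. That is the symptom reported here.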

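html_strip itself also has quite a few bugs, so you can test with the mapping filter instead; the principle is the same. When you have time, please help review my code. I only started looking into Lucene for a project, so any advice is welcome.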

Best,
Dan

Original issue reported on code.google.com by xpan...@gmail.com on 12 Sep 2014 at 10:34
