jprante / elasticsearch-plugin-bundle

A bundle of useful Elasticsearch plugins
GNU Affero General Public License v3.0

langdetect error (500): duplicate of the same language profile, using REST endpoint #17

Open marbleman opened 8 years ago

marbleman commented 8 years ago

I have noticed a strange error caused by langdetect that I never saw on my old 1.7 setup. I am using the PHP Elasticsearch\Client, which uses Guzzle for the HTTP connection (which may or may not be part of the problem).

Everything is fine if I have just one active thread on the PHP server talking to the ES cluster. When I open a second thread, I randomly see exceptions in ES like:

[2016-03-25 01:21:23,599][ERROR][org.xbib.elasticsearch.module.langdetect.LangdetectService] duplicate of the same language profile: en
java.io.IOException: duplicate of the same language profile: en
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.addProfile(LangdetectService.java:205)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.loadProfileFromResource(LangdetectService.java:199)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.load(LangdetectService.java:148)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.setProfile(LangdetectService.java:223)
    at org.xbib.elasticsearch.action.langdetect.TransportLangdetectAction.doExecute(TransportLangdetectAction.java:32)
    at org.xbib.elasticsearch.action.langdetect.TransportLangdetectAction.doExecute(TransportLangdetectAction.java:16)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
    at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
    at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.xbib.elasticsearch.rest.action.langdetect.RestLangdetectAction.handleRequest(RestLangdetectAction.java:30)
    at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:207)

The language is different in each log entry, and each log entry seems to relate to a different request. I am using the REST endpoint and have limited the languages in elasticsearch.yml to about 10 languages. Before I dig deeper and experiment with combinations of settings, which is time-consuming, I hope you can give me a hint about the best starting point for the investigation.

Thx in advance!

jprante commented 8 years ago

Looks like a race condition. LangdetectService is not thread safe. I think it will help to synchronize the call to LangdetectService in TransportLangdetectAction.

marbleman commented 8 years ago

Thanks for the hint!! However, that kind of change is beyond what I can do at the moment, I am afraid. AFAIK the ES PHP module uses round robin across all cluster nodes. The race condition probably comes up when two requests hit the same node at the same time. This would explain the strange random factor.

I'll try directing each thread to a dedicated cluster node.

jprante commented 8 years ago

Yes, two threads executing on the same node cause the race condition. I will push a fix today; it just wraps the execution of detectAll in a synchronized statement.
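
For readers unfamiliar with this kind of fix, here is a minimal sketch of the idea, not the plugin's actual code. `ProfileService` and `SynchronizedDetector` are hypothetical stand-ins: the first mimics a non-thread-safe service that rejects duplicate profiles, and the second serializes access to it with a synchronized statement so that two requests cannot interleave while profiles are being reloaded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a non-thread-safe service (assumption, not the real LangdetectService).
class ProfileService {
    private final List<String> loaded = new ArrayList<>();

    // Rejects a language that is already loaded, similar to the check behind the logged error.
    void addProfile(String lang) throws IOException {
        if (loaded.contains(lang)) {
            throw new IOException("duplicate of the same language profile: " + lang);
        }
        loaded.add(lang);
    }

    // Clears and reloads all profiles. If two threads interleave here,
    // one can add a language the other has just added.
    void setProfile(List<String> langs) throws IOException {
        loaded.clear();
        for (String lang : langs) {
            addProfile(lang);
        }
    }

    int size() {
        return loaded.size();
    }
}

// Hypothetical caller that guards every use of the shared service with one lock.
class SynchronizedDetector {
    private final ProfileService service = new ProfileService();
    private final Object lock = new Object();

    // Returns the number of loaded profiles after reloading them under the lock.
    int detect(List<String> langs) {
        synchronized (lock) {
            try {
                service.setProfile(langs);
                return service.size();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

With the lock in place, concurrent calls to `detect` on the same `SynchronizedDetector` run one at a time, so the duplicate check is never hit by an interleaved reload. The trade-off is that requests on one node are serialized through that section.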

jprante commented 8 years ago

The version with the fix is Bundle 2.2.0.5

http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-plugin-bundle/2.2.0.5/

marbleman commented 8 years ago

Amazing!! Unfortunately I cannot install it:

ERROR: java.lang.IllegalStateException: jar hell!
class: org.apache.lucene.analysis.ar.ArabicAnalyzer$DefaultSetHolder
jar1: /usr/share/elasticsearch/lib/lucene-analyzers-common-5.4.1.jar
jar2: /tmp/1504669576103186/temp_name-206789507/lucene-analyzers-common-5.4.1.jar

jprante commented 8 years ago

Thanks.

My build procedure is broken. As a quick fix, just remove lucene-core-5.4.1.jar and lucene-analyzers-common-5.4.1.jar from the plugins/bundle directory.

marbleman commented 8 years ago

Thank you so much! Runs like hell, but without jar hell now... and multithreaded without any errors!