headwirecom / aem-solr-search

AEM Solr Search
Apache License 2.0

Character set encoding issues with stop word file(s) under Windows 8 #2

Closed: GastonGonzalez closed this issue 10 years ago.

GastonGonzalez commented 10 years ago

On Windows 8, the collection1 core in the quickstart module fails to initialize because of character encoding problems in one or more stop word files. The following exception is thrown:

{msg=SolrCore 'collection1' is not available due to init failure: Could not load core configuration for core collection1,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Could not load core configuration for core collection1
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:753)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:307)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:485)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:290)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:606)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:535)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.solr.common.SolrException: Could not load core configuration for core collection1
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    ... 1 more
Caused by: java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input length = 1
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:168)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:89)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
    ... 9 more
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:338)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:154)
    at java.io.BufferedReader.readLine(BufferedReader.java:317)
    at java.io.BufferedReader.readLine(BufferedReader.java:382)
    at org.apache.lucene.analysis.util.WordlistLoader.getLines(WordlistLoader.java:223)
    at org.apache.lucene.analysis.util.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:256)
    at org.apache.lucene.analysis.util.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:244)
    at org.apache.lucene.analysis.core.StopFilterFactory.inform(StopFilterFactory.java:99)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:675)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
    ... 13 more
,code=500}
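The innermost cause in the trace is a java.nio.charset.MalformedInputException ("Input length = 1"), raised while WordlistLoader.getLines reads a stop word file for StopFilterFactory. Our reading of the trace (an assumption, not something this issue confirms) is that the stop word files are decoded as UTF-8 with malformed bytes reported rather than replaced. The minimal Java sketch below reproduces the same exception type and message by strictly decoding a word saved in a single-byte encoding such as ISO-8859-1 or Windows-1252. It does not claim that the repository's files use that encoding; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {

    /** Decodes bytes as strict UTF-8; returns "OK" on success or "ExceptionName: message" on failure. */
    static String strictUtf8Error(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD for bad input.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strictUtf8.decode(ByteBuffer.wrap(bytes));
            return "OK";
        } catch (CharacterCodingException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "café" saved as ISO-8859-1: the lone byte 0xE9 is not a valid UTF-8 sequence.
        byte[] latin1 = "caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        // The same word saved as UTF-8 decodes cleanly.
        byte[] utf8 = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes: " + strictUtf8Error(latin1)); // MalformedInputException: Input length = 1
        System.out.println("UTF-8 bytes:      " + strictUtf8Error(utf8));   // OK
    }
}
```

Note that the word is written with the escape \u00e9 rather than a literal é, so the sketch compiles the same way regardless of the platform's default source encoding.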

GastonGonzalez commented 10 years ago

Until this issue is investigated further, the following workaround is available.

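Edit the schema (aemsolrsearch-quickstart/src/main/resources/aem-solr-home/collection1/conf/schema.xml) and comment out (or delete) all non-English field types, that is, the text_* field types from Arabic through Turkish. Be sure to keep text_en. Because XML comments cannot be nested, remove or rewrite any comments inside a field type before wrapping it in <!-- and -->; otherwise schema.xml is no longer well-formed.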

<!-- Arabic -->
<!-- 
<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer> 
    <tokenizer class="solr.StandardTokenizerFactory"/>
    (for any non-arabic)
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
    (normalizes ﻯ to ﻱ, etc)
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>
-->
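Commenting out every non-English field type is a blunt fix. It may be enough to disable only the field types whose stop word files actually fail to decode. The Java sketch below (hypothetical class name FindBadStopwordFiles) lists the *.txt files in a directory that are not valid UTF-8. It assumes, as the MalformedInputException in the trace suggests, that Solr reads these files as strict UTF-8. The default path is also an assumption: it points at a lang directory next to schema.xml, as words="lang/stopwords_ar.txt" implies. If the build copies or filters resources elsewhere, point the tool at the copy Solr actually loads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FindBadStopwordFiles {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequences. */
    static boolean isStrictUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Lists the *.txt files directly inside dir whose contents are not valid UTF-8. */
    static List<Path> findInvalid(Path dir) throws IOException {
        List<Path> bad = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : files) {
                if (!isStrictUtf8(Files.readAllBytes(file))) {
                    bad.add(file);
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the lang/ directory beside schema.xml in the quickstart module.
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "aemsolrsearch-quickstart/src/main/resources/aem-solr-home/collection1/conf/lang");
        List<Path> bad = findInvalid(dir);
        if (bad.isEmpty()) {
            System.out.println("All stop word files in " + dir + " are valid UTF-8.");
        } else {
            bad.forEach(p -> System.out.println("Not valid UTF-8: " + p));
        }
    }
}
```

Run it as `java FindBadStopwordFiles path/to/conf/lang`. Then you could comment out only the field types whose words= attribute names a reported file, or re-save those files as UTF-8.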