tumblr / collins

groovy kind of love
tumblr.github.com/collins
Apache License 2.0

Solr fails and stops indexing when an attribute is too long #474

Open · discordianfish opened this issue 8 years ago

discordianfish commented 8 years ago

Hi,

I've (accidentally) added a very large attribute (containing the lshw output) to some of my systems. Now, when collins tries to update Solr, it fails with this error:

2016-09-27 16:21:52.406254500  [error] o.a.s.c.SolrCore - org.apache.solr.common.SolrException: Exception writing document id ASSET_2 to the index; possible analysis error.
2016-09-27 16:21:52.406279500   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167)
2016-09-27 16:21:52.406281500   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
2016-09-27 16:21:52.406282500   at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
2016-09-27 16:21:52.406284500   at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
2016-09-27 16:21:52.406285500   at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
2016-09-27 16:21:52.406286500   at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706)
2016-09-27 16:21:52.406287500   at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
2016-09-27 16:21:52.406289500   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
2016-09-27 16:21:52.406290500   at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
2016-09-27 16:21:52.406291500   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
2016-09-27 16:21:52.406293500   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
2016-09-27 16:21:52.406294500   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
2016-09-27 16:21:52.406295500   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:179)
2016-09-27 16:21:52.406296500   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
2016-09-27 16:21:52.406298500   at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
2016-09-27 16:21:52.406299500   at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
2016-09-27 16:21:52.406300500   at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
2016-09-27 16:21:52.406301500   at collins.solr.SolrHelper$$anonfun$updateItems$1.apply(SolrHelper.scala:120)
2016-09-27 16:21:52.406302500   at collins.solr.SolrHelper$$anonfun$updateItems$1.apply(SolrHelper.scala:115)
2016-09-27 16:21:52.406304500   at scala.Option.map(Option.scala:146)
2016-09-27 16:21:52.406305500   at collins.solr.SolrHelper$.updateItems(SolrHelper.scala:115)
2016-09-27 16:21:52.406306500   at collins.solr.SolrHelper$.updateAssets(SolrHelper.scala:137)
2016-09-27 16:21:52.406307500   at collins.solr.SolrHelper$$anonfun$populate$1$$anonfun$apply$2.apply(SolrHelper.scala:98)
2016-09-27 16:21:52.406308500   at collins.solr.SolrHelper$$anonfun$populate$1$$anonfun$apply$2.apply(SolrHelper.scala:91)
2016-09-27 16:21:52.406310500   at scala.Option.map(Option.scala:146)
2016-09-27 16:21:52.406311500   at collins.solr.SolrHelper$$anonfun$populate$1.apply(SolrHelper.scala:91)
2016-09-27 16:21:52.406312500   at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
2016-09-27 16:21:52.406313500   at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
2016-09-27 16:21:52.406314500   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
2016-09-27 16:21:52.406316500   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
2016-09-27 16:21:52.406317500   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
2016-09-27 16:21:52.406318500   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
2016-09-27 16:21:52.406319500   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
2016-09-27 16:21:52.406320500   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2016-09-27 16:21:52.406321500  Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="LSHW_meta_s" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[1, 10, 62, 116, 115, 105, 108, 47, 60, 10, 62, 101, 100, 111, 110, 47, 60, 10, 62, 101, 100, 111, 110, 47, 60, 32, 32, 10, 62, 115]...', original message: bytes can be at most 32766 in length; got 128174
2016-09-27 16:21:52.406323500   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:671)
2016-09-27 16:21:52.406324500   at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
2016-09-27 16:21:52.406325500   at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
2016-09-27 16:21:52.406327500   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
2016-09-27 16:21:52.406328500   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
2016-09-27 16:21:52.406329500   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1475)
2016-09-27 16:21:52.406330500   at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
2016-09-27 16:21:52.406331500   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
2016-09-27 16:21:52.406332500   ... 33 more
2016-09-27 16:21:52.406333500  Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 128174
2016-09-27 16:21:52.406335500   at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
2016-09-27 16:21:52.406336500   at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:150)
2016-09-27 16:21:52.406337500   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:661)
2016-09-27 16:21:52.406338500   ... 40 more
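Two details in the trace are worth spelling out. First, Lucene's cap is on the UTF-8 byte length of a single term (32766 bytes), and this value was 128174 bytes. Second, the "prefix of the first immense term" can be decoded: after a leading byte `1` (`\u0001`), the remaining bytes read as XML closing tags in reverse (`>tsil/<` is `</list>` backwards). That is consistent with the whole lshw XML having been indexed as one untokenized term and passed through a reversing filter such as Solr's `ReversedWildcardFilterFactory`, which marks reversed tokens with `\u0001`. Whether collins's schema really applies such a filter to `*_s` fields is an assumption; the trace does not show the schema. The following is a small Java sketch (the class and method names are illustrative only) that reproduces both observations:

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch reproducing two facts from the Lucene error above:
 * 1. term size is measured in UTF-8 bytes and capped at 32766;
 * 2. the logged byte prefix decodes to XML written backwards.
 * ImmenseTermCheck is an illustrative name, not part of collins or Lucene.
 */
class ImmenseTermCheck {
    // Limit quoted in the error: "bytes can be at most 32766 in length".
    static final int MAX_TERM_BYTES = 32766;

    // Number of bytes the value occupies once UTF-8 encoded.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if indexing the value as a single untokenized term would be rejected.
    static boolean exceedsTermLimit(String value) {
        return utf8Length(value) > MAX_TERM_BYTES;
    }

    // Turn the byte list printed in the error message back into text.
    static String decode(int[] bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int[] prefix = {1, 10, 62, 116, 115, 105, 108, 47, 60, 10, 62, 101, 100, 111, 110,
                        47, 60, 10, 62, 101, 100, 111, 110, 47, 60, 32, 32, 10, 62, 115};
        String term = decode(prefix);
        // Drop the leading \u0001 marker and reverse the rest to read it forwards.
        String forwards = new StringBuilder(term.substring(1)).reverse().toString();
        System.out.println(forwards.replace("\n", "\\n")); // s>\n  </node>\n</node>\n</list>\n
        // The attribute in this report was 128174 bytes long.
        System.out.println(exceedsTermLimit("x".repeat(128174))); // true
    }
}
```

Note that the limit counts bytes, not characters: a value of non-ASCII text hits it with fewer characters than a plain ASCII dump would.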

I'm running a fairly recent build from master (built August 10).

Since I didn't run into this earlier, I'm not sure whether the attribute is simply too long or whether this is a regression. If it is simply too long, the API should reject such attributes when they are set.
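If the API should refuse such values, the check could look roughly like the Java sketch below. It assumes the only constraint is the 32766-byte term limit from the error message; `AttributeValidator`, `validate` and `truncateForIndex` are hypothetical names, and collins's real attribute-update path (which is Scala) is not shown here. The sketch offers two options: reject the value outright, or keep the full value and index only a prefix that fits, cut on a code point boundary so no multi-byte character is split.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical guard for the attribute-update path (not part of collins).
 * Assumes the only constraint is Lucene's 32766-byte limit per term.
 */
class AttributeValidator {
    static final int MAX_TERM_BYTES = 32766; // from the Lucene error message

    // Option 1: reject values that would become an "immense term" at index time.
    static void validate(String name, String value) {
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        if (bytes > MAX_TERM_BYTES) {
            throw new IllegalArgumentException(
                "Attribute " + name + " is " + bytes + " bytes when UTF-8 encoded; "
                + "the search index accepts at most " + MAX_TERM_BYTES);
        }
    }

    // Option 2: keep the stored value intact but index only a prefix that fits,
    // adding whole code points until the next one would exceed the byte limit.
    static String truncateForIndex(String value) {
        StringBuilder out = new StringBuilder();
        int used = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int cpBytes = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8).length;
            if (used + cpBytes > MAX_TERM_BYTES) {
                break;
            }
            out.appendCodePoint(cp);
            used += cpBytes;
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a large lshw dump: 20000 * 7 ASCII chars = 140000 bytes.
        String lshw = "<node/>".repeat(20000);
        try {
            validate("LSHW", lshw);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(truncateForIndex(lshw).length()); // 32766
    }
}
```

Rejecting at write time gives the caller an immediate, explicit error instead of an index update that fails later in the background; truncating keeps the data but makes the indexed copy searchable only by its first 32766 bytes.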