jmix-framework / jmix

Jmix framework
https://www.jmix.io
Apache License 2.0

File parsing in search add-on doesn't work #3659

Closed fractal3000 closed 4 weeks ago

fractal3000 commented 1 month ago

Environment

Jmix version: 2.3.3

Bug Description

The search add-on can't parse the content of any field of the "File" type.

Steps To Reproduce

  1. Create a project with the search add-on.
  2. Create an entity with a field of the "FileReference" type.
  3. Create a search index definition that includes this attribute.
  4. Start the project.
  5. Log in to the UI.
  6. Add entity instances with attached files in a text format (doc, docx, txt).
  7. Check the console log.

Current Behavior

There are a lot of error messages in the console log.

Expected Behavior

There are no error messages in the console log.

The stack trace is:

java.lang.NoClassDefFoundError: org/apache/poi/ooxml/extractor/ExtractorFactory
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:84) ~[tika-parsers-1.27.jar:1.27]
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113) ~[tika-parsers-1.27.jar:1.27]
    at io.jmix.search.utils.FileProcessor.extractFileContent(FileProcessor.java:63) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.FilePropertyValueExtractor.addFileContent(FilePropertyValueExtractor.java:91) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.FilePropertyValueExtractor.processFileRef(FilePropertyValueExtractor.java:80) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.FilePropertyValueExtractor.transformSingleValue(FilePropertyValueExtractor.java:63) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.AbstractPropertyValueExtractor.processValue(AbstractPropertyValueExtractor.java:89) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.AbstractPropertyValueExtractor.getValue(AbstractPropertyValueExtractor.java:46) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.MappingFieldDescriptor.getValue(MappingFieldDescriptor.java:146) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.addFieldValueToEntityIndexContent(BaseEntityIndexer.java:302) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.lambda$generateIndexDocument$11(BaseEntityIndexer.java:289) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[na:na]
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[na:na]
    at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1779) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na]
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[na:na]
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) ~[na:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.generateIndexDocument(BaseEntityIndexer.java:289) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.indexGroupedInstances(BaseEntityIndexer.java:142) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.indexCollectionByEntityIds(BaseEntityIndexer.java:97) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processQueueItemsGroup(JpaIndexingQueueManager.java:513) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processQueueItems(JpaIndexingQueueManager.java:498) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processQueue(JpaIndexingQueueManager.java:458) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processNextBatch(JpaIndexingQueueManager.java:301) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processNextBatch(JpaIndexingQueueManager.java:296) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.autoconfigure.search.job.IndexingQueueProcessingJob.execute(IndexingQueueProcessingJob.java:32) ~[jmix-search-starter-2.4.999-SNAPSHOT.jar:na]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ~[quartz-2.3.2.jar:na]
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) ~[quartz-2.3.2.jar:na]
Caused by: java.lang.ClassNotFoundException: org.apache.poi.ooxml.extractor.ExtractorFactory
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) ~[na:na]
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) ~[na:na]
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[na:na]
    ... 31 common frames omitted
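
The root cause in the trace is a `ClassNotFoundException` for `org.apache.poi.ooxml.extractor.ExtractorFactory`: the class that `tika-parsers-1.27` expects is not on the runtime classpath. One possible way to confirm this in a given project is a small check like the sketch below. The `ClasspathCheck` class and its `isPresent` helper are hypothetical names used only for illustration and are not part of Jmix, Tika, or POI. The result is only meaningful when the check runs with the same classpath as the application.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by this class's class loader.
     * Hypothetical helper for illustration only.
     */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or it was found but could not be linked
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the "Caused by" line of the stack trace
        String name = "org.apache.poi.ooxml.extractor.ExtractorFactory";
        System.out.println(name + " present: " + isPresent(name));
    }
}
```

If the check reports the class as absent, that is consistent with a mismatch between the Tika and Apache POI versions resolved for the project. Inspecting the resolved dependency tree would show which versions are actually in use.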

fractal3000 commented 1 month ago

To fix the issue, the library has to be updated to the following version: 'org.apache.tika:tika-parsers:1.28.5'.

SergeiAksenov2 commented 4 weeks ago

Tested on:

  1. Jmix version: 2.3.999-SNAPSHOT, Jmix Studio plugin version: 2.3.SNAPSHOT6778-233, IntelliJ version: IntelliJ IDEA 2023.3.7 (Community Edition)
  2. Jmix version: 2.3.999-SNAPSHOT, Jmix Studio plugin version: 2.3.SNAPSHOT6771-241, IntelliJ version: IntelliJ IDEA 2024.1.6 (Community Edition)

There are no error messages in the console log - Ok.
