jmix-framework / jmix

Jmix framework
https://www.jmix.io
Apache License 2.0

File parsing in search add-on doesn't work #3659

Closed fractal3000 closed 2 months ago

fractal3000 commented 2 months ago

Environment

Jmix version: 2.3.3

Bug Description

The search add-on can't parse the content of any field of the "File" type.

Steps To Reproduce

  1. Create a project with the search add-on.
  2. Create an entity with a field of the "FileReference" type.
  3. Create a search index definition that includes this attribute.
  4. Start the project.
  5. Log in to the UI.
  6. Add entity data, attaching files in some text format (doc, docx, txt).
  7. Check the console log.

Current Behavior

There are a lot of error messages in the console log.

Expected Behavior

There are no error messages in the console log.

The stack trace is:

java.lang.NoClassDefFoundError: org/apache/poi/ooxml/extractor/ExtractorFactory
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:84) ~[tika-parsers-1.27.jar:1.27]
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113) ~[tika-parsers-1.27.jar:1.27]
    at io.jmix.search.utils.FileProcessor.extractFileContent(FileProcessor.java:63) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.FilePropertyValueExtractor.addFileContent(FilePropertyValueExtractor.java:91) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.FilePropertyValueExtractor.processFileRef(FilePropertyValueExtractor.java:80) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.FilePropertyValueExtractor.transformSingleValue(FilePropertyValueExtractor.java:63) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.AbstractPropertyValueExtractor.processValue(AbstractPropertyValueExtractor.java:89) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.propertyvalue.impl.AbstractPropertyValueExtractor.getValue(AbstractPropertyValueExtractor.java:46) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.mapping.MappingFieldDescriptor.getValue(MappingFieldDescriptor.java:146) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.addFieldValueToEntityIndexContent(BaseEntityIndexer.java:302) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.lambda$generateIndexDocument$11(BaseEntityIndexer.java:289) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[na:na]
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[na:na]
    at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1779) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na]
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[na:na]
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) ~[na:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.generateIndexDocument(BaseEntityIndexer.java:289) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.indexGroupedInstances(BaseEntityIndexer.java:142) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.impl.BaseEntityIndexer.indexCollectionByEntityIds(BaseEntityIndexer.java:97) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processQueueItemsGroup(JpaIndexingQueueManager.java:513) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processQueueItems(JpaIndexingQueueManager.java:498) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processQueue(JpaIndexingQueueManager.java:458) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processNextBatch(JpaIndexingQueueManager.java:301) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.search.index.queue.impl.JpaIndexingQueueManager.processNextBatch(JpaIndexingQueueManager.java:296) ~[jmix-search-2.4.999-SNAPSHOT.jar:na]
    at io.jmix.autoconfigure.search.job.IndexingQueueProcessingJob.execute(IndexingQueueProcessingJob.java:32) ~[jmix-search-starter-2.4.999-SNAPSHOT.jar:na]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ~[quartz-2.3.2.jar:na]
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) ~[quartz-2.3.2.jar:na]
Caused by: java.lang.ClassNotFoundException: org.apache.poi.ooxml.extractor.ExtractorFactory
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) ~[na:na]
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) ~[na:na]
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[na:na]
    ... 31 common frames omitted
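
The NoClassDefFoundError with a ClassNotFoundException cause indicates that the class named in the trace could not be found on the runtime classpath. A minimal sketch for confirming this from inside the application is shown below. The class name `ClasspathCheck` and the method `isClassPresent` are hypothetical helpers introduced only for illustration; they are not part of Jmix, Tika, or POI. Only the standard `Class.forName` call is used.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this class, without initializing it.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or one of its dependencies) is not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the "Caused by" line of the stack trace.
        String name = "org.apache.poi.ooxml.extractor.ExtractorFactory";
        System.out.println(name + " present: " + isClassPresent(name));
    }
}
```

Running this with the same classpath as the application should show whether the class is resolvable there. If it is not, the likely cause is a mismatch between the library versions on the classpath.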

fractal3000 commented 2 months ago

To fix the issue, the library has to be updated to the following version: 'org.apache.tika:tika-parsers:1.28.5'.

SergeiAksenov2 commented 2 months ago

Tested on:

  1. Jmix version: 2.3.999-SNAPSHOT
     Jmix Studio plugin version: 2.3.SNAPSHOT6778-233
     IntelliJ version: IntelliJ IDEA 2023.3.7 (Community Edition)
  2. Jmix version: 2.3.999-SNAPSHOT
     Jmix Studio plugin version: 2.3.SNAPSHOT6771-241
     IntelliJ version: IntelliJ IDEA 2024.1.6 (Community Edition)

There are no error messages in the console log - Ok.
