keeps / dbptk-ui

DBPTK base UI for both Desktop and Enterprise
https://database-preservation.com
GNU Lesser General Public License v3.0

Spaces in type names lead to NumberFormatException (decimal to long) #326

Status: Open. marhop opened this issue 2 years ago.

marhop commented 2 years ago

Hi,

Description: I tried to load a SIARD 2.1 file (created with SiardFromDb 2.1.120 of the SIARD Suite from a MySQL 5.5.5-10.1.37-MariaDB-0+deb9u1 DBMS) with DBPTK Desktop 2.6.0 and got stuck while preparing it for browsing. The following error occurred in ~/.dbvtk/log/dbvtk.log:

2022-08-02 08:33:20,898 [http-nio-auto-1-exec-4] WARN  c.d.c.s.i.DatabaseRowsSolrManager - Could not insert a document batch in collectiondbv-database-dc688e07-cfc9-4673-9bab-d9d12a198035. Last response (if any): null
org.apache.solr.common.SolrException: ERROR: [doc=570] Error adding field 'col18_l'='237869.25' msg=For input string: "237869.25"
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:224)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:100)
    at org.apache.solr.update.AddUpdateCommand.lambda$null$0(AddUpdateCommand.java:261)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:295)
    at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207)
    at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301)
    at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:200)
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1464)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:967)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:342)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:294)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
    at org.apache.solr.update.processor.RunUpdateProcessorFactory$RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:73)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.NestedUpdateProcessorFactory$NestedUpdateProcessor.processAdd(NestedUpdateProcessorFactory.java:79)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:263)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:502)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:343)
    at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:343)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:229)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:481)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:344)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:292)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:245)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:131)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:177)
    at com.databasepreservation.common.server.index.DatabaseRowsSolrManager.insertDocument(DatabaseRowsSolrManager.java:387)
    at com.databasepreservation.common.server.index.DatabaseRowsSolrManager.addRow(DatabaseRowsSolrManager.java:160)
    at com.databasepreservation.modules.viewer.DbvtkExportModule.handleDataRow(DbvtkExportModule.java:134)
    at com.databasepreservation.model.modules.filters.IdentityFilter.handleDataRow(IdentityFilter.java:88)
    at com.databasepreservation.model.modules.filters.ObservableFilter.handleDataRow(ObservableFilter.java:123)
    at com.databasepreservation.modules.siard.in.content.SIARD2ContentImportStrategy.endElement(SIARD2ContentImportStrategy.java:374)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaValidator.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at com.databasepreservation.modules.siard.in.content.SIARD2ContentImportStrategy.importContent(SIARD2ContentImportStrategy.java:184)
    at com.databasepreservation.modules.siard.in.input.SIARDImportDefault.migrateDatabaseTo(SIARDImportDefault.java:64)
    at com.databasepreservation.DatabaseMigration.migrate(DatabaseMigration.java:123)
    at com.databasepreservation.common.server.controller.SIARDController.convertSIARDtoSolr(SIARDController.java:706)
    at com.databasepreservation.common.server.controller.SIARDController.loadFromLocal(SIARDController.java:669)
    at com.databasepreservation.common.api.v1.CollectionResource.createCollection(CollectionResource.java:197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
    at org.glassfish.jersey.servlet.ServletContainer.serviceImpl(ServletContainer.java:386)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:561)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:502)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:439)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
    at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
    at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:96)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:360)
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:399)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:890)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1743)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NumberFormatException: For input string: "237869.25"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.parseLong(Long.java:631)
    at org.apache.solr.schema.LongPointField.createField(LongPointField.java:154)
    at org.apache.solr.schema.PointField.createFields(PointField.java:251)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:65)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:179)
    ... 140 common frames omitted

The relevant column definition in header/metadata.xml:

<column>
    <name>...</name>
    <type>DECIMAL(13, 2)</type>
    <typeOriginal>decimal</typeOriginal>
    <description>...</description>
</column>

The corresponding column definition in content/schema0/table5/table5.xsd:

<xs:element minOccurs="0" name="c19" type="xs:decimal"/>

An example entry in content/schema0/table5/table5.xml:

<c19>237869.25</c19>

Apparently DBPTK tries to parse a decimal value with a fractional part as a long. Is that intentional?
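The `Caused by` section pins down where it fails: the `col18_l` field is handled by Solr's `LongPointField`, which hands the string to `Long.parseLong`. A minimal standalone illustration of that step (the class and method names are made up for this example and are not part of DBPTK or Solr): `Long.parseLong` rejects `237869.25`, while `java.math.BigDecimal` represents it exactly.

```java
import java.math.BigDecimal;

public class DecimalVsLong {
    /** Returns true if the string is accepted by Long.parseLong. */
    static boolean parsesAsLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            // Same exception as in the "Caused by" section of the log:
            // Long.parseLong rejects any string with a fractional part.
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "237869.25"; // the <c19> value from table5.xml

        System.out.println(parsesAsLong(value)); // false

        // BigDecimal keeps the exact value of a DECIMAL(13,2) column, including its scale.
        BigDecimal exact = new BigDecimal(value);
        System.out.println(exact.scale());     // 2
        System.out.println(exact.precision()); // 8
    }
}
```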

Steps required to reproduce the bug:

  1. Create SIARD file as described above.
  2. Load into DBPTK Desktop.
  3. Try browsing.

Attach the dbptk-app.log.txt file below. → Could not find that file on my system, sorry ...

Cheers, Martin

luis100 commented 2 years ago

Hello, the error refers to col18 (col18_l'='237869.25'), but your report of the data type seems to refer to c19. Could you check the data again to make sure your report is correct?

marhop commented 2 years ago

Ah yeah, I thought that was just dbptk counting from zero. :-) Will check again.

luis100 commented 2 years ago

This may be a column renumbering issue (expected when loading from the SIARD Suite into DBPTK Desktop), fixed in #316, to be released in 2.6.1.

luis100 commented 2 years ago

We do start counting at 0, while SIARD starts at 1, so your report might be correct. However, we have not seen this happen before, so please double-check it. Also, if you could put together a mock example that reproduces the issue, it would help immensely.
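With DBPTK counting from 0 and SIARD from 1, the SIARD column `c19` in `table5.xml` would correspond to `col18` in the log. A tiny sketch of that mapping; the helper name is hypothetical, and the pattern only covers the plain `cN` element names shown in this report:

```java
public class ColumnNumbering {
    /**
     * Hypothetical helper: maps a 1-based SIARD column element name ("c1", "c2", ...)
     * to the 0-based index used in the Solr field names of the log ("col0_l", "col18_l", ...).
     */
    static int siardToZeroBasedIndex(String siardColumn) {
        if (!siardColumn.matches("c[1-9][0-9]*")) {
            throw new IllegalArgumentException("Not a SIARD column name: " + siardColumn);
        }
        return Integer.parseInt(siardColumn.substring(1)) - 1;
    }

    public static void main(String[] args) {
        System.out.println("c19 -> col" + siardToZeroBasedIndex("c19")); // c19 -> col18
        System.out.println("c1 -> col" + siardToZeroBasedIndex("c1"));   // c1 -> col0
    }
}
```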

marhop commented 2 years ago

Double-checked just now: there does indeed seem to be an offset of 1 between the column numbers in the SIARD file and those in the log ...

I'll try to put together a minimal example.

marhop commented 2 years ago

Well now, that was tricky.

I created a minimal example, but since I did not have the SIARD Suite at hand, I created it with DBPTK Desktop 2.6.0. No problems at all: I could browse it perfectly fine (see the attached file a.siard.zip; I added the .zip extension so GitHub would let me upload it).

So I compared my example file to the file created by the SIARD Suite (the one that raised the error above). Where DBPTK puts this into header/metadata.xml

<type>DECIMAL(13,2)</type>
<typeOriginal>DECIMAL(13,2)</typeOriginal>

the SIARD Suite writes this:

<type>DECIMAL(13, 2)</type>
<typeOriginal>decimal</typeOriginal>

But contrary to what I first thought, it's not the obvious difference in the typeOriginal element that causes the problem; it's the space in (13, 2)! When I changed the entry to

<type>DECIMAL(13, 2)</type>
<typeOriginal>DECIMAL(13, 2)</typeOriginal>

that is, just added a space character to the type name (see the attached file b.siard.zip), browsing the SIARD file raised the NumberFormatException.

I can't judge how lenient the type name parsing should be, but since the SIARD Suite wrote those spaces at least at one point in history (or still writes them, I haven't checked), maybe you could make your parser a little more flexible to increase compatibility?
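The experiment shows that the space in `DECIMAL(13, 2)` alone changes the outcome. One plausible mechanism, offered purely as a guess (the actual dbptk-developer parsing code is not part of this issue and may work differently): a type pattern that expects no whitespace fails to match, the scale silently falls back to 0, and a DECIMAL with scale 0 looks integral, so the column gets indexed as a long. The sketch below contrasts such a strict pattern with a lenient one; all names, including the `indexTypeFor` mapping, are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DecimalTypeParsing {
    // Strict: only the exact spelling DBPTK writes, e.g. "DECIMAL(13,2)"
    static final Pattern STRICT = Pattern.compile("DECIMAL\\((\\d+),(\\d+)\\)");

    // Lenient: any letter case, optional whitespace around parentheses and comma,
    // optional scale (as in "DECIMAL(13)")
    static final Pattern LENIENT = Pattern.compile(
        "\\s*DECIMAL\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*",
        Pattern.CASE_INSENSITIVE);

    /** Returns the scale, or 0 if the pattern does not match (a hypothetical silent fallback). */
    static int scaleOf(String type, Pattern pattern) {
        Matcher m = pattern.matcher(type);
        if (!m.matches()) {
            return 0; // unparsed type: the scale defaults to 0 and the column looks integral
        }
        String scale = m.group(2);
        return scale == null ? 0 : Integer.parseInt(scale);
    }

    /** Hypothetical index-type decision: scale 0 means integral, so a long field is chosen. */
    static String indexTypeFor(String type, Pattern pattern) {
        return scaleOf(type, pattern) == 0 ? "long" : "decimal";
    }

    public static void main(String[] args) {
        String suiteType = "DECIMAL(13, 2)"; // as written by the SIARD Suite
        System.out.println(indexTypeFor(suiteType, STRICT));  // long
        System.out.println(indexTypeFor(suiteType, LENIENT)); // decimal
    }
}
```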

PS: Again, there was an offset between the column number in the SIARD file (c1) and the one reported in the ~/.dbvtk/log/dbvtk.log file (col0_l).

luis100 commented 2 years ago

Parsing should be lenient, validation should be strict. This would then be marked as an enhancement, and possibly transferred to dbptk-developer, as that is the component that parses SIARD into the intermediate data model.
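One way to read "parsing should be lenient, validation should be strict" for this case: accept any whitespace and letter case when reading the type, normalize it to a single canonical spelling, and apply the strict rules to the normalized value (here only the SQL rule that the scale must not exceed the precision, plus a minimum precision of 1). A minimal sketch; the class and method names and the canonical spelling `DECIMAL(p,s)` are assumptions for illustration, not the dbptk-developer API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientParseStrictValidate {
    // Lenient syntax: any letter case, optional whitespace, optional scale
    private static final Pattern LENIENT = Pattern.compile(
        "\\s*DECIMAL\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*",
        Pattern.CASE_INSENSITIVE);
    // Canonical form produced by the parser (hypothetical convention)
    private static final Pattern CANONICAL = Pattern.compile("DECIMAL\\((\\d+),(\\d+)\\)");

    /** Parses leniently and returns the canonical spelling, e.g. "DECIMAL(13,2)". */
    static String parseToCanonical(String raw) {
        Matcher m = LENIENT.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a DECIMAL type: " + raw);
        }
        int precision = Integer.parseInt(m.group(1));
        int scale = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return "DECIMAL(" + precision + "," + scale + ")";
    }

    /** Validates the canonical form strictly; returns the problems found (empty if valid). */
    static List<String> validate(String canonical) {
        List<String> problems = new ArrayList<>();
        Matcher m = CANONICAL.matcher(canonical);
        if (!m.matches()) {
            problems.add("Not in canonical form: " + canonical);
            return problems;
        }
        int precision = Integer.parseInt(m.group(1));
        int scale = Integer.parseInt(m.group(2));
        if (precision < 1) {
            problems.add("Precision must be at least 1");
        }
        if (scale > precision) {
            problems.add("Scale " + scale + " exceeds precision " + precision);
        }
        return problems;
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"DECIMAL(13, 2)", "decimal(13,2)", "DECIMAL(2, 5)"}) {
            String canonical = parseToCanonical(raw);
            System.out.println(raw + " -> " + canonical + " " + validate(canonical));
        }
        // DECIMAL(13, 2) -> DECIMAL(13,2) []
        // decimal(13,2) -> DECIMAL(13,2) []
        // DECIMAL(2, 5) -> DECIMAL(2,5) [Scale 5 exceeds precision 2]
    }
}
```

Keeping the two steps separate means a SIARD file with unusual but harmless spelling still loads, while genuinely inconsistent type definitions are still reported.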