keeps / dbptk-ui

DBPTK base UI for both Desktop and Enterprise
https://database-preservation.com
GNU Lesser General Public License v3.0
23 stars, 9 forks

Unable to load 92MB file with 5 tables #331

Open gillianh1 opened 1 year ago

gillianh1 commented 1 year ago

Description: Generated a file using DBPTK Desktop. It contains 5 tables and the file is 92MB. When I try to open the file in DBPTK Desktop, a blue progress dot pulses on the Open option but the file never loads.

Context: DBPTK Desktop installed on a Windows 10 PC using dbptk-desktop-2.6.0.exe.

Steps required to reproduce the bug:

  1. Generated a file using DBPTK Desktop. It contains 5 tables and the file is 92MB.
  2. When I try to open the file in DBPTK Desktop, a blue progress dot pulses on the Open option but the file never loads. (A smaller 2MB file with 2 tables loads successfully.)
  3. I tried increasing the memory in the settings, but I am still unable to load the file.
  4. Have we reached the limits of DBPTK Desktop, or of running it on a Windows PC?

Is there any documentation on hardware/sizing requirements or limitations?


hmiguim commented 1 year ago

Hi,

Please attach the log files so we can better understand the problem. Logs are available from the menu Help -> Logs.

gillianh1 commented 1 year ago

The file was created successfully using 2.6.0, but we were unable to open it using 2.6.0. We have since been able to connect to the same database and user using version 2.6.1, create a new extract file, and open that 92MB file. However, we are still unable to load the original file created with version 2.6.0 in the 2.6.1 desktop exe. We are able to open the new file created in version 2.6.1 using the 2.6.0 desktop exe. I will upload the log.

gillianh1 commented 1 year ago

dbvtk.log (latest failed attempt at 11:47)

luis100 commented 1 year ago

This seems to be the issue: a non-hex character in the input.

2022-09-16 11:47:58,262 [http-nio-auto-1-exec-7] ERROR o.a.solr.handler.RequestHandlerBase - org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: o
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: o
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:212)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:333)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1003)
    at com.databasepreservation.common.server.index.utils.SolrUtils.find(SolrUtils.java:155)
    at com.databasepreservation.common.server.index.DatabaseRowsSolrManager.find(DatabaseRowsSolrManager.java:178)
    at com.databasepreservation.common.api.v1.DatabaseResource.getViewerDatabaseIndexResult(DatabaseResource.java:97)
    at com.databasepreservation.common.api.v1.DatabaseResource.find(DatabaseResource.java:71)

luis100 commented 1 year ago

Generally, this means the XML might be malformed: it started a Unicode escape sequence but then put an "o" where a hexadecimal digit was expected. So you need to look into the SIARD content to see where this came from.
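To illustrate the rule behind this error: Solr's query parser treats a backslash followed by `u` as the start of a Unicode escape and requires exactly four hexadecimal digits after it; the trace above reports an `o` in one of those positions. The helper below, `EscapeScanner.findMalformedUnicodeEscapes`, is a hypothetical minimal sketch (not part of DBPTK or Solr) that reports the offset of every such malformed sequence in a piece of text. It assumes backslash escapes behave as in Solr's query syntax, where any other escape, such as a doubled backslash, consumes the following character.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeScanner {

    // Only ASCII hex digits count, as in the query parser's escape handling.
    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** Returns the offset of every backslash-u sequence not followed by exactly four hex digits. */
    static List<Integer> findMalformedUnicodeEscapes(String text) {
        List<Integer> bad = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (text.charAt(i) != '\\' || i + 1 >= text.length()) {
                i++;
                continue;
            }
            if (text.charAt(i + 1) != 'u') {
                i += 2; // some other escape (e.g. a doubled backslash): skip the escaped character
                continue;
            }
            boolean ok = i + 6 <= text.length(); // room for the backslash, 'u' and four digits
            for (int j = i + 2; ok && j < i + 6; j++) {
                ok = isHex(text.charAt(j));
            }
            if (!ok) {
                bad.add(i);
            }
            i += ok ? 6 : 2;
        }
        return bad;
    }
}
```

For example, on the text abc\uzz12 it reports offset 3, while a well-formed escape such as caf\u00e9 yields an empty list.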

gillianh1 commented 1 year ago

The SIARD file was produced using DBPTK Desktop (dbptk-desktop-2.6.0.exe).

No error was reported when the file was produced, so how would we know there was an issue with it? Do we always need to open and validate the file? Can we not assume a file is OK if the SIARD file was created without error?

If we rename the SIARD file with a .zip extension, we can navigate its files.
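Because the .siard file can be browsed after renaming it to .zip, it can also be opened directly with the standard `java.util.zip` API, without renaming. The sketch below (the class name `SiardEntries` is hypothetical) lists the entry names; assuming the usual SIARD 2.x layout, you would typically see `header/metadata.xml` plus per-table files under `content/`.

```java
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class SiardEntries {

    /** Lists entry names of a .siard file, treating it as a plain ZIP archive (no renaming needed). */
    static List<String> listEntries(String siardPath) throws IOException {
        try (ZipFile zip = new ZipFile(siardPath)) {
            // Collect into a list before the try block closes the ZipFile.
            return zip.stream().map(ZipEntry::getName).collect(Collectors.toList());
        }
    }
}
```

For instance, `SiardEntries.listEntries("uoesiardschema_extract.siard")` returns the entry names in central-directory order.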

We have subsequently created a new file using dbptk-desktop-2.6.1.exe, pointing to the same user and database, and this file is OK, so it is not an issue with the tables/data being extracted from the database.

I will try generating the file again with the 2.6.0 Desktop version to see if I can reproduce the issue.

gillianh1 commented 1 year ago

I was able to extract, import and validate the file in version 2.6. This time the file does open. I have access to both files and both are the same size. I saved both files as .zip and was able to navigate all files/tables. I will attach the log.

gillianh1 commented 1 year ago

Latest log

dbvtk.log

Original file from 2.6 will not load (uoesiardschema_extract.siard). New file from 2.6 will load (2.6_uoesiardschema_extract.siard).

hmiguim commented 1 year ago

Hi @gillianh1, thank you for using and testing DBPTK and for your feedback. Since version 2.6.1 is working fine, I suggest using that version instead of 2.6.0.

gillianh1 commented 1 year ago

This is what I plan to do. My only concern is that a file was produced without error yet cannot be opened. I would not like to be in this position when trying to open a SIARD file in the future.

Is your recommendation to create, open and validate each file that is produced before archiving?

Thanks

hmiguim commented 1 year ago

The validation step is essential to have proof that the produced SIARD follows the specification.

To ensure that no record is lost, you can use a module called the Merkle Tree filter (documentation available here). However, this requires a stored procedure that calculates the hash for every exported column using the Merkle tree top hash algorithm.
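The general Merkle tree idea is to hash each leaf value, then repeatedly hash adjacent pairs of hashes until a single top hash remains; comparing the top hash computed in the database by the stored procedure with the one computed over the exported data reveals whether any value was lost or altered. The sketch below shows that generic construction in Java. The class name `MerkleSketch`, SHA-256, UTF-8 leaf encoding, and promoting an unpaired node unchanged are all assumptions for illustration: the DBPTK Merkle Tree filter may use a different hash function, leaf encoding or pairing rule, and both sides must use exactly the same conventions for the hashes to match.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

final class MerkleSketch {

    // SHA-256 over the concatenation of the given byte arrays (assumed hash function).
    private static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) {
                md.update(p);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Top hash over the leaves (e.g. the values of one column, one leaf per value), as lowercase hex. */
    static String topHash(List<String> leaves) {
        if (leaves.isEmpty()) {
            throw new IllegalArgumentException("at least one leaf is required");
        }
        List<byte[]> level = new ArrayList<>();
        for (String leaf : leaves) {
            level.add(sha256(leaf.getBytes(StandardCharsets.UTF_8)));
        }
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) {
                    next.add(sha256(level.get(i), level.get(i + 1)));
                } else {
                    next.add(level.get(i)); // unpaired node promoted unchanged (one of several conventions)
                }
            }
            level = next;
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : level.get(0)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

For example, `MerkleSketch.topHash(List.of("a", "b", "c"))` returns a 64-character hex string; dropping, changing or reordering any leaf produces a different top hash (barring a hash collision).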

DBPTK offers a set of tools to validate and verify completeness and correctness. As a rule of thumb, you should create, open and validate each SIARD file to check that the extraction went well.
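A cheap first check that can run right after an export is to read the whole archive once: `java.util.zip.ZipInputStream` verifies each entry's CRC-32 when the entry has been read to the end and throws a `ZipException` on a mismatch. The sketch below (hypothetical class `SiardReadCheck`) does only that. It checks the ZIP container, not conformance to the SIARD specification, so it complements rather than replaces opening and validating the file in DBPTK. In this thread the problematic file could still be browsed as a .zip, so a container check alone would likely not have caught it; opening it in DBPTK is what surfaced the problem.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipInputStream;

final class SiardReadCheck {

    /**
     * Streams through every entry of a .siard (ZIP) file. ZipInputStream compares each entry's
     * CRC-32 once the entry has been read to the end, so a corrupt entry raises a ZipException.
     * Returns the number of entries read.
     */
    static int readAllEntries(Path siard) throws IOException {
        byte[] buffer = new byte[8192];
        int count = 0;
        try (InputStream raw = Files.newInputStream(siard);
             ZipInputStream zip = new ZipInputStream(raw)) {
            while (zip.getNextEntry() != null) {
                // Discard the data; reading until -1 is what triggers the CRC comparison.
                while (zip.read(buffer) != -1) {
                    // intentionally empty
                }
                count++;
            }
        }
        return count;
    }
}
```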

gillianh1 commented 1 year ago

Thank you for your help and confirmation.