keeps / dbptk-developer

DBPTK Developer - library and command-line tool for execution of database preservation actions
http://www.database-preservation.com
GNU Lesser General Public License v3.0

Mime type and blob content #317

Closed CostantinoLandino closed 7 years ago

CostantinoLandino commented 7 years ago

I would like to report a problem with the Database Preservation Toolkit. I used version 2.0.0 to convert an Access 2007 database to SIARD 2.0 format without problems. The database has a single table with a BLOB field. After exporting to SIARD 2, I unzipped the SIARD file to look at the binary file.

I tried to analyze it with JHOVE or PRONOM to identify the MIME type, without success. I could not find this information in the metadata either. This is a problem for long-term preservation. Has anyone tried to fix this problem?

Thank you

Costantino Landino

luis100 commented 7 years ago

Hi @CostantinoLandino, DBPTK exports blobs (binary files) but cannot ensure their preservation, since they can contain anything. It therefore focuses on the relational part of the database, but it does allow blobs to be exported outside of the SIARD package so that they can be analysed in isolation.
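As a starting point for analysing the blobs in isolation, the SIARD file can be treated as a ZIP archive, as was done above by unzipping it. The following is only a minimal sketch: the function name `list_candidate_blobs` is hypothetical, and the rule "anything that is not an .xml or .xsd entry is a candidate blob" is an assumption about the package layout, not something confirmed by DBPTK.

```python
import zipfile


def list_candidate_blobs(siard_path):
    """Return (entry name, size in bytes) for entries that are not XML/XSD.

    Assumption: table data and metadata are stored as .xml/.xsd entries,
    so everything else is worth inspecting as a possible blob.
    """
    candidates = []
    with zipfile.ZipFile(siard_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            if info.filename.lower().endswith((".xml", ".xsd")):
                continue  # skip metadata and table content files
            candidates.append((info.filename, info.file_size))
    return candidates
```

Each entry found this way can then be extracted with `ZipFile.extract` and passed to a format identification tool.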

So I guess you now have an archival forensics problem, and identifying the file format is the first step. I recommend trying several file format identification tools, such as DROID, Siegfried, FIDO, FITS, Apache Tika, or the Unix file command.

If none of the tools can identify the format, you may have to resort to analysing the raw bytes of the file with a hex editor such as Bless or HxD, looking for clues about what format it is and what information it contains.
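The manual inspection described above can be partly scripted. This is a rough sketch, not a replacement for the tools listed earlier: it checks the first bytes of a blob against a handful of well-known signatures and prints a hex preview. The names `SIGNATURES`, `sniff` and `hex_preview` are hypothetical, and the signature list is deliberately tiny. If the source application wrapped the payload in a container, the signature may not sit at offset 0, which is where the hex preview helps.

```python
# A few well-known leading byte signatures ("magic numbers"); far from exhaustive.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
]


def sniff(data):
    """Return a label if data starts with a known signature, else None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def hex_preview(data, width=16):
    """Format bytes as offset, hex values and printable ASCII, one row per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)
```

For example, reading the first 64 bytes of an extracted blob and calling `sniff` and `hex_preview` on them gives a quick first impression before running a full identification tool.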

CostantinoLandino commented 7 years ago

Thank you so much Luis for your answer,

I have converted many databases (from Access to PostgreSQL) with your excellent tool.

I'm working on long-term preservation in the cultural heritage domain, and database preservation is one of the scenarios I'm working on.

My focus is on ensuring the long-term preservation of archival descriptions and their digital objects. I'm using PREMIS metadata to record actions such as format identification of objects. Thank you for your suggestions. They give me the opportunity to improve this aspect.

I'll follow your work on DBPTK and DBVTK.

Costantino Landino