keeps / dbptk-developer

DBPTK Developer - library and command-line tool for execution of database preservation actions
http://www.database-preservation.com
GNU Lesser General Public License v3.0
43 stars 19 forks source link

SIARD-DK: When exporting table meta data, we must change 'typeOriginal' accordingly, if type is changed (CLOBS or BLOBS) #128

Open ThomasKristensen opened 8 years ago

ThomasKristensen commented 8 years ago

@andreaskring When exporting columns with CLOBS or BLOBS, we change the column type to integer (as it is supposed to identify the file with the binary data). When we do this transformation, we also need to change to 'orginalType' value, as the import mechanism sometimes relies on the original type for determining the type of the columns.

https://github.com/magenta-aps/db-preservation-toolkit/blob/dev/dbptk-core/src/main/java/com/databasepreservation/modules/siard/out/metadata/TableIndexFileStrategy.java#L124

Example:

Orginal Mysql Table:

CREATE TABLE `datatypes` (
  `col1` blob COMMENT 'Test comment',
  `col_key` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Comment Primary Key Column',
  PRIMARY KEY (`col_key`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;

Table metadata in generated SIARD-DK archive:

<table>
            <name>datatypes</name>
            <folder>table1</folder>
            <description>Description should be entered manually</description>
            <columns>
                <column>
                    <name>col1</name>
                    <columnID>c1</columnID>
                    <type>INTEGER</type>
                    <typeOriginal>BLOB</typeOriginal>
                    <nullable>true</nullable>
                    <description>Test comment</description>
                    <functionalDescription>Dokumentidentifikation</functionalDescription>
                </column>
                <column>
                    <name>col_key</name>
                    <columnID>c2</columnID>
                    <type>INTEGER</type>
                    <typeOriginal>INT</typeOriginal>
                    <nullable>false</nullable>
                    <description>Comment Primary Key Column</description>
                </column>
            </columns>
            <primaryKey>
                <name>PRIMARY</name>
                <column>col_key</column>
            </primaryKey>
            <rows>1</rows>
        </table>

We this archive is imported in MySQL, this will result in the 'col1' column getting datatype 'blob', which is not what we want, as the data will now be integers (ids of files with the exported blobs in the archive).

andreaskring commented 8 years ago

I will fix this

2016-02-15 14:33 GMT+01:00 Thomas Kristensen notifications@github.com:

@andreaskring https://github.com/andreaskring When exporting columns with CLOBS or BLOBS, we change the column type to integer (as it is supposed to identify the file with the binary data). When we do this transformation, we also need to change to 'orginalType' value, as the import mechanism sometimes relies on the original type for determining the type of the columns.

https://github.com/magenta-aps/db-preservation-toolkit/blob/dev/dbptk-core/src/main/java/com/databasepreservation/modules/siard/out/metadata/TableIndexFileStrategy.java#L124

Example:

Orginal Mysql Table:

CREATE TABLE datatypes ( col1 blob COMMENT 'Test comment', col_key int(11) NOT NULL AUTO_INCREMENT COMMENT 'Comment Primary Key Column', PRIMARY KEY (col_key) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;

Table metadata in generated SIARD-DK archive:

datatypestable1Description should be entered manually col1 c1 INTEGER BLOB true Test comment Dokumentidentifikation col_key c2 INTEGER INT false Comment Primary Key Column PRIMARY col_key 1

We this archive is imported in MySQL, this will result in the 'col1' column getting datatype 'blob', which is not what we want, as the data will now be integers (ids of files with the exported blobs in the archive).

— Reply to this email directly or view it on GitHub https://github.com/keeps/db-preservation-toolkit/issues/128.

Andreas Kring Magenta Aps Incuba, Åbogade 15 DK-8200 Aarhus N

tel +45 3336 9699 dir +45 8177 1698

http://www.magenta.dk andreas@magenta.dk

ThomasKristensen commented 8 years ago

@andreaskring When 'typeOriginal' is handled, consider adjusting the test here: https://github.com/magenta-aps/db-preservation-toolkit/blob/siarddk-import-module/dbptk-core/src/test/java/com/databasepreservation/testing/integration/siard/SiardDKTest.java#L157

chalkos commented 8 years ago

Some parts of dbptk relied on the "original type", but this has proven to be a bad practice. Sure, it easily solves some problems for now, but in the long run relying on the original type will cause a lot of trouble due to the variety of values that can be present in that field. The preferred way to recognise types is to use the SQL standard type (99, 2008, etc). This may lead to some type changes (ie, text fields becoming CLOBs) but that may be acceptable as long as no data is lost (for that the new type must be able to contain all values that could be represented in the original type).