Open ThomasKristensen opened 8 years ago
I will fix this
2016-02-15 14:33 GMT+01:00 Thomas Kristensen notifications@github.com:
@andreaskring https://github.com/andreaskring When exporting columns with CLOBS or BLOBS, we change the column type to integer (as it is supposed to identify the file with the binary data). When we do this transformation, we also need to change to 'orginalType' value, as the import mechanism sometimes relies on the original type for determining the type of the columns.
Example:
Orginal Mysql Table:
CREATE TABLE
datatypes
(col1
blob COMMENT 'Test comment',col_key
int(11) NOT NULL AUTO_INCREMENT COMMENT 'Comment Primary Key Column', PRIMARY KEY (col_key
) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;Table metadata in generated SIARD-DK archive:
datatypes table1 Description should be entered manually col1 c1 INTEGER BLOB true Test comment Dokumentidentifikation col_key c2 INTEGER INT false Comment Primary Key Column PRIMARY col_key 1 We this archive is imported in MySQL, this will result in the 'col1' column getting datatype 'blob', which is not what we want, as the data will now be integers (ids of files with the exported blobs in the archive).
— Reply to this email directly or view it on GitHub https://github.com/keeps/db-preservation-toolkit/issues/128.
Andreas Kring Magenta Aps Incuba, Åbogade 15 DK-8200 Aarhus N
tel +45 3336 9699 dir +45 8177 1698
http://www.magenta.dk andreas@magenta.dk
@andreaskring When 'typeOriginal' is handled, consider adjusting the test here: https://github.com/magenta-aps/db-preservation-toolkit/blob/siarddk-import-module/dbptk-core/src/test/java/com/databasepreservation/testing/integration/siard/SiardDKTest.java#L157
Some parts of dbptk relied on the "original type", but this has proven to be a bad practice. Sure, it easily solves some problems for now, but in the long run relying on the original type will cause a lot of trouble due to the variety of values that can be present in that field. The preferred way to recognise types is to use the SQL standard type (99, 2008, etc). This may lead to some type changes (ie, text fields becoming CLOBs) but that may be acceptable as long as no data is lost (for that the new type must be able to contain all values that could be represented in the original type).
@andreaskring When exporting columns with CLOBS or BLOBS, we change the column type to integer (as it is supposed to identify the file with the binary data). When we do this transformation, we also need to change to 'orginalType' value, as the import mechanism sometimes relies on the original type for determining the type of the columns.
https://github.com/magenta-aps/db-preservation-toolkit/blob/dev/dbptk-core/src/main/java/com/databasepreservation/modules/siard/out/metadata/TableIndexFileStrategy.java#L124
Example:
Orginal Mysql Table:
Table metadata in generated SIARD-DK archive:
We this archive is imported in MySQL, this will result in the 'col1' column getting datatype 'blob', which is not what we want, as the data will now be integers (ids of files with the exported blobs in the archive).