keeps / dbptk-developer

DBPTK Developer - library and command-line tool for execution of database preservation actions
http://www.database-preservation.com
GNU Lesser General Public License v3.0
43 stars 19 forks source link

MS Access: '' shown for empty cells in PostgreSQL #237

Closed BorisDomajnko closed 8 years ago

BorisDomajnko commented 8 years ago

Description: MS Access db has a field Short text where one character is entered. After export&restore the empty fields show ''.

Steps required to reproduce the bug:

  1. use 2.0.0.beta5 and extract the db (file Prs1.accdb) The values in column are single characters like A, B, C, or nothing.
  2. restore to Postgresql
  3. the result is Character varying (1), but the empty cells are (in PgAdmin III) displayed as '' (two single quotation marks or apostrophes) and cells with values are correct (A, B, C ...)
  4. This problem occurs with column ULICA_HS_D
  5. This problem does not show with column ZAV_ZDIJZ in the same table!

Attach the dbptk-app.log.txt file below.

2016-08-02 10:37:55,847 [main] DEBUG (c.d.m.Reporter) dataTypeChangedOnImport, invoker: com.databasepreservation.modules.msAccess.in.MsAccessUCanAccessDatatypeImporter; message: - Type conversion in import module: in PUBLIC.PRS_ENOTA_RS.ULICA_HS_D (format: schema.table.column) has original type VARCHAR and was converted to the standard type CHARACTER VARYING(1); and type: Type{description='null', originalTypeName='VARCHAR', sql99TypeName='CHARACTER VARYING(1)', sql2008TypeName='CHARACTER VARYING(1)'} 2016-08-02 10:37:55,847 [main] DEBUG (c.d.m.j.i.JDBCImportModule) Data type: 12 Type name: VARCHAR Column size: 1 Decimal digits: 0 Radix: 0 Remarks: null Default value: null Index: 16 Is Nullable: true Is auto increment: false Calculated type: SimpleTypeString ColumnType hash: 1731377498

2016-08-02 10:37:55,847 [main] DEBUG (c.d.m.Reporter) dataTypeChangedOnImport, invoker: com.databasepreservation.modules.msAccess.in.MsAccessUCanAccessDatatypeImporter; message: - Type conversion in import module: in PUBLIC.PRS_ENOTA_RS.ZAV_ZDIJZ (format: schema.table.column) has original type VARCHAR and was converted to the standard type CHARACTER VARYING(1); and type: Type{description='null', originalTypeName='VARCHAR', sql99TypeName='CHARACTER VARYING(1)', sql2008TypeName='CHARACTER VARYING(1)'} 2016-08-02 10:37:55,847 [main] DEBUG (c.d.m.j.i.JDBCImportModule) Data type: 12 Type name: VARCHAR Column size: 1 Decimal digits: 0 Radix: 0 Remarks: null Default value: null Index: 38 Is Nullable: true Is auto increment: false Calculated type: SimpleTypeString ColumnType hash: 1731377498

chalkos commented 8 years ago

The values in column are single characters like A, B, C, or nothing

This "nothing" that you speak of, is it a NULL, an empty string or a space?

BorisDomajnko commented 8 years ago

"Nothing" in Access column means the field has not been edited yet (no character has been entered by keyboard). I do not now if this is NULL or string with length 0.

Both fields from the example above are configured with: Allow Zero Length: Yes If I export as XML, this is the definition in the exported xsd:

Column in XML:

<ULICA>Kaplja vas</ULICA>
<ULICA_HS>  10</ULICA_HS>
<ULICA_HS_D></ULICA_HS_D>

Columns in xsd:

<xsd:element name="ULICA_HS_D" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar">
<xsd:annotation>
<xsd:appinfo>
<od:fieldProperty name="ColumnWidth" type="3" value="-1"/>
<od:fieldProperty name="ColumnOrder" type="3" value="0"/>
<od:fieldProperty name="ColumnHidden" type="1" value="0"/>
<od:fieldProperty name="Required" type="1" value="0"/>
<od:fieldProperty name="AllowZeroLength" type="1" value="1"/>
<od:fieldProperty name="DisplayControl" type="3" value="109"/>
<od:fieldProperty name="IMEMode" type="2" value="0"/>
<od:fieldProperty name="IMESentenceMode" type="2" value="3"/>
<od:fieldProperty name="UnicodeCompression" type="1" value="0"/>
<od:fieldProperty name="TextAlign" type="2" value="0"/>
<od:fieldProperty name="AggregateType" type="4" value="-1"/>
<od:fieldProperty name="ResultType" type="2" value="0"/>
<od:fieldProperty name="CurrencyLCID" type="4" value="0"/>
</xsd:appinfo>
</xsd:annotation>
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="1"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>

<xsd:element name="ZAV_ZDIJZ" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar">
<xsd:annotation>
<xsd:appinfo>
<od:fieldProperty name="ColumnWidth" type="3" value="-1"/>
<od:fieldProperty name="ColumnOrder" type="3" value="0"/>
<od:fieldProperty name="ColumnHidden" type="1" value="0"/>
<od:fieldProperty name="Required" type="1" value="0"/>
<od:fieldProperty name="AllowZeroLength" type="1" value="1"/>
<od:fieldProperty name="DisplayControl" type="3" value="109"/>
<od:fieldProperty name="IMEMode" type="2" value="0"/>
<od:fieldProperty name="IMESentenceMode" type="2" value="3"/>
<od:fieldProperty name="UnicodeCompression" type="1" value="0"/>
<od:fieldProperty name="TextAlign" type="2" value="0"/>
<od:fieldProperty name="AggregateType" type="4" value="-1"/>
<od:fieldProperty name="ResultType" type="2" value="0"/>
<od:fieldProperty name="CurrencyLCID" type="4" value="0"/>
</xsd:appinfo>
</xsd:annotation>
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="1"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
chalkos commented 8 years ago

I edited your comment to format the code as such (using ```).

From the XSD I can see that the text field can have length 0, but it can not be NULL (omitted). So it makes total sense that after export&restore the value is '' (an empty string). This is not a bug in DBPTK, as it is correctly extracting the values from the database.

I can not make DBPTK change this information, as having an empty string can be different (in both the database and application logic) from not having a value at all.

Should I close this?

BorisDomajnko commented 8 years ago

Please check my mail from 2.Aug.2016 with screen snapshot. The issue is that when there is no value entered in the source database field, in the target database we get two characters: ' and ' instead of "nothing". You should not see ' as data unless somebody has actualy typed it.

chalkos commented 8 years ago

You should not see ' as data unless somebody has actually typed it.

Yes, you are right, that is still an issue. I will now try to guide you so you can tell me if this problem is present in the SIARD file or somewhere else (probably the PostgreSQL export module or the PostgreSQL viewer).

Lets strat by trying to find the value in the SIARD file. It is in a tableN.xml file, but there are a few of those inside the SIARD. To know the exact folder name, check the header/metadata.xml file (inside the SIARD) and look for the PRS_ENOTA_RS table inside the public schema. There should be a <folder> tag with the folder name.

Then open the file at content/<schema folder tag>/<table folder tag>/<table folder tag>.xml (example: if the schema has folder schema1 and table has folder table5 then the XML file is located at content/schema1/table5/table5.xml) and try to find this row:

<ULICA>Kaplja vas</ULICA>
<ULICA_HS>  10</ULICA_HS>
<ULICA_HS_D></ULICA_HS_D>

which should be something like

<c1>Kaplja vas</c1>
<c2>  10</c2>
<c3></c3>

(although the numbers may not be the same).

if you find an empty tag for the ULICA_HS_D column, like this <c3></c3> or like this <c3/>, it means that SIARD is correctly saving the value as a zero-length string. If you find something like <c3>''</c3>, then the problem is present in the SIARD file.

BorisDomajnko commented 8 years ago

It is c16. For Kaplja vas 10 we have '''xml

Kaplja vas
<c15>\u0020\u002010</c15>
<c16/>

''' Is it possible that DBPTK changes when populating a PostgreSQL table?

chalkos commented 8 years ago

Please put the xml code between ``` so it is correctly formatted.

BorisDomajnko commented 8 years ago
    <c14>Kaplja vas</c14>
    <c15>\u0020\u002010</c15>
    <c16/>
chalkos commented 8 years ago

Then the information is correct in SIARD.

What is the application that displays the value as ''? Is it the PHP dbviewer or PostgreSQL?

BorisDomajnko commented 8 years ago

pgAdmin III (table browser). But if I export table as text, a get correctly: Kaplja vas\t 10\t\t.

chalkos commented 8 years ago

Then it is not a DBPTK problem. pgAdmin adds the ' as a visual clue that there is a value in there, but it has length zero.

I don't think there is anything that DBPTK can do about that. Should I close the issue?

BorisDomajnko commented 8 years ago

I agree. I cannot provide more information at this point of time except that this behavior of pgAdmin has not been noticed with other tests or applications.