keeps / dbptk-developer

DBPTK Developer - library and command-line tool for execution of database preservation actions
http://www.database-preservation.com
GNU Lesser General Public License v3.0
43 stars 19 forks source link

SIARD validation error on requirement T_6.3-1 #505

Open oschihin opened 2 years ago

oschihin commented 2 years ago

Description:

SIARD Version is 2.1, file was produced through SiardGui 2.1.134. Database was MS SQL Server 12.

A SIARD validation with dbptk results in an unexpected error on requirement T_6.3-1 "Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. This restriction is enforced in the definitions of dateType and dateTimeType."

The validation report shows more details:

T_6.3 - Date and timestamp data cells
    T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table3/table3.xsd restriction not enforced
    T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table6/table6.xsd restriction not enforced
    T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table9/table9.xsd restriction not enforced
Date and timestamp data cells [FAILED]

The schemas mentioned define the types as below:

  <!-- date type between 0001 and 9999 -->
  <xs:simpleType name="dateType">
    <xs:restriction base="xs:date">
      <xs:minInclusive value="0001-01-01"/>
      <xs:maxExclusive value="10000-01-01"/>
      <xs:pattern value="\d{4}-\d{2}-\d{2}Z?"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- time type restricted to UTC -->
  <xs:simpleType name="timeType">
    <xs:restriction base="xs:time">
      <xs:pattern value="\d{2}:\d{2}:\d{2}(\.\d+)?Z?"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- dateTime type between 0001 and 9999 restricted to UTC -->
  <xs:simpleType name="dateTimeType">
    <xs:restriction base="xs:dateTime">
      <xs:minInclusive value="0001-01-01T00:00:00.000000000Z"/>
      <xs:maxExclusive value="10000-01-01T00:00:00.000000000Z"/>
      <xs:pattern value="\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?"/>
    </xs:restriction>
  </xs:simpleType>

I have questions on that result:

Thanks for helping me out.

luis100 commented 2 years ago

It seems to me that the validator is expecting the dateType min and max values to have the GMT timezone defined.

      <xs:minInclusive value="0001-01-01Z"/>
      <xs:maxExclusive value="10000-01-01Z"/>

https://github.com/keeps/dbptk-developer/blob/fae47ccc31c6a9392910c3769d65e0c1c274d79e/dbptk-modules/dbptk-module-siard/src/main/java/com/databasepreservation/modules/siard/validate/component/tableData/DateAndTimestampDataValidator.java#L38-L39

oschihin commented 2 years ago

@luis100 thanks for your quick reply. This seems to explain my first question. Any idea why it fails only on a selection of tables?

This would probably have to be corrected in the SIARD producing software, i.e. SIARD GUI, right?

luis100 commented 2 years ago

My comment is only an initial assessment, for a proper analysis of the issue we would need the SIARD that has the issue to check why it fails only in a selection.

In terms of fixing, I am not sure if this is an issue of the SIARD producing software or of the vallidator that is being too strict. We need to go back to the specification to see if timezone should be considered in these cases, but it already seems odd to me that timezone is considered in the dateTimeType restrictions and not on the dateType, as it would affect proper comparison in the same way.

So, I'll leave the issue still open for further discussion. If you can provide the SIARD or a SIARD with mocked data that shows the same behaviour it would help to better assess the issue.

oschihin commented 2 years ago

The SIARD is pretty sensitive, so I cannot hand that out. I'll see If there's a way to produce a mockup.