opengeospatial / ets-csw202

Executable Test Suite for CSW 2.0.2

Timezoned-date literals are not recommended for dc:date #6

Closed tastle closed 9 years ago

tastle commented 9 years ago

There are several tests in the GetRecords script that have filters like:

<ogc:PropertyIsGreaterThan>
    <ogc:PropertyName>dc:date</ogc:PropertyName>
    <ogc:Literal>2004-01-01Z</ogc:Literal>
</ogc:PropertyIsGreaterThan>

The problem is that the literal isn't a valid date. Dates don't have timezones. Omitting the Z makes the date valid. The other option would be to do 2004-01-01TZ, but the spec seems to be pretty clear on using a date, not a date time.

Does anyone else have any thoughts on this?

lorebiga commented 9 years ago

Date values are encoded with the encoding scheme defined in XML Schema Part 2: Datatypes. That is, date literals in the tests are encoded (and may be parsed) as xsd:date instances. This is a most natural choice, since the tests already leverage XML Schema, XSL, etc.

xsd:date instances may in fact have timezones (the concept is called "Timezoned date"), with the semantics defined in [http://www.w3.org/TR/xmlschema-2/#date].
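To make the point above concrete, here is a minimal sketch of the xsd:date lexical form as a regular expression, assuming the shape given in XML Schema Part 2 (optional minus sign, year of four or more digits, month, day, optional timezone). The name `is_xsd_date_lexical` is hypothetical and not part of the test suite; the check covers the lexical shape only, not calendar validity.

```python
import re

# Assumed lexical shape of xsd:date: -?YYYY-MM-DD followed by an optional
# timezone, which is either "Z" or a +hh:mm / -hh:mm offset up to 14:00.
XSD_DATE = re.compile(
    r"-?\d{4,}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"
    r"(Z|[+-](0\d|1[0-3]):[0-5]\d|[+-]14:00)?"
)


def is_xsd_date_lexical(text: str) -> bool:
    """Return True if text has the xsd:date lexical shape.

    Only the shape is checked; impossible days such as 2004-02-30
    are not rejected by this sketch.
    """
    return XSD_DATE.fullmatch(text) is not None
```

Under this sketch, both `2004-01-01` and `2004-01-01Z` are accepted, while `2004-01-01TZ` is not.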

I do not think this is in contrast with Dublin Core and the CSW schema.

The Dublin Core Metadata Element Set 1.1 specifies that: "Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF]."

In the Dublin Core XSD (used by CSW 2.0.2), dc:date is implemented as a substitution group. The comment specifies that: "Recommended best practice for encoding the date value is defined in a profile of ISO 8601 and includes (among others) dates of the form YYYY-MM-DD."

tastle commented 9 years ago

I think you've hit the nail on the head.

ISO 8601 does not define a "Timezoned date". So parsing dates using an ISO 8601 compliant library will fail.

Thanks for the clarification.

lorebiga commented 9 years ago

I have tried to confirm that the syntax is invalid with respect to ISO 8601, looking for an ISO 8601 parser (preferably online). What library are you using?

I think this aspect deserves to be clarified better in the tests documentation (and possibly in the specs), thanks for reporting.

tastle commented 9 years ago

@lorebiga, we are using JodaTime, eventually transitioning to Java 8 Date and Time API JSR-310.

So when we parse the date, it throws an exception.

It's funny though. All W3C DTF had to do to support Timezoned Dates, is to do

2004-01-01TZ

instead of

2004-01-01Z

and from what I've read, everything would be aligned. But instead it's a non-compatible ISO 8601 grammar. (I could be missing something though.)

Did you want me to reopen this issue?

Edit: After digging a bit some more, I don't believe my asserted hypothesis holds any water, so I've redacted it for future readers.

lorebiga commented 9 years ago

Yes, let's reopen. Could you help me confirm that the Timezoned Date syntax is non-compatible ISO 8601/W3C DTF? I'm not 100% sure either.

If that is the case, I think we should avoid it (hence remove the Z).

tastle commented 9 years ago

What a difficult task. With what I've read, I don't believe I'd wish any of this upon my worst enemy.

What I think I'm seeing is that the sample date values might be wrong because they are modelled the way they would end up being formatted in the ISO 19139 metadata model. I'm probably not going to get this all straight, but I'm going to try to explain my thought process as best as possible. So if I make a mistake, I hope you will be able to spot it.

The Dublin Core date definition looks to allow date time, which will allow carrying a time zone, but only if a time is provided. Please note their use of "such as". They recommend W3C DTF.

W3C DTF explicitly cites six allowed formats. None of these are a Timezoned Date. To test this, I used the Rome W3C DateTime Parser (com.sun.syndication.io.impl.DateParser). Now that library is an implementation (so it could have errors) but from what I see, it is behaving as the W3C would have expected it to behave.

If that is true, then the sample data with the timezoned dates could be corrected by modifying the dc:date value as follows:

<?xml version="1.0" encoding="UTF-8"?>
<csw:Record xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" 
  xmlns:ows="http://www.opengis.net/ows" 
  xmlns:dc="http://purl.org/dc/elements/1.1/" 
  xmlns:dct="http://purl.org/dc/terms/"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <dc:identifier>urn:uuid:784e2afd-a9fd-44a6-9a92-a3848371c8ec</dc:identifier>
    <dc:title>Aliquam fermentum purus quis arcu</dc:title>
    <dc:type>http://purl.org/dc/dcmitype/Text</dc:type>
    <dc:subject>Hydrography--Dictionaries</dc:subject>
    <dc:format>application/pdf</dc:format>
    <dc:date>2006-05-12T00:00Z</dc:date>
    <dct:abstract>Vestibulum quis ipsum sit amet metus imperdiet vehicula. Nulla scelerisque cursus mi.</dct:abstract>
</csw:Record>

Now the CSW that I'm testing against is backed by the CSW ISO profile. That means that my metadata is stored in a manner that must comply with ISO 19139. The date values provided in the sample data (as csw:Records) are mapped to a gco:Date element as per the specification.

OGC 07-045 ISO Metadata Application Profile Page 41

Modified (c) Date on which the record was created or updated within the catalogue Date-8601, example: 1963-06-19 MD_Metadata.dateStamp.Date (c) DCMI metadata term http://dublincore.org/documents/dcmi-terms/.

<xs:simpleType name="Date_Type">
    <xs:union memberTypes="xs:date xs:gYearMonth xs:gYear"/>
</xs:simpleType>
<xs:element name="Date" type="gco:Date_Type" nillable="true"/>

I believe this is a really bad choice of element, because of the limitation of accuracy of a gco:Date. There isn't enough accuracy to represent when a metadata record was changed during a day, so changes within the window of a day cannot be represented, but I digress. Regardless, this mapping ends up being lossy. A gco:Date is more constrained than a dc:date. A gco:DateTime wouldn't have been lossy.
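The loss of precision described above can be illustrated with a small sketch. The helper `to_gco_date` is hypothetical and simply truncates a timestamp to its calendar date, which is the most a gco:Date (xs:date, xs:gYearMonth or xs:gYear) can carry; it is not taken from any CSW implementation.

```python
from datetime import datetime, timezone


def to_gco_date(value: datetime) -> str:
    """Truncate a timestamp to YYYY-MM-DD, discarding the time of day."""
    return value.date().isoformat()


# Two hypothetical updates to the same record on the same day.
morning = datetime(2006, 5, 12, 8, 30, tzinfo=timezone.utc)
evening = datetime(2006, 5, 12, 21, 45, tzinfo=timezone.utc)

# Both collapse to the same date string, so their order is no longer recoverable.
same_day = to_gco_date(morning) == to_gco_date(evening)
```

Once both values are stored as a date, nothing distinguishes the morning edit from the evening one.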

So now I'm in the situation of taking the csw:Record sample data and converting it to valid ISO 19139 documents so I can run these tests and ensure that they pass. At this point, I have the option to shorten it to 2006-05-12Z. But that really depends on the data model backing the CSW. I'm not familiar with ebRIM, but it would be a to-each-their-own situation, as they'd need to prepare data values according to their underlying data model.

So in summary, it looks like:

  1. dc:date values in the sample data are incorrect.
  2. If the sample data is mapped into ISO 19139, it can support the YYYY-MM-DDZ format because of xs:date, but that would be the choice of the implementer as to how they choose to do that.
  3. The literal values in the tests are incorrect because they don't comply with W3C DTF. They'd have to be updated to have a valid string if timezones are desired for those tests.

What are your thoughts?

PS. I was somewhat happy to have stumbled onto a developer question on Stack Overflow that seems to back up the incompatibility between ISO 8601 and xs:date. I wish I had found it sooner, but at least it supports my reasoning.

Here is a Maven project with some tests demonstrating the behaviour of a few libraries with W3C DTF and xs:date date formats. https://github.com/tastle/DateTest

lorebiga commented 9 years ago

hi @tastle, thanks for looking into this. I'm not sure I follow your reasoning entirely, but I think we are getting to a solution here. I've a question: why do you say that the Dublin Core date definition allows carrying a time zone only if a time is provided? I wasn't able to spot that precise requirement in the definitions. Hence, I wouldn't conclude that dc:date values in the sample data are incorrect.

tastle commented 9 years ago

@lorebiga , I think I got there because of:

"Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF]."

The formats are as follows. Exactly the components shown here must be present, with exactly this punctuation. Note that the "T" appears literally in the string, to indicate the beginning of the time element, as specified in ISO 8601.

 Year:
    YYYY (eg 1997)
 Year and month:
    YYYY-MM (eg 1997-07)
 Complete date:
    YYYY-MM-DD (eg 1997-07-16)
 Complete date plus hours and minutes:
    YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
 Complete date plus hours, minutes and seconds:
    YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
 Complete date plus hours, minutes, seconds and a decimal fraction of a second:
    YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)

where:

 YYYY = four-digit year
 MM   = two-digit month (01=January, etc.)
 DD   = two-digit day of month (01 through 31)
 hh   = two digits of hour (00 through 23) (am/pm NOT allowed)
 mm   = two digits of minute (00 through 59)
 ss   = two digits of second (00 through 59)
 s    = one or more digits representing a decimal fraction of a second
 TZD  = time zone designator (Z or +hh:mm or -hh:mm)

So to answer your question:

why do you say that the Dublin Core date definition allows carrying a time zone only if a time is provided?

I believe Dublin Core recommends using W3CDTF, and W3CDTF does not support a timezoned date format, as you can see quoted above. There are six valid formats.
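The six formats quoted above can be sketched as regular expressions to check which literals fit the profile. This is an illustration only: the name `matches_w3cdtf` is hypothetical, and the patterns check digit counts and punctuation, not value ranges such as month 01 to 12.

```python
import re

# Time zone designator: Z or +hh:mm or -hh:mm.
TZD = r"(Z|[+-]\d{2}:\d{2})"

# One pattern per W3CDTF format, in the order quoted above.
W3CDTF_PATTERNS = [
    r"\d{4}",                                            # YYYY
    r"\d{4}-\d{2}",                                      # YYYY-MM
    r"\d{4}-\d{2}-\d{2}",                                # YYYY-MM-DD
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}" + TZD,              # ...Thh:mmTZD
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}" + TZD,        # ...Thh:mm:ssTZD
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+" + TZD,   # ...Thh:mm:ss.sTZD
]
W3CDTF = [re.compile(pattern) for pattern in W3CDTF_PATTERNS]


def matches_w3cdtf(text: str) -> bool:
    """Return True if text matches one of the six W3CDTF shapes."""
    return any(pattern.fullmatch(text) for pattern in W3CDTF)
```

With these patterns, `2006-05-12` and `2006-05-12T00:00Z` match, while the timezoned date `2006-05-12Z` matches none of the six.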

Does that help?

lorebiga commented 9 years ago

Well, as far as tests are concerned, we really need to be picky on the wording and avoid over-restrictive interpretations: we cannot assume a specification requires something unless it actually does so normatively.

In this case, the use of W3C DTF is only recommended by DC, so we cannot make it mandatory. The literals in the tests use the XML Schema (xsd:date) encoding, which is legitimate, hence we can't call them invalid or bugged.

All this said, I think following recommendations improves interoperability, and it is fair to protect those who abide by them (in this case, W3C DTF compliant services). So, I'm flagging this as an enhancement and removing the 'Z' from the tests.
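As a rough sketch of that change, a timezoned date literal could be normalized as shown here. The function `strip_date_timezone` is hypothetical and is not the actual patch applied to the test scripts; it drops the zone only from date-only values and leaves everything else untouched.

```python
import re

# A date-only value followed directly by a zone designator, e.g. 2004-01-01Z.
TIMEZONED_DATE = re.compile(r"(\d{4}-\d{2}-\d{2})(Z|[+-]\d{2}:\d{2})")


def strip_date_timezone(literal: str) -> str:
    """Return the literal without its zone if it is a timezoned date.

    Date-times and plain dates are returned unchanged.
    """
    match = TIMEZONED_DATE.fullmatch(literal.strip())
    return match.group(1) if match else literal
```

For example, `strip_date_timezone("2004-01-01Z")` yields `2004-01-01`, while `2006-05-12T00:00Z` is returned as is. Note that dropping the zone turns the value into a date without timezone information.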

If this is ok for you, I think we can close this issue.

tastle commented 9 years ago

Yes, I'm okay with closing this issue. I think your proposal makes sense, which is what had me confused.

I really appreciate your help and time @lorebiga. It's nice to see people actively be available to help where there are problems or questions with these tests.

Best wishes!

lorebiga commented 9 years ago

thanks for the nice words @tastle! And thanks for taking the time to report on this subtle issue, and to follow up! To next time!

tomkralidis commented 9 years ago

@lorebiga we need to update the test data to reflect this as well, in these files:

Record_784e2afd-a9fd-44a6-9a92-a3848371c8ec.xml:    <dc:date>2006-05-12Z</dc:date>
Record_94bc9c83-97f6-4b40-9eb8-a8e8787a5c63.xml:    <dc:date>2006-03-26Z</dc:date>
Record_9a669547-b69b-469f-a11f-2d875366bbdc.xml:    <dc:date>2005-10-24Z</dc:date>
Record_e9330592-0932-474b-be34-c3a3bb67c7db.xml:    <dc:date>2003-05-09Z</dc:date>

tomkralidis commented 9 years ago

FYI this also needs to be updated here: https://github.com/opengeospatial/ets-csw202/blob/faf2d8ab55d75afbd9fd94c5dc26d04cf107fb13/src/main/scripts/ctl/GetRecords/CSW-GetRecords-POST.xml#L1150-L1151

If it helps, I can submit a pull request.

lorebiga commented 9 years ago

hi @tomkralidis, I had made some changes when I saw your pull... I'm reverting and getting yours. Thanks! :+1: