lucmoreau / ProvToolbox

Java toolkit to create and convert W3C PROV data model representations, and build provenance-enabled applications in a variety of programming languages (java, python, typescript, javascript)

Timezone information lost during deserialization #209

Closed mf-16 closed 9 months ago

mf-16 commented 9 months ago

I encountered an issue with ProvToolbox version 2.0.0 while working with time. When deserializing a document using the provided code:

var inf = new InteropFramework();
var document = inf.readDocumentFromFile("file.provn");

file.provn:

document
activity(prov:a,2023-09-08T20:12:45.109-04:00,2023-09-15T20:35:06.793-04:00)
endDocument

After deserializing the document and serializing it back to PROV-N:

document
activity(prov:a,2023-09-09T00:12:45.109+02:00,2023-09-16T00:35:06.793+02:00)
endDocument

The original timezone information is lost, and the dates are now expressed in the system's timezone.

This issue seems to occur in the ProvFactory class, specifically in the newISOTime method where the timezone information is lost during the execution of:

public XMLGregorianCalendar newISOTime(String time) {
        return this.newTime(DatatypeConverter.parseDateTime(time).getTime());
}

more specifically here:

DatatypeConverter.parseDateTime(time).getTime()

This issue impacts applications relying on accurate timezone data and could lead to incorrect data representation.
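
A minimal sketch of the effect, using only the JDK. It is not ProvToolbox code: `DatatypeFactory` stands in for `DatatypeConverter` so that the example runs without the JAXB dependency, and the class and method names are made up for illustration. A `java.util.Date` holds an instant and no offset, so any offset on output has to come from the calendar the date is loaded into.

```java
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical demo class, not part of ProvToolbox.
class OffsetLossDemo {

    static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lossy path: String -> Date -> calendar in outputZone.
    // The Date keeps the instant only, so the result carries outputZone's offset.
    static XMLGregorianCalendar viaDate(String time, TimeZone outputZone) {
        Date instant = factory().newXMLGregorianCalendar(time)
                                .toGregorianCalendar()
                                .getTime();
        GregorianCalendar calendar = new GregorianCalendar(outputZone);
        calendar.setTime(instant);
        return factory().newXMLGregorianCalendar(calendar);
    }

    // Preserving path: parse the lexical form directly, keeping its offset.
    static XMLGregorianCalendar preservingOffset(String time) {
        return factory().newXMLGregorianCalendar(time);
    }

    public static void main(String[] args) {
        String time = "2023-09-08T20:12:45.109-04:00";
        System.out.println(preservingOffset(time).toXMLFormat());
        System.out.println(viaDate(time, TimeZone.getDefault()).toXMLFormat());
    }
}
```

Both results denote the same instant; only the first keeps the `-04:00` offset of the input.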

lucmoreau commented 9 months ago

2023-09-08T20:12:45.109-04:00 and 2023-09-09T00:12:45.109+02:00 denote the same time, but they are expressed according to different time zones.

PROV does not specify a Document's “default timezone” according to which dates have to be serialized (unlike a namespace prefix, which can be defined in a Document).

I am not aware of an obligation set by PROV to reexport dates in the same timezones as those they were imported in.

lucmoreau commented 9 months ago

Please reopen the issue, if the above interpretation is not correct.

stain commented 9 months ago

If it can't preserve the timezone, then it should perhaps normalize to UTC (Z) rather than to the local timezone of the environment in which provconvert is running; otherwise the provenance of the PROV conversion becomes important as well.

lucmoreau commented 9 months ago

The above commit is a quick fix for ProvToolbox. It offers a new factory method that creates dates and keeps their original timezone offsets, instead of converting them to the default system timezone offset.

@mf-16 does it address your concern?

lucmoreau commented 9 months ago

The online translator, however, has not changed. If you paste the following example into https://openprovenance.org/service/translator.html (selecting provn notation), the result will display the same provenance but with both dates expressed with the default timezone offset (London time, at this time of the year, GMT+1). The same happens with provconvert on the command line.

document
prefix ex <https://example.org/>
activity(ex:a,2023-09-08T20:12:45.109-04:00,2023-10-15T20:35:06.793-02:00)
endDocument

lucmoreau commented 9 months ago

Following @stain's suggestion, there is now a constructor that creates the date in normalized form (in the UTC timezone). The PROV-N parser was updated to support it. The other serializations (json/jsonld/provx) seem to normalize dates to UTC.
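
For illustration, normalization to UTC can be sketched with the standard `XMLGregorianCalendar.normalize()` method. This is not the new ProvToolbox constructor itself; the class and method names below are made up for the example.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical demo class, not part of ProvToolbox.
class UtcNormalizeDemo {

    // Parse an xsd:dateTime string and re-express the same instant in UTC.
    static XMLGregorianCalendar toUtc(String time) {
        try {
            return DatatypeFactory.newInstance()
                                  .newXMLGregorianCalendar(time)
                                  .normalize();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUtc("2023-09-08T20:12:45.109-04:00").toXMLFormat());
    }
}
```

The result does not depend on the machine's default timezone.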

Now, the following (on my development branch) reads provn and exports with dates in UTC.

curl https://gist.githubusercontent.com/lucmoreau/588fdaeca5eb271cc6d0cd86816bea00/raw/ff621ab4bbb07d20585331be2c434e4bd575a8c8/date_with_tz_offset.provn | modules-executable/toolbox/target/appassembler/bin/provconvert -infile - -informat provn -outfile - -outformat provn

lucmoreau commented 9 months ago

Having ProvToolbox preserve the original timezone offset when it is run from the command line requires a bit more effort.

mf-16 commented 9 months ago

> The above commit is quick fix for ProvToolbox, offering a new factory method to create dates, and keep their original timezone offsets, instead of converting to the default system timezone offset.
>
> @mf-16 does it address your concern?

Yes, it does. I appreciate you providing a fix for the issue and your quick response, @lucmoreau. Thank you!

lucmoreau commented 9 months ago

@stain and @mf-16: your comments gave me food for thought. provconvert now allows users to specify how the timezone offset is to be processed: PRESERVE, UTC, SYSTEM, or a specific timezone. A web client can also specify this to the provapi by means of headers.
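
As a rough illustration of what the four modes mean for a single date, here is a sketch using `java.time`. It is not the provconvert implementation: the class, the method, and the way the mode is passed as a string are assumptions made for this example, and any value other than PRESERVE, UTC, or SYSTEM is treated as a zone ID.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, not part of ProvToolbox or provconvert.
class TimezonePolicy {

    // Re-express isoTime according to mode; the instant never changes,
    // only the offset it is written with.
    static OffsetDateTime apply(String isoTime, String mode) {
        OffsetDateTime parsed = OffsetDateTime.parse(isoTime);
        switch (mode) {
            case "PRESERVE":
                return parsed; // keep the offset found in the input
            case "UTC":
                return parsed.withOffsetSameInstant(ZoneOffset.UTC);
            case "SYSTEM":
                return parsed.atZoneSameInstant(ZoneId.systemDefault()).toOffsetDateTime();
            default:
                // Assumed here: anything else names a specific zone, e.g. "Europe/London".
                return parsed.atZoneSameInstant(ZoneId.of(mode)).toOffsetDateTime();
        }
    }

    public static void main(String[] args) {
        String time = "2023-09-08T20:12:45.109-04:00";
        for (String mode : new String[] {"PRESERVE", "UTC", "SYSTEM", "Europe/London"}) {
            System.out.println(mode + " -> " + apply(time, mode));
        }
    }
}
```

Only the SYSTEM mode depends on the machine running the conversion.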