jdum / odfdo

python library for OpenDocument format (ODF)
Apache License 2.0

`DateTime.decode` doesn't handle timezones properly #47

Closed GPHemsley-RELX closed 3 weeks ago

GPHemsley-RELX commented 3 weeks ago

The ODF spec defines date as being encoded as "Date value as specified in §3.2.9 of [xmlschema-2], or date and time value as specified in §3.2.7 of [xmlschema-2]".

The XSD spec is a confusing mess, but it defines (optional) timezones (for both values) as (('+' | '-') hh ':' mm) | 'Z'. Note that 'Z' is equal to '+00:00' and '-00:00', and that the colon is not optional.
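
As an illustration of the grammar quoted above, here is a minimal sketch that splits the optional timezone suffix off a date or dateTime string. The names `_TZ_RE` and `split_timezone` are hypothetical and not part of odfdo. It assumes only the lexical form given above and does not enforce the spec's limits on offset values.

```python
import re
from datetime import timedelta, timezone

# Lexical form quoted above: (('+' | '-') hh ':' mm) | 'Z'
# The colon is mandatory, so "+0000" must NOT match.
_TZ_RE = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def split_timezone(value):
    """Split an XSD date/dateTime string into (body, tzinfo or None).

    Hypothetical helper for illustration only.
    """
    match = _TZ_RE.search(value)
    if match is None:
        # No timezone suffix (or an invalid one such as "+0000").
        return value, None
    body = value[: match.start()]
    suffix = match.group(1)
    if suffix == "Z":
        return body, timezone.utc
    sign = -1 if suffix[0] == "-" else 1
    hours, minutes = int(suffix[1:3]), int(suffix[4:6])
    # timezone() raises ValueError for offsets of 24 hours or more.
    return body, timezone(sign * timedelta(hours=hours, minutes=minutes))
```

With this sketch, 'Z', '+00:00' and '-00:00' all yield a zero UTC offset, while a suffix without the colon is left attached to the body and reported as having no timezone.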

Previous versions of DateTime.decode simply chopped off the 'Z' if it was present and would have failed on any other timezone, but d8c0962 introduced a regression that made things worse by adding an invalid timezone of "+0000" that also was not expected by DATETIME_FORMAT.

This means that instead of silently dropping a timezone of 'Z' from a date value, DateTime.decode is now replacing it with additional text that doesn't parse, resulting in the following error:

ValueError: unconverted data remains: +0000
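
For reference, a minimal sketch that reproduces the same error outside odfdo. `ASSUMED_FORMAT` is my assumption about what DATETIME_FORMAT looks like, and `naive_decode` only mimics the behaviour described above; it is not the library's actual code. `tolerant_decode` is one possible workaround using the standard library, not necessarily how any fix is implemented.

```python
from datetime import datetime

# Assumption: stands in for odfdo's DATETIME_FORMAT (no timezone directive).
ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S"


def naive_decode(value):
    """Mimic the reported regression: 'Z' becomes "+0000" before parsing."""
    if value.endswith("Z"):
        value = value[:-1] + "+0000"
    # The format has no %z, so the appended "+0000" is left unparsed.
    return datetime.strptime(value, ASSUMED_FORMAT)


def tolerant_decode(value):
    """Hypothetical workaround: map 'Z' to '+00:00' and keep the offset."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    # fromisoformat accepts the hh:mm offset form required by the XSD grammar.
    return datetime.fromisoformat(value)


try:
    naive_decode("2024-01-15T10:30:00Z")
except ValueError as exc:
    message = str(exc)  # "unconverted data remains: +0000"
```

The second helper returns a timezone-aware datetime rather than dropping the offset, which is a design choice callers would need to be prepared for.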

jdum commented 3 weeks ago

Thanks for the report. See the recent version v3.7.13, which should fix this bug.

GPHemsley-RELX commented 3 weeks ago

I'm not entirely sure it covers everything, but it does fix my current use case. Thanks for the quick turnaround!