xuri / xgen

XSD (XML Schema Definition) parser and Go/C/Java/Rust/TypeScript code generator
BSD 3-Clause "New" or "Revised" License
318 stars 75 forks source link

Go: `gDay` cannot be parsed as `time.Time` #13

Closed BatmanAoD closed 2 years ago

BatmanAoD commented 4 years ago

Description

The XSD gDay type is not a timestamp but a recurring calendar day. Valid values given as examples in RELAX NG by Eric van der Vlist are:

---01, ---01Z, ---01+02:00, ---01-04:00, ---15, and ---31

To reproduce the issue:

Generate schema for xsd file with gDay type (such as test/xsd/base64.xsd)

Describe the results you received:

The Go type used in the generated schema to deserialize gDay elements is time.Time

Describe the results you expected:

The Go type used can be deserialized from a valid gDay string, such as the examples above

alexandre-normand commented 2 years ago

I've been running into a related issue today. I have a generated struct with a pretty common

LastUpdated time.Time

But the problem is that the official doc for xsd:dateTime describes the format as inspired by iso 8601 while go unmarshals time.Time to rfc3339. Since the two formats overlap but aren't the same, I ran into an issue where I was not lucky enough to have the time parsing work. In my case, the parsing failed with an unsurprising:

parsing time "2021-09-24T12:04:09.69" as "2006-01-02T15:04:05Z07:00": cannot parse "" as "Z07:00"

One solution would be to define a Time type alias and have that type define the xml marshaling / unmarshaling interfaces while supporting the various formats that satisfy the xsd:dateTime specification. This wouldn't work with the other datetime related types like gDay, however. Here's an attempt without the full list of formats that should be included. I also included a quicky SetPreferredFormat to the generated code to allow me to specify the format to use when marshaling but after getting this to work, I ended up with something that I'm not fully confident would work in 100% of the cases but worked well enough for me.

Here's what I have which could be generated statically when an xsd file have structs with Time fields:

var preferredTimeFormat = time.RFC3339

// SetPreferredXMLTimeFormat sets the preferred time format (see https://pkg.go.dev/time) for parsing time for this
// package's XML structs. This is not a great API but it's meant as a workaround for the inability to specify
// time format through the encoding/xml API. This function is meant to be invoked during initialization depending on an
// application's needs. Note that the preferred format is the first one tried when unmarshaling and the only one used
// when marshaling. If not set, the default is time.RFC3339.
func SetPreferredXMLTimeFormat(format string) {
    preferredTimeFormat = format
}

// Time is a type for parsing time using the xsd date time format
type Time time.Time

func (t *Time) UnmarshalXML(d *xml.Decoder, start xml.StartElement) (err error) {
    var value string
    err = d.DecodeElement(&value, &start)
    if err != nil {
        return err
    }

    return t.parseXMLTime(value)
}

func (t *Time) UnmarshalXMLAttr(attr xml.Attr) (err error) {
    return t.parseXMLTime(attr.Value)
}

func (t *Time) parseXMLTime(value string) (err error) {
    formats := []string{preferredTimeFormat, time.RFC3339Nano, time.RFC3339, "2006-01-02T15:04:05.00", "2006-01-02T15:04:05.999999999"}

    for _, f := range formats {
        parsed, perr := time.Parse(f, value)
        if perr == nil {
            tValue := Time(parsed)
            *t = tValue
            return nil
        } else {
            err = perr
        }
    }

    return err
}

func (t *Time) MarshalXML(e *xml.Encoder, start xml.StartElement) (err error) {
    tm := time.Time(*t)
    formatted := tm.Format(preferredTimeFormat)
    return e.EncodeElement(formatted, start)
}

func (t *Time) MarshalXMLAttr(name xml.Name) (attr xml.Attr, err error) {
    tm := time.Time(*t)
    formatted := tm.Format(preferredTimeFormat)
    return xml.Attr{Name: name, Value: formatted}, nil
}

// ExampleStruct is an example of a type with a datetime field. This would use the generated type alias Time
// which can then be converted to time.Time by the user
type ExampleStruct struct {
        LastUpdateAttr        Time    `xml:"LastUpdate,attr,omitempty"`
}

All that said, my feeling is that it would be preferable to map datetime, date and the other time-related standard types to string instead of time.Time unless we feel confident we can implement all those types in a way that doesn't fail during xml unmarshaling. It would be nice to have something like ☝️ that we'd feel confident complies with the spec but it would require more testing than what I've done with the code above.

So my question to you @BatmanAoD and @xuri would be: should I create a PR that replaces all mappings to the golang type time.Time with a string or should I have a PR that attempts to support datetime with the implementation ☝️ but still replaces the time.Time mappings for the other types to string?

BatmanAoD commented 2 years ago

I think the only way to properly unmarshal these types to anything other than string would be to define Go types implementing the Unmarshaler interface. That sounds like a proliferation of nonstandard type aliases that probably don't belong in a library like this, so I think string is preferable.

alexandre-normand commented 2 years ago

I think the only way to properly unmarshal these types to anything other than string would be to define Go types implementing the Unmarshaler interface. That sounds like a proliferation of nonstandard type aliases that probably don't belong in a library like this, so I think string is preferable.

That's correct. Implementing the Marshaler / Unmarshaler interfaces is what I had done in the code I included above and it's hard to have full confidence that this wouldn't result in annoying bugs (missing formats or a format resulting in valid parsing but that could lose precision in the parsed value). Having string fields seems like the better tradeoff even if this means users would have to do the time.Time parsing as a step following their XML parsing.

Thanks for the input, @BatmanAoD . I'm preparing that PR now to switch to string.

alexandre-normand commented 2 years ago

And here's the PR.