zolyfarkas / avro

Mirror of Apache Avro
Apache License 2.0

Read Long from String (datetime) #5

Closed leosilvadev closed 3 years ago

leosilvadev commented 3 years ago

Hello, I had a case in a project where some timestamp fields are serialized as strings (we cannot fix that), so the JSON decoder fails when we try to use it. Do you think this feature could be added to the readLong function? I mean, having it try to parse the string using one (or more, configurable) datetime formats? I have done some work on this and could open an MR if you think it is relevant.

Thanks

zolyfarkas commented 3 years ago

Hi, can you provide a simplified example so that I can better understand the use case? Usually to deal with situations like this you would create a custom logical type like the types I have added in the lib:

"instant" => https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/logical_types/converters/InstantConverter.java

which is customizable with a "format" property.

Or something like a union of all temporal types:

"temporal" => https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/logical_types/converters/TemporalConverter.java

I hope these examples, which arose from use cases I encountered, help.

leosilvadev commented 3 years ago

Hi @zolyfarkas, thanks for the reply! Yes, I checked that solution, but I am a bit confused: is this class in some other lib? I am using version 1.9.0.20p (I was using 1.9.0.12p), but these converters are not present there.

leosilvadev commented 3 years ago

From what I noticed these converters are available in the avro 1.9.2+ lib, so I wonder if there is a more up-to-date version from your side (I am pointing to https://dl.bintray.com/zolyfarkas/core/org/apache/avro/avro/)

leosilvadev commented 3 years ago

To give some context: I am receiving JSON in which datetimes arrive as strings, and I am transforming it to Avro, but the timestamp field is a long with logical type timestamp-millis.
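The "one or more configurable formats" idea from the original request can be sketched outside the decoder, as a pre-processing step that turns the string into the epoch-millisecond long that timestamp-millis expects. This is a minimal sketch using only `java.time`. The class name `TimestampStrings`, the method `toEpochMillis`, and the second pattern in the list are hypothetical, and treating zone-less text as UTC is an assumption.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper, not part of Avro or of the fork.
final class TimestampStrings {

    // Accepted formats, tried in order. The second pattern is only an example;
    // text without a zone is assumed to be UTC.
    static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ISO_INSTANT,
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC));

    // Returns epoch milliseconds for the first format that parses the text.
    static long toEpochMillis(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Instant.from(format.parse(text)).toEpochMilli();
            } catch (DateTimeException e) {
                // This format did not match; fall through to the next one.
            }
        }
        throw new IllegalArgumentException("Unparseable timestamp: " + text);
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("1970-01-01T00:00:01Z"));
        System.out.println(toEpochMillis("1970-01-01 00:00:02"));
    }
}
```

The resulting long could then replace the string value in the JSON before it reaches the decoder, so the schema itself stays unchanged.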

zolyfarkas commented 3 years ago

It looks like the schema you are using is not right for your data. For your timestamp field you should use "instant", like:

https://github.com/zolyfarkas/core-schema/blob/master/src/main/avro/core.avdl#L121

which is a logical type available in my fork. The logical type is available in all versions. If the timestamp format is special (not the ISO default), you can use @format("yyy....") to specify it.

leosilvadev commented 3 years ago

So currently using the default avdl does not work?

We have this: timestamp_ms timestamp;

Regarding what I mentioned about the version of the lib: in the version I am using, none of the options you mentioned are available.

zolyfarkas commented 3 years ago

Looks like you are using the official avro library.

timestamp_ms is equivalent to:

 @logicalType("timestamp-millis") long timestamp;

I never liked the avdl shortcuts introduced in the official avdl for certain logical types; they are just confusing and obfuscate the real types. (This is why they are missing in my fork... for now...)

However, you can easily implement your own timestamp logical type on top of string to support your use case.

It is easy to do; you can use the "instant" logical type I shared with you as a model. Logical types are there to extend the type system, and are fairly easy to implement and use.
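As a rough illustration of what a timestamp type "on top of string" has to do, the sketch below shows only the two-way mapping between the stored string and an `Instant`, with an optional format pattern and ISO as the default. It uses only `java.time` and deliberately does not use the Avro converter API or the fork's InstantConverter. The class `StringInstantCodec` and its methods are hypothetical names, and interpreting zone-less patterns as UTC is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical codec: the string <-> Instant mapping a string-backed
// timestamp logical type would need. Not an Avro class.
final class StringInstantCodec {

    private final DateTimeFormatter formatter;

    // A null pattern selects the ISO-8601 instant format; otherwise the
    // pattern is applied with UTC as the assumed zone.
    StringInstantCodec(String pattern) {
        this.formatter = (pattern == null)
                ? DateTimeFormatter.ISO_INSTANT
                : DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
    }

    // Serialized string -> in-memory value.
    Instant parse(CharSequence text) {
        return Instant.from(formatter.parse(text));
    }

    // In-memory value -> serialized string.
    String format(Instant value) {
        return formatter.format(value);
    }

    public static void main(String[] args) {
        StringInstantCodec custom = new StringInstantCodec("yyyy-MM-dd HH:mm:ss");
        Instant parsed = custom.parse("1970-01-01 00:00:01");
        System.out.println(parsed.toEpochMilli());
        System.out.println(new StringInstantCodec(null).format(parsed));
    }
}
```

Wiring such a mapping into a real logical type still requires the registration and converter classes of whichever Avro build is in use, which this sketch leaves out.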