Closed gaojieliu closed 1 year ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (2fb570c) 45.77% compared to head (854aab3) 45.78%.
Thanks for adding back the generated classes @gaojieliu, but can we add them before the code change, so we can see the diff? i.e. these steps as separate commits:
- Tweak the ignore file to add back the generated code (perhaps in just one Avro version, or all versions... either way may be fine...).
- Do the code change.
- Changes to the generated code.
- (Possibly optional) undo the tweak to the ignore file to re-ignore generated code...
On a separate note, @radai-rosenblatt, how strongly do you feel about not checking in the generated code? It makes it quite tedious, needing to do these git acrobatics in order to see the effect of the changes to the meta code...
It would be great to check in one version of the generated classes; otherwise the PR effort will be too high.
Hi @gaojieliu
After upgrading to 0.3.21 we started observing errors:
java.lang.NullPointerException: Cannot invoke "org.apache.avro.Schema.equals(Object)" because "writer" is null
at org.apache.avro.Schema.applyAliases(Schema.java:1890)
at org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:131)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at com.linkedin.avro.fastserde.FastSerdeUtils$FastDeserializerWithAvroSpecificImpl.deserialize(FastSerdeUtils.java:68)
at com.linkedin.avro.fastserde.FastGenericDatumReader.read(FastGenericDatumReader.java:126)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:263)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:248)
Presence of the new class (FastSerdeUtils) pointed us to this PR...
Has something changed?
We use new FastSpecificDatumReader<>(null, schema); but it looks like null is not allowed anymore.
Thanks for reporting... that is weird, since I don't see changes to the FastSpecificDatumReader constructors... but... what does it mean to have a null writer schema, though? Are you expecting the writer schema to default to the reader schema in that case?
OK, it's a bit more complex scenario :)
Schema schema = SomeAvroClass.getClassSchema();
FastSpecificDatumReader<SomeAvroClass> datumReader = new FastSpecificDatumReader<>(null, schema);
Path path = Paths.get("/Users/kris/dev/data-file-with-schema-in-header.avro");
try (InputStream inputStream = Files.newInputStream(path);
     DataFileStream<SomeAvroClass> reader = new DataFileStream<>(inputStream, datumReader)) {
    for (SomeAvroClass obj : reader) {
        System.out.println(obj);
    }
}
The DataFileStream constructor calls the void initialize(InputStream in, byte[] magic) method, which extracts writerSchema from the inputStream metadata. At the end, the helper method invokes
reader.setSchema(header.schema); // replaces writerSchema if it's null --> that's our case
where reader is our datumReader.
Now, in version 0.3.21, a new field was added to FastGenericDatumReader:
private final FastDeserializer<T> coldDeserializer;
and it is not updated when public void setSchema(Schema schema) is invoked.
Basically, setSchema(Schema schema) should somehow be cascaded to coldDeserializer.
I did an experiment, and injecting
((FastSerdeUtils.FastDeserializerWithAvroSpecificImpl) this.coldDeserializer).customizedDatumReader.setSchema(schema);
into com.linkedin.avro.fastserde.FastGenericDatumReader.setSchema() fixed the issue.
Of course, the real fix should be done in a better way (i.e. no casting).
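To illustrate the "no casting" idea, here is a minimal, self-contained sketch of cascading a late schema update through an interface instead of a cast. Every class name below (SchemaAware, SketchVanillaReader, SketchColdDeserializer, SketchFastReader) is hypothetical, and plain strings stand in for Avro Schema objects; this is not the actual avro-util code or the fix that was merged, only one possible shape for it.

```java
import java.util.Objects;

// Hypothetical stand-ins only; none of these are real avro-util or Avro classes.
class CascadeSetSchemaSketch {
    public static void main(String[] args) {
        // Writer schema is unknown (null) at construction time, as in the report.
        SketchFastReader reader = new SketchFastReader(null, "reader-schema");
        // DataFileStream would call this after reading the file header.
        reader.setSchema("schema-from-file-header");
        System.out.println(reader.readCold());
    }
}

/** Anything whose writer schema can be supplied after construction. */
interface SchemaAware {
    void setSchema(String schema);
}

/** Stand-in for the vanilla reader wrapped by the cold path. */
class SketchVanillaReader implements SchemaAware {
    private String writerSchema;
    private final String readerSchema;

    SketchVanillaReader(String writerSchema, String readerSchema) {
        this.writerSchema = writerSchema;
        this.readerSchema = readerSchema;
    }

    @Override
    public void setSchema(String schema) {
        // Assumption: only fill in the writer schema when it was not given.
        if (writerSchema == null) {
            writerSchema = schema;
        }
    }

    String read() {
        // Mirrors the reported failure mode: reading with a null writer schema fails.
        Objects.requireNonNull(writerSchema, "writer schema is null");
        return "resolved " + writerSchema + " -> " + readerSchema;
    }
}

/** Stand-in for the cold deserializer; it forwards schema updates to its delegate. */
class SketchColdDeserializer implements SchemaAware {
    private final SketchVanillaReader delegate;

    SketchColdDeserializer(String writerSchema, String readerSchema) {
        this.delegate = new SketchVanillaReader(writerSchema, readerSchema);
    }

    @Override
    public void setSchema(String schema) {
        delegate.setSchema(schema);
    }

    String deserialize() {
        return delegate.read();
    }
}

/** Stand-in for the fast reader that owns a cold deserializer. */
class SketchFastReader implements SchemaAware {
    private String writerSchema;
    private final SketchColdDeserializer coldDeserializer;

    SketchFastReader(String writerSchema, String readerSchema) {
        this.writerSchema = writerSchema;
        this.coldDeserializer = new SketchColdDeserializer(writerSchema, readerSchema);
    }

    @Override
    public void setSchema(String schema) {
        if (writerSchema == null) {
            writerSchema = schema;
        }
        // Cascade through the interface, so no cast to a concrete class is needed.
        coldDeserializer.setSchema(schema);
    }

    String readCold() {
        return coldDeserializer.deserialize();
    }
}
```

Without the cascade line in SketchFastReader.setSchema(), readCold() in this sketch would throw a NullPointerException, which is the same kind of failure as in the stack trace above.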
Ah, I see, that's very interesting... we should add a test for that (:
Hi @FelixGV @gaojieliu I provided a fix for that:
We also provided a bugfix for another case:
Hi, It looks like the feature has a bug which should be fixed by:
We have the following requirements: For serialization, we would like to validate whether all the map fields are using the desired map type. For deserialization, we would like to deserialize the map type into a special map implementation for later use.
These customized requirements were not supported in the past for the following reasons:
This PR adds new functionality for specifying customized logic; it is extensible and backward compatible.
- DatumReaderCustomization: customization for read
- DatumWriterCustomization: customization for write
Currently, these classes only support the requirements mentioned at the beginning.
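As a rough illustration of the two requirements, the sketch below models a read-side customization that chooses the map implementation and a write-side customization that validates the map type. All names here (SketchReaderCustomization, SketchWriterCustomization, SketchCodec) are hypothetical and are not the actual DatumReaderCustomization / DatumWriterCustomization API; TreeMap is used only as an example of a "special map implementation".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Hypothetical sketch; these classes are not part of avro-util.
class CustomizationSketch {
    public static void main(String[] args) {
        // Read side: ask for a TreeMap instead of the default HashMap.
        SketchReaderCustomization readerCustomization =
            new SketchReaderCustomization(size -> new TreeMap<>());
        Map<String, Integer> decoded =
            SketchCodec.readMap(new String[] {"b", "a"}, new int[] {2, 1}, readerCustomization);
        System.out.println(decoded.getClass().getSimpleName() + " " + decoded);

        // Write side: reject any map that is not a TreeMap.
        SketchWriterCustomization writerCustomization = new SketchWriterCustomization(map -> {
            if (!(map instanceof TreeMap)) {
                throw new IllegalArgumentException("expected TreeMap but got " + map.getClass().getName());
            }
        });
        System.out.println(SketchCodec.writeMap(decoded, writerCustomization));
    }
}

/** Read-side hook: how to create the map instance for a decoded map field. */
final class SketchReaderCustomization {
    /** Default keeps the old behavior, which is what makes the hook backward compatible. */
    static final SketchReaderCustomization DEFAULT = new SketchReaderCustomization(HashMap::new);

    private final IntFunction<Map<String, Integer>> newMapFunction;

    SketchReaderCustomization(IntFunction<Map<String, Integer>> newMapFunction) {
        this.newMapFunction = newMapFunction;
    }

    Map<String, Integer> newMap(int expectedSize) {
        return newMapFunction.apply(expectedSize);
    }
}

/** Write-side hook: a check applied to every map before it is serialized. */
final class SketchWriterCustomization {
    /** Default performs no validation. */
    static final SketchWriterCustomization DEFAULT = new SketchWriterCustomization(map -> { });

    private final Consumer<Map<?, ?>> checkMapType;

    SketchWriterCustomization(Consumer<Map<?, ?>> checkMapType) {
        this.checkMapType = checkMapType;
    }

    void check(Map<?, ?> map) {
        checkMapType.accept(map);
    }
}

/** Toy codec standing in for generated (de)serializer code. */
final class SketchCodec {
    private SketchCodec() { }

    static Map<String, Integer> readMap(String[] keys, int[] values, SketchReaderCustomization customization) {
        Map<String, Integer> map = customization.newMap(keys.length);
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
        return map;
    }

    static String writeMap(Map<String, Integer> map, SketchWriterCustomization customization) {
        customization.check(map); // throws if the map type is not the desired one
        return map.toString();    // placeholder for real binary encoding
    }
}
```

Passing the DEFAULT instances reproduces the uncustomized behavior in this sketch, which is one way to keep such hooks backward compatible for existing callers.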
How does it work internally?