add unknown field support

pavelstudeny commented 7 years ago

protobuf.js version: all, up to the current 6.8.0

proto2 preserve unknown fields for future serialization. This is also planned for proto3 in the near future: https://docs.google.com/document/d/1KMRX-G91Aa-Y2FkEaHeeviLRRNblgIahbsk4wA14gRk/edit#heading=h.w8dtggryroj4

I'm working on a merge request that will add unknown fields when decoding as

_unknownFields: { <id|wireType>: }

This should be doable by add read_type_bytes() to Reader and using it instead of skipType. The _unknownField can be later serialized again or discarded by a new method discardUnknownFields().

Tests:

var proto = "message Simple_v1 {\
  optional string name  = 1;\
  optional string value = 3;\
}\
message Simple_v2 {\
  optional string name  = 1;\
  optional int32  flags = 2;\
  optional string value = 3;\
}";

var msg = { inner: [{}, {}, {}] };

tape.test("unknown fields", function (test) {
    var root = protobuf.parse(proto).root,
        Simple_v1 = root.lookup("Simple_v1");
        Simple_v2 = root.lookup("Simple_v2");

    var s2 = Simple_v2.create({ name: "v2", flags: 2, value: "dummy" });
    var s1 = Simple_v1.decode(Simple_v2.encode(s2).finish());

    var restored = Simple_v2.decode(Simple_v1.encode(s1).finish());

    test.equal(s2.name, restored.name, "assert: even known fields are missing");

    test.equal(2, restored.flags, "are preserved by default");
    test.end();
});

tape.test("discarded unknown fields", function (test) {
    var root = protobuf.parse(proto).root,
        Simple_v1 = root.lookup("Simple_v1");
    Simple_v2 = root.lookup("Simple_v2");

    var s2 = Simple_v2.create({ name: "v2", flags: 2, value: "dummy" });
    var s1 = Simple_v1.decode(Simple_v2.encode(s2).finish());

    try {
        s1.discardUnknownFields();
    }
    catch (ex) {
        test.end("discardUnknownFields() exception: " + ex);
        return;
    }

    var restored = Simple_v2.decode(Simple_v1.encode(s1).finish());

    test.equal(undefined, restored.flags, "are removed from the message");
    test.end();
});

Does it sound OK?

dcodeIO commented 7 years ago

I have a feeling that this will come at quite a performance hit. For example, encoders serialize fields in ascending field id order, as of the spec. Making this work with unknown fields might be quite a challenge without hurting performance significantly.

We'd probably be better off ignoring order of unknown fields completely, simply appending uninterpreted buffer slices to the final buffer again. skipType could, for example, in addition to skipping over the data, just return the respective buffer slice. These slices would then be stored in an array without any id/wiretype association required, for the encoder to append.

pavelstudeny commented 7 years ago

Yes, implementations for other languages also place unknown fields at the end, so protobuf.js could follow the habit. This should also remove any performance impact, I hope.

May be there should anyway be a global option to ignore the unknown fields, in addition to an explicit discard?

Does it sound OK that the discardUnknownFields method would be on the decoded data instance (s1/s2 above) or should it rather stay at the type descriptor (Simple_v1/Simple_v2)?

dcodeIO commented 7 years ago

May be there should anyway be a global option to ignore the unknown fields, in addition to an explicit discard?

Yeah, why not.

Does it sound OK that the discardUnknownFields method would be on the decoded data instance (s1/s2 above) or should it rather stay at the type descriptor (Simple_v1/Simple_v2)?

Instances don't have any methods on them by default (except .toJSON, which must be on the instance). But this could also be just a global helper function, as it basically just does a delete someMessage._someProperty.

pavelstudeny commented 7 years ago

How about https://github.com/dcodeIO/protobuf.js/pull/779?

I hope the CODECLIMATE_REPO_TOKEN is a problem on the test machine.

I had to remove the first variant of writeBytes in writer.js. It looked like a mere optimization, that was however never called and was failing on nodes 4-. It was also ignored by test coverage - that's why the existing tests couldn't find the problem.

pavelstudeny commented 7 years ago

Just wondering about a general opinion, whether the solution looks acceptable as it is or needs any changes?

dcodeIO commented 7 years ago

Sorry, haven't yet got around to check. Being a bit lazy here because it's not in the official implementation yet, but I appreciate your efforts!

pavelstudeny commented 7 years ago

Fair enough for proto3, if the upcoming specification it's going to define API calls for this feature. This, however, also fulfills the requirement for proto2, that is also handled by protobuf.js.

niicojs commented 6 years ago

I would need this also. Is your PR usable already?

pavelstudeny commented 6 years ago

It's complete and usable, although I'm not using it myself, as dcodeIO may decide to modify it or come up with a different solution.

robin-anil commented 6 years ago

I would love to +1 this. Currently, the parse fails with unknown fields in the byte stream. this is a hallmark feature of protobufs in other languages that allows developers to push different applications without breaking.

chakmingli commented 6 years ago

It's been a while. Will this change be accepted?

pavelstudeny commented 6 years ago

Protobuf 3 preserves unknown fields by default since release 3.5.0 from 2017-11-13:

Unknown fields are now preserved in proto3 for most of the language implementations for proto3 by default.

This should be a reason to accept pull request #779 or point out what shoul be improved.

jcgertig commented 5 years ago

BUMP this is very needed

akshah123 commented 4 years ago

This would be very useful feature. Are there any plans to have this implemented or a decision to not implement it?

akshah123 commented 3 years ago

@dcodeIO thoughts on what the plan is regarding this feature request?

alexander-fenster commented 3 years ago

@akshah123 Just FYI, we at Google Cloud client libraries will likely need this implemented sooner rather than later, so we'll likely spend some time in December looking into this.

jcgertig commented 3 years ago

I added support in my fork. Though it may not be fully spec you can see the diff here https://github.com/protobufjs/protobuf.js/compare/master...jcgertig:master

On Mon, Nov 16, 2020 at 11:16 AM Alexander Fenster notifications@github.com wrote:

@akshah123 https://github.com/akshah123 Just FYI, we at Google Cloud client libraries will likely need this implemented sooner rather than later, so we'll likely spend some time in December looking into this.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/protobufjs/protobuf.js/issues/775#issuecomment-728203000, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADPFOYURHT24MET462V4HDSQFNATANCNFSM4DIZKV7A .

Asafb26 commented 2 years ago

Anything new here? Is this PR will ever get merged?

yjqiang commented 2 years ago

Anything new here? Is this PR will ever get merged?

bogdan-khitsenko commented 1 year ago

Anything new here? Is this PR will ever get merged?

protobufjs / protobuf.js

add unknown field support #775